Researchers introduce TinyLoRA, a method that scales low-rank adaptation (LoRA) to adapter sizes far below the model dimension, down to a single trained parameter. With it, they fine-tune Qwen2.5 8B to 91% accuracy on GSM8K while training only 13 parameters (26 bytes in bf16). On harder benchmarks such as AIME, AMC, and MATH500, TinyLoRA recovers 90% of the performance gains while training 1000x fewer parameters than standard methods. A key finding is that this extreme parameter efficiency appears only with reinforcement learning; supervised fine-tuning (SFT) requires 100–1000x more parameter updates to reach equivalent performance.
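
To put the sizes in perspective, here is a minimal sketch of the arithmetic. It checks the 26-byte figure (13 parameters at 2 bytes each in bf16) and compares it with the parameter count of a conventional rank-1 LoRA on a single weight matrix. The hidden size of 4096 and the helper names `adapter_bytes` and `lora_params` are assumptions made for illustration, not values or code from the work itself, and the sketch does not implement TinyLoRA.

```python
BYTES_PER_BF16 = 2  # bfloat16 stores each value in 16 bits


def adapter_bytes(num_params: int, bytes_per_param: int = BYTES_PER_BF16) -> int:
    """Storage needed for the trained parameters alone."""
    return num_params * bytes_per_param


def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters of a standard LoRA on one d_out x d_in matrix.

    Standard LoRA learns A (rank x d_in) and B (d_out x rank).
    """
    return rank * (d_in + d_out)


# 13 trained parameters in bf16, as reported in the summary above.
tiny_bytes = adapter_bytes(13)

# Hypothetical hidden size, chosen only for illustration.
hidden = 4096
rank1_params = lora_params(hidden, hidden, rank=1)

print(f"13 parameters in bf16: {tiny_bytes} bytes")
print(f"rank-1 LoRA on one {hidden}x{hidden} matrix: {rank1_params} parameters")
```

Under these assumptions, even the smallest conventional LoRA on one matrix trains thousands of parameters, which is why going below the model dimension requires a different parameterization than plain rank reduction.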