Abstract page for arXiv paper 2602.04118: Learning to Reason in 13 Parameters

Researchers introduce TinyLoRA, a method that extends low-rank adaptation (LoRA) to scales far below the model dimension, down to a single trained parameter. Using this approach, they fine-tune Qwen2.5 8B to 91% accuracy on GSM8K with only 13 trained parameters (26 bytes in bf16). On harder benchmarks such as AIME, AMC, and MATH500, TinyLoRA recovers 90% of the performance gains while training 1000x fewer parameters than standard methods. A key finding is that this extreme parameter efficiency only works with reinforcement learning; supervised fine-tuning (SFT) requires updates 100–1000x larger to reach equivalent performance.
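
The 26-byte figure follows from simple arithmetic: bf16 stores each value in 16 bits, so 13 trained parameters take 13 × 2 bytes. A minimal sketch of that calculation is below; the helper name `adapter_size_bytes` and the dtype table are illustrative assumptions and do not come from the paper.

```python
# Bytes per value for common floating-point formats (standard widths).
BYTES_PER_PARAM = {"bf16": 2, "fp16": 2, "fp32": 4}


def adapter_size_bytes(num_params: int, dtype: str = "bf16") -> int:
    """Hypothetical helper: storage needed for the trained parameters alone."""
    return num_params * BYTES_PER_PARAM[dtype]


# 13 trained parameters stored in bf16, as in the summary above.
print(adapter_size_bytes(13))  # 26
```

The same 13 parameters stored in fp32 would take 52 bytes, so the quoted size depends on the storage format as well as the parameter count.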
