A research paper introduces Simple Self-Distillation (SSD), a technique that improves LLM code generation without a verifier, a teacher model, or reinforcement learning. The model samples its own solutions at specific temperature and truncation settings, then is fine-tuned on those samples with standard supervised fine-tuning (SFT). SSD boosts Qwen3-30B-Instruct from 42.4% to 55.3% pass@1 on LiveCodeBench v6, with the largest gains on harder problems, and the improvement generalizes across Qwen and Llama models at the 4B, 8B, and 30B scales. The authors attribute the gains to a precision-exploration conflict in LLM decoding: SSD reshapes token distributions context-dependently, suppressing distractor tails where precision matters while preserving diversity where exploration helps.
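To make the two-step recipe concrete, here is a minimal sketch of an SSD-style loop assuming a Hugging Face `transformers` setup. The model name, temperature, top-p truncation, and sample counts below are illustrative placeholders, not the paper's exact settings, and a full recipe would also filter samples and mask prompt tokens in the loss.

```python
# Hypothetical sketch of a self-distillation loop: sample from the model
# at chosen decoding settings, then SFT on the model's own samples.
# Model name and hyperparameters are assumptions, not the paper's recipe.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "Qwen/Qwen3-30B-A3B-Instruct-2507"  # placeholder; any causal LM works
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME, torch_dtype=torch.bfloat16, device_map="auto"
)

def sample_solutions(prompts, temperature=1.0, top_p=0.95, n=4):
    """Step 1: sample candidate solutions from the model itself at
    specific temperature and nucleus-truncation settings."""
    samples = []
    for prompt in prompts:
        inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
        outputs = model.generate(
            **inputs,
            do_sample=True,
            temperature=temperature,
            top_p=top_p,
            num_return_sequences=n,
            max_new_tokens=1024,
        )
        prompt_len = inputs["input_ids"].shape[1]
        for seq in outputs:
            completion = tokenizer.decode(seq[prompt_len:], skip_special_tokens=True)
            samples.append((prompt, completion))
    return samples

def sft_step(samples, optimizer):
    """Step 2: standard supervised fine-tuning on the sampled solutions
    (plain next-token cross-entropy; no verifier or reward signal)."""
    model.train()
    for prompt, completion in samples:
        batch = tokenizer(prompt + completion, return_tensors="pt").to(model.device)
        loss = model(**batch, labels=batch["input_ids"]).loss
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
```

The point of the sketch is that no external supervision enters anywhere: the training targets in step 2 are exactly the model's own step-1 samples, so the decoding settings are the only knob shaping what the model distills into itself.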
