A research paper introduces Simple Self-Distillation (SSD), a technique that improves LLM code generation without requiring a verifier, teacher model, or reinforcement learning. By sampling solutions from the model itself at specific temperature and truncation settings, then fine-tuning on those samples via standard supervised fine-tuning, SSD boosts code-generation performance.
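To make the loop concrete, here is a minimal sketch of sample-then-fine-tune self-distillation using Hugging Face transformers. The model name, prompt, learning rate, and the temperature/top-p values are illustrative assumptions, not the paper's settings; treat this as a sketch of the idea, not the authors' implementation.

```python
# Minimal SSD-style sketch: sample from the model itself, then run standard
# supervised fine-tuning (next-token cross-entropy) on those samples.
# No verifier, teacher model, or RL is involved.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen2.5-Coder-1.5B"  # hypothetical choice; any causal LM works
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)

prompts = ["Write a Python function that reverses a linked list."]  # illustrative

# Step 1: sample solutions at specific temperature and truncation (top-p)
# settings. Nothing filters or scores the outputs.
samples = []
for prompt in prompts:
    inputs = tok(prompt, return_tensors="pt")
    out = model.generate(
        **inputs,
        do_sample=True,
        temperature=0.8,   # assumed value; the paper's exact setting may differ
        top_p=0.95,        # assumed truncation setting
        max_new_tokens=512,
        num_return_sequences=4,
    )
    samples.extend(tok.decode(o, skip_special_tokens=True) for o in out)

# Step 2: standard supervised fine-tuning on the model's own generations.
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
model.train()
for text in samples:
    batch = tok(text, return_tensors="pt", truncation=True, max_length=1024)
    loss = model(**batch, labels=batch["input_ids"]).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```

In practice one would batch the SFT step and loop the sample-and-train cycle over a larger prompt set; the sketch keeps a single pass for readability.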
