A new way to fine-tune LLMs just dropped
Evolution strategies (ES), long considered unscalable for deep neural networks, are making a comeback in LLM fine-tuning. Two papers are driving this revival. 'Evolution Strategies at Scale' (Sept 2025) showed that ES can fine-tune billion-parameter models with a population of just 30 models by exploiting the low intrinsic dimensionality of useful update directions. 'EgRoL' (Nov 2025) structures the perturbations as LoRA updates, which dramatically reduces compute costs. EgRoL enables massively parallel, inference-only training without backpropagation. It outperforms GRPO on benchmarks like Countdown (35% vs 23% accuracy) and GSM8K while running up to 32x more parallel generations on the same hardware. The key insight is that ES fits naturally into RL-style fine-tuning where only a coarse, outcome-level reward is available: it scores whole perturbed models rather than individual tokens, so it avoids the sparse credit assignment problem that plagues token-level RL methods like GRPO.
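To make the idea concrete, here is a minimal NumPy sketch of ES with low-rank (LoRA-style) perturbations on a toy problem. It is an illustration under stated assumptions, not the implementation from either paper: the function names (`lowrank_forward`, `es_step`, `run_demo`), the quadratic toy reward, the z-scored reward weighting, and all hyperparameters other than the population size of 30 are hypothetical choices made for this example.

```python
import numpy as np

def lowrank_forward(x, W, A, B, sigma):
    """Compute x @ (W + sigma * A @ B.T / sqrt(r)) without building the dense perturbation.

    x: (batch, m), W: (m, n), A: (m, r), B: (n, r).
    The extra cost is two thin matmuls, which is what makes many
    perturbed copies cheap to evaluate in parallel.
    """
    r = A.shape[1]
    return x @ W + (sigma / np.sqrt(r)) * (x @ A) @ B.T

def es_step(W, reward_fn, rng, pop_size=30, sigma=0.1, lr=0.1, rank=1):
    """One ES update: perturb, score each member with a scalar reward, recombine."""
    m, n = W.shape
    A = rng.standard_normal((pop_size, m, rank))
    B = rng.standard_normal((pop_size, n, rank))
    # Dense perturbations are formed here only because the toy reward takes a matrix.
    E = A @ B.transpose(0, 2, 1) / np.sqrt(rank)      # (pop_size, m, n)
    rewards = np.array([reward_fn(W + sigma * E_i) for E_i in E])
    # Assumption: z-score the rewards so the step size does not depend on reward scale.
    adv = (rewards - rewards.mean()) / (rewards.std() + 1e-8)
    # Reward-weighted average of the perturbations; no backpropagation involved.
    update = np.einsum("i,imn->mn", adv, E) / pop_size
    return W + lr * update

def run_demo(steps=300, seed=0):
    """Toy task: recover a hidden 4x4 matrix from an outcome-level reward only."""
    rng = np.random.default_rng(seed)
    target = rng.standard_normal((4, 4))
    reward_fn = lambda W: -float(np.sum((W - target) ** 2))
    W = np.zeros((4, 4))
    initial = reward_fn(W)
    for _ in range(steps):
        W = es_step(W, reward_fn, rng)
    return initial, reward_fn(W)

if __name__ == "__main__":
    initial, final = run_demo()
    print(f"reward before: {initial:.3f}, after: {final:.3f}")
```

The update uses only one scalar reward per population member, which mirrors the outcome-level reward setting described above. In a real model, each member's forward pass would go through something like `lowrank_forward` so that the dense perturbation is never materialized.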
•15m watch time