GEPA is a prompt optimization method that outperforms GRPO on compound AI systems while using 35× fewer rollouts and requiring no GPU training. Instead of reducing rollout traces to a scalar reward as GRPO does, GEPA feeds full traces to a reflection LLM that rewrites prompts based on observed failure patterns. The method uses Pareto selection to preserve diverse prompt candidates rather than always mutating from the top performer. A concrete HotpotQA example shows a prompt jumping from 38% to 69% accuracy through one reflection cycle. The post also covers when to use GEPA vs GRPO vs MIPROv2 vs TextGrad, and notes that smaller training sets (20–100 examples) often outperform larger ones with GEPA. A secondary section explains why weaker teacher models can produce better fine-tuning data for smaller student models, owing to the capacity mismatch between teacher and student.
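The core loop described above (score candidates per-example, keep the Pareto front rather than a single best prompt, then mutate one survivor via reflection) can be sketched in a few lines of Python. All names here are illustrative, not the actual GEPA API, and the reflection call is a stand-in for the LLM that would rewrite the prompt from full rollout traces:

```python
import random

def pareto_front(scores):
    """scores: dict mapping prompt -> list of per-example scores.
    Keep every prompt that achieves the best score on at least one
    training example, preserving diverse candidates instead of only
    the single top aggregate performer."""
    prompts = list(scores)
    n_examples = len(scores[prompts[0]])
    front = set()
    for i in range(n_examples):
        best = max(scores[p][i] for p in prompts)
        front.update(p for p in prompts if scores[p][i] == best)
    return front

def reflect_and_mutate(prompt, failures):
    """Stand-in for the reflection LLM. In GEPA this step feeds the
    full rollout traces of the failures to an LLM that rewrites the
    prompt; here we just tag the prompt so the loop is runnable."""
    return prompt + f" [revised after {len(failures)} failures]"

def gepa_step(scores, traces):
    """One optimization step: sample a parent from the Pareto front,
    collect its failing traces, and produce a mutated child prompt."""
    parent = random.choice(sorted(pareto_front(scores)))
    failures = [t for t in traces[parent] if not t["correct"]]
    return reflect_and_mutate(parent, failures)
```

For example, with `scores = {"A": [1, 0], "B": [0, 1], "C": [0, 0]}`, the front is `{"A", "B"}`: each is best on one example, so both survive as mutation parents even though neither dominates overall. This is the diversity-preserving behavior the summary contrasts with always mutating the top performer.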

14m read time · From blog.dailydoseofds.com
Table of contents
- A tricky LLM interview question for AI Engineers
- How to beat GRPO without touching model weights
