Are LLMs not getting better?


An analysis of METR's research on LLM coding performance challenges the narrative of continuous improvement. By comparing merge rates (whether code would actually be accepted by maintainers) rather than test-passing rates, the data shows no meaningful improvement in LLM programming ability since early 2025. Using leave-one-out cross-validation and Brier scores, a constant function fits the merge rate data better than a linear upward trend, suggesting LLMs have plateaued in real-world coding quality for over a year despite ongoing hype about capability gains.
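The comparison described above can be sketched roughly as follows. This is a minimal illustration, not the original analysis: the data points are invented, and the helper names (`fit_constant`, `fit_linear`, `loocv_brier`) are hypothetical. It assumes each observation is a release time paired with a binary merged/not-merged outcome, and that the "linear trend" is an ordinary least-squares line clipped to the range [0, 1].

```python
from statistics import mean


def fit_constant(xs, ys):
    """Constant model: always predict the training-set merge rate."""
    p = mean(ys)
    return lambda x: p


def fit_linear(xs, ys):
    """Least-squares line through (x, y), clipped to a valid probability."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:  # all x identical: fall back to the mean
        return lambda x: my
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    intercept = my - slope * mx
    return lambda x: min(1.0, max(0.0, intercept + slope * x))


def loocv_brier(fit, xs, ys):
    """Leave-one-out CV: refit without point i, then score the prediction for i.

    The Brier score is the mean squared difference between the predicted
    probability and the observed 0/1 outcome (lower is better).
    """
    errors = []
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        predict = fit(train_x, train_y)
        errors.append((predict(xs[i]) - ys[i]) ** 2)
    return mean(errors)


# HYPOTHETICAL data, invented for illustration only (not METR's numbers):
# months since an arbitrary start date, and whether a patch would be merged.
months = [0, 0, 2, 2, 5, 5, 8, 8, 11, 11]
merged = [1, 0, 0, 1, 1, 0, 0, 1, 1, 0]

constant_score = loocv_brier(fit_constant, months, merged)
linear_score = loocv_brier(fit_linear, months, merged)
print(f"constant: {constant_score:.4f}  linear: {linear_score:.4f}")
```

On this deliberately trendless toy data the constant model gets the lower (better) held-out Brier score, because the extra slope parameter only fits noise. That says nothing about the real data; it only shows how such a comparison can favor a flat line over an upward trend.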

3m read time · From entropicthoughts.com