Coding Models Are Doing Too Much

This title could be clearer and more informative.Try out Clickbait Shieldfor free (5 uses left this month).

AI coding tools like Claude Code, GitHub Co-pilot, and GPT-5 tend to over-edit code — rewriting entire functions when only a minimal fix was needed. This 'Over-editing' problem is invisible to test suites but burdens code review. A benchmark of 400 programmatically corrupted Python problems reveals that all frontier models over-edit to varying degrees, with GPT-5.4 being the worst offender and Claude Op 4.6 the most conservative. Adding an explicit prompt to preserve original code significantly reduces over-editing, especially for reasoning models. Training experiments show that reinforcement learning (vs. SFT or DPO) is the only method that generalizes cleanly to unseen corruptions without catastrophic forgetting, and the gains scale to 14B parameter models. Using a small-rank (64) Lo-Ra adapter achieves near-full-fine-tuning performance at lower cost.

18m read timeFrom nrehiew.github.io
Post cover image
Table of contents
Over-EditingMeasuring Over-EditingDo Models Over-Edit?Does Prompting Help?Does Reasoning Mean Overthinking and Over-Editing?TrainingFinal Thoughts
1 Comment

Sort: