Coding Models Are Doing Too Much

This title could be clearer and more informative.Try out Clickbait Shieldfor free (5 uses left this month).

AI coding tools like Claude Code, GitHub Co-pilot, and GPT-5 tend to over-edit code — rewriting entire functions when only a minimal fix was needed. This 'Over-editing' problem is invisible to test suites but burdens code review. A benchmark of 400 programmatically corrupted Python problems reveals that all frontier models over-edit to varying degrees, with GPT-5.4 being the worst offender and Claude Op 4.6 the most conservative. Adding an explicit prompt to preserve original code significantly reduces over-editing, especially for reasoning models. Training experiments show that reinforcement learning (vs. SFT or DPO) is the only method that generalizes cleanly to unseen corruptions without catastrophic forgetting, and the gains scale to 14B parameter models. Using a small-rank (64) Lo-Ra adapter achieves near-full-fine-tuning performance at lower cost.

#llm

#deep-learning

#reinforcement-learning

#ai-coding

Apr 22•18m read time•From nrehiew.github.io

Table of contents

Over-Editing Measuring Over-Editing Do Models Over-Edit?Does Prompting Help?Does Reasoning Mean Overthinking and Over-Editing?Training Final Thoughts