Anthropic's Claude Sonnet 4.5 was marketed as an improvement over 4.0, but developers across forums like r/cursor and r/ClaudeAI report quality regressions in real coding workflows. Key complaints include increased API hallucinations, instruction-following failures, a resurgence of 'laziness' (incomplete outputs with placeholders), shallower reasoning, and heightened sycophancy. Benchmarks like SWE-bench show improvement, but these don't capture multi-file, iterative real-world development. The article examines the benchmark-vs-reality gap, Anthropic's opaque response, and the broader industry pattern of model quality oscillation tied to inference cost optimization. Practical guidance covers when to stick with Sonnet 4.0, when 4.5 with extended thinking earns its place, how to pin model versions via the Anthropic Python SDK, and why multi-model routing strategies reduce vendor lock-in risk.
Table of Contents
The Upgrade That Sent Developers Backward
What Changed Between Claude Sonnet 4.0 and 4.5
The Regression Complaints: What Developers Are Actually Reporting
Benchmarks vs. Reality: Where the Numbers Diverge
Anthropic's Response and the Broader Industry Pattern
Practical Recommendations: Which Model Should You Use Right Now
What This Means for the Future of AI-Assisted Development