In this video, I'll be telling you about Claude Sonnet 4.6, Anthropic's latest model, and why it's a complicated release. It actually performs worse than Sonnet 4.5 on one-shot tasks, but somehow beats Opus 4.6 on my agentic coding leaderboard.

--
Key Takeaways:

🧠 Claude Sonnet 4.6 scores lower than Sonnet 4.5 on KingBench, dropping from 62% to 59% overall.
📉 General knowledge took a massive hit, falling from 40% on Sonnet 4.5 to 25% on Sonnet 4.6.
💸 The cost nearly doubled on benchmarks, going from $0.43 to $0.80 per full run.
🚀 Sonnet 4.6 hits number 1 on my agent leaderboard with a score of 87.9, beating Opus 4.6.
🛠️ Tested on 5 real-world vibe coding projects including Go, React Native, Nuxt 3, SvelteKit, and Tauri.
💡 Anthropic appears to have deliberately optimized Sonnet 4.6 for agentic workflows over one-shot tasks.
👍 For vibe coding with an AI agent, Sonnet 4.6 is the best Sonnet model I have ever tested.

AICodeKing

Claude Sonnet 4.6 shows a split personality in benchmarks: it regresses on one-shot tasks (dropping from 62% to 59% on KingBench, with general knowledge falling from 40% to 25%), yet it tops the agentic coding leaderboard with a score of 87.9, beating even Opus 4.6. A full benchmark run costs nearly double ($0.80 versus $0.43) despite the same list pricing. Five real-world vibe coding projects (a Go terminal app, a React Native movie tracker, a Nuxt 3 Q&A platform, a SvelteKit Kanban board, and a Tauri desktop app) all completed successfully with zero errors. The theory is that Anthropic deliberately optimized Sonnet 4.6 for agentic workflows at the expense of raw one-shot intelligence, making it ideal for developers using coding agents but worse for casual chat use.

Sonnet 4.6 (Fully Tested): This MODEL is SO INTERESTING...

<p>Maybe it knows that it’s supposed to get tired when it’s late. Try fixing the clock.</p>