Sonnet 4.6 (Fully Tested): This MODEL is SO INTERESTING...
Claude Sonnet 4.6 shows a split personality in benchmarks: it regresses on one-shot tasks (dropping from 62% to 59% on KingBench, with general knowledge falling from 40% to 25%), yet it leads the agentic coding leaderboard with an average score of 87.9, beating even Opus 4.6. The model costs nearly double to run on benchmarks
•15m watch time