Sonnet 4.6 (Fully Tested): This MODEL is SO INTERESTING...

Claude Sonnet 4.6 shows a split personality in benchmarks: it regresses on one-shot tasks (dropping from 62% to 59% on KingBench, with general knowledge falling from 40% to 25%), yet tops the agentic coding leaderboard with an average score of 87.9, beating even Opus 4.6. The model costs nearly double to run on benchmarks

15m watch time