Anthropic Just Killed Your AI Agent Harnesses
This title could be clearer and more informative.Try out Clickbait Shieldfor free (5 uses left this month).
Anthropic ran experiments on their own agent harness, removing components one by one to measure what actually matters with Claude Opus 4.5 and 4.6. Their conclusion: most framework components like detailed task sharding, context resets, and sprint contracts are now dead weight given how capable newer models are. The only essential components are planning, generation, and evaluation agents. Planning should now be high-level (product-level, not implementation-level), since micro-detailed plans cause cascading errors. The generator and evaluator must remain separate agents to avoid self-evaluation bias, with graded scoring criteria for subjective outputs like UI quality. For those wanting a ready-made framework, GSD is recommended as a base, enhanced with Anthropic's scored evaluation mechanism. Frameworks like BMAD and SpecKit are now largely obsolete except for their PRD generation phase.
Sort: