Anthropic released Claude Opus 4.6, introducing two key architectural features for long-running agentic workflows: Adaptive Thinking (four granular reasoning effort levels: low, medium, high, max) and Context Compaction (automatic summarization of earlier context to prevent 'context rot'). The model supports a 1M token context window in beta with 76% multi-needle retrieval accuracy at that scale, and doubles max output to 128K tokens. Thinking tokens are billed at $25/million output tokens, with a long-context premium kicking in above 200K tokens. The release also includes Agent Teams in Claude Code (research preview) for parallel multi-agent coordination, and Claude integration into PowerPoint. Anthropic claims top benchmark scores on Terminal-Bench 2.0, Humanity's Last Exam, GDPval-AA, and BrowseComp. Independent testing by Quesma found only 49% binary backdoor detection accuracy, and Hacker News users reported regressions from Opus 4.5. The model is available on Azure Foundry, AWS Bedrock, Google Vertex AI, and GitHub Copilot.

4m read timeFrom infoq.com
Post cover image

Sort: