Anthropic introduces a three-agent harness separating planning, generation, and evaluation to improve long-running autonomous AI workflows for frontend and full-stack development.

InfoQ

Anthropic has published a multi-agent harness design for long-running autonomous software development, splitting work across three specialized agents: planner, generator, and evaluator. The architecture addresses common failure modes in extended AI coding sessions, such as context loss and agent self-overrating. A separate evaluator agent uses few-shot calibration and tools like Playwright MCP to critique outputs across criteria including design quality, originality, craft, and functionality. Iterative cycles of 5–15 runs can span up to four hours, progressively refining results. Human oversight remains important for initial calibration, while the framework supports both parallel and sequential agent execution depending on task dependencies.
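
The planner, generator, and evaluator loop described above can be sketched in a few lines. This is only an illustrative outline, not Anthropic's implementation: `run_harness`, `Critique`, the stub agents, the 0 to 10 scoring scale, and the pass threshold of 7 are all hypothetical names and values chosen for the example. In a real harness each agent would be a model call, and the evaluator would inspect the running application (for instance through a browser automation tool) rather than count words.

```python
from dataclasses import dataclass
from typing import Callable

# Criteria named in the summary; the 0-10 scale is an assumption.
CRITERIA = ("design quality", "originality", "craft", "functionality")


@dataclass
class Critique:
    scores: dict[str, float]  # one score per criterion
    feedback: str             # notes handed back to the generator

    def passed(self, threshold: float) -> bool:
        # Every criterion must clear the bar, not just the average.
        return all(self.scores[c] >= threshold for c in CRITERIA)


def run_harness(
    brief: str,
    planner: Callable[[str], str],
    generator: Callable[[str, str], str],
    evaluator: Callable[[str, str], Critique],
    max_rounds: int = 15,      # upper end of the 5-15 range in the summary
    threshold: float = 7.0,    # hypothetical pass mark
) -> tuple[str, int]:
    """Plan once, then alternate generation and evaluation until the
    evaluator is satisfied or the round budget is exhausted."""
    plan = planner(brief)
    feedback = ""
    artifact = ""
    for round_no in range(1, max_rounds + 1):
        artifact = generator(plan, feedback)
        # The evaluator is a separate agent, so the generator never
        # grades its own work.
        critique = evaluator(plan, artifact)
        if critique.passed(threshold):
            return artifact, round_no
        feedback = critique.feedback
    return artifact, max_rounds


# Stub agents so the sketch runs without any model or network access.
def stub_planner(brief: str) -> str:
    return f"spec for: {brief}"


def stub_generator(plan: str, feedback: str) -> str:
    return f"{plan} :: {feedback}".strip()


def stub_evaluator(plan: str, artifact: str) -> Critique:
    # Pretend each applied "fix" raises every score by two points.
    fixes = artifact.count("fix")
    score = 4.0 + 2.0 * fixes
    return Critique({c: score for c in CRITERIA}, "fix " * (fixes + 1))


if __name__ == "__main__":
    result, rounds = run_harness(
        "todo app", stub_planner, stub_generator, stub_evaluator
    )
    print(rounds, result)
```

Keeping the evaluator behind its own callable mirrors the separation the summary emphasizes: the generator only ever sees the critique text, so it cannot inflate its own score. The loop here is strictly sequential; running independent generators in parallel would need a different driver.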

Anthropic Designs Three-Agent Harness to Support Long-Running Full-Stack AI Development