Anthropic has published a multi-agent harness design for long-running autonomous software development that splits work across three specialized agents: a planner, a generator, and an evaluator. The architecture addresses common failure modes in extended AI coding sessions, such as context loss and agents overrating their own output. The evaluator, kept separate from the generator, uses few-shot calibration and tools like Playwright MCP to critique outputs against criteria including design quality, originality, craft, and functionality. Between 5 and 15 iterations, spanning up to four hours, progressively refine the results. Human oversight remains important for initial calibration, and the framework supports both parallel and sequential agent execution depending on task dependencies.
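
The planner, generator, and evaluator loop described above can be pictured with a minimal Python sketch. This is not Anthropic's implementation: every name here (`run_harness`, `Critique`, the `toy_*` functions), the 0 to 10 score scale, and the threshold-based stopping rule are assumptions made for illustration, and the toy agents are plain functions standing in for model calls. What the sketch does show is the separation of roles: the generator never scores its own work, and only the evaluator's critique decides whether another cycle runs.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# The four criteria named in the summary; the 0-10 integer scale is an assumption.
CRITERIA = ("design_quality", "originality", "craft", "functionality")


@dataclass
class Critique:
    scores: Dict[str, int]  # one score per criterion
    notes: str              # free-text feedback handed back to the generator


def run_harness(
    task: str,
    planner: Callable[[str], List[str]],
    generator: Callable[[List[str], str, str], str],
    evaluator: Callable[[str], Critique],
    max_cycles: int = 15,   # upper end of the 5-15 range in the summary
    threshold: int = 8,     # hypothetical pass mark per criterion
) -> Tuple[str, List[Critique]]:
    """Plan once, then alternate generate/evaluate until every criterion passes."""
    plan = planner(task)
    artifact, feedback = "", ""
    history: List[Critique] = []
    for _ in range(max_cycles):
        artifact = generator(plan, artifact, feedback)
        critique = evaluator(artifact)  # separate agent judges the output
        history.append(critique)
        if all(critique.scores[c] >= threshold for c in CRITERIA):
            break
        feedback = critique.notes
    return artifact, history


# Toy stand-ins for the three agents (hypothetical, no model calls).
def toy_planner(task: str) -> List[str]:
    return [f"build: {task}"]


def toy_generator(plan: List[str], previous: str, feedback: str) -> str:
    return previous + "x"  # each cycle extends the previous artifact


def toy_evaluator(artifact: str) -> Critique:
    score = min(10, 2 * len(artifact))
    return Critique({c: score for c in CRITERIA}, notes="needs more work")


if __name__ == "__main__":
    result, critiques = run_harness("demo app", toy_planner, toy_generator, toy_evaluator)
    print(result, len(critiques))
```

In a real harness the three callables would wrap separate model sessions, and the evaluator would presumably inspect the running application (for example through a browser automation tool) rather than the length of a string.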

3m read time · From infoq.com