I'm Building Agents That Run While I Sleep

When AI agents write code autonomously overnight, verifying correctness becomes the core challenge. The author argues that having the same AI write both the code and the tests creates a self-congratulation loop: if the agent misunderstood the task, its tests encode the same misunderstanding and still pass. The solution draws from TDD: write acceptance criteria before prompting the agent, then run automated verification against those criteria. The article describes a practical implementation using Claude Code's headless mode and Playwright MCP, with four stages: pre-flight checks, an Opus planner that reads the specs and changed files, parallel Sonnet browser agents (one per acceptance criterion), and a final Opus judge that returns pass/fail verdicts. The workflow shifts review effort from reading diffs to investigating failures, and the author open-sourced the tool as a Claude Code plugin.
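The four stages can be pictured with a minimal Python sketch. This is not the author's plugin: `Agent`, `Verdict`, `preflight`, and `verify` are hypothetical names, the prompt wording is invented for illustration, and the model call is an injected function so the sketch makes no claims about Claude Code's headless flags or the Playwright MCP interface. It also assumes one planner call per criterion, which the summary does not specify.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical interface: (model_label, prompt) -> text reply.
# In a real setup this would wrap a headless agent invocation.
Agent = Callable[[str, str], str]


@dataclass
class Verdict:
    criterion: str
    passed: bool
    evidence: str


def preflight(criteria: List[str], changed_files: List[str]) -> None:
    """Stage 1: fail fast before spending any model calls."""
    if not criteria:
        raise ValueError("no acceptance criteria; write them before prompting")
    if not changed_files:
        raise ValueError("no changed files to verify")


def verify(agent: Agent, spec: str, changed_files: List[str],
           criteria: List[str]) -> List[Verdict]:
    preflight(criteria, changed_files)
    files = "\n".join(changed_files)

    # Stage 2: the planner turns each criterion into browser steps.
    plans = [
        agent("opus", f"Spec:\n{spec}\nChanged files:\n{files}\n"
                      f"Write browser steps to check: {c}")
        for c in criteria
    ]

    # Stage 3: one browser agent per criterion, run in parallel.
    with ThreadPoolExecutor(max_workers=len(plans)) as pool:
        evidence = list(pool.map(lambda p: agent("sonnet", p), plans))

    # Stage 4: the judge sees only the criterion and the evidence.
    verdicts = []
    for c, e in zip(criteria, evidence):
        reply = agent("opus", f"Criterion: {c}\nEvidence:\n{e}\n"
                              "Answer PASS or FAIL, then one line of reasoning.")
        verdicts.append(Verdict(c, reply.strip().upper().startswith("PASS"), e))
    return verdicts
```

Because the criteria are written before the coding agent runs, the judge compares evidence against the human's intent rather than against tests the same agent produced. Review then starts from the entries in the returned list where `passed` is false.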

#ai-agents

#claude-code

#tdd

Mar 10 • 6m read time • From claudecodecamp.com

Table of contents

The obvious answers don't work
The thing TDD got right
What this looks like in practice
How to build it
