I'm Building Agents That Run While I Sleep
When AI agents write code autonomously overnight, verifying correctness becomes the core challenge. The author argues that having the same AI write both the code and the tests creates a self-congratulation loop: if the agent misunderstood the original request, its tests encode the same misunderstanding and still pass. The solution draws from TDD: write acceptance criteria before prompting the agent, then run automated verification against those criteria. A practical implementation using Claude Code's headless mode and Playwright MCP is described, with four stages: pre-flight checks, an Opus planner that reads specs and changed files, parallel Sonnet browser agents (one per acceptance criterion), and a final Opus judge that returns pass/fail verdicts. The workflow shifts review effort from reading diffs to investigating failures, and the author open-sourced the tool as a Claude Code plugin.
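The four stages can be pictured as a small orchestration loop. The sketch below is only an illustration of that shape, not the author's plugin: the names `verify`, `Verdict`, `preflight`, `plan`, `run_browser_check`, and `judge` are all hypothetical, and the model calls are replaced by plain Python callables so the control flow can be read and run without any agent tooling.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Verdict:
    """Outcome for a single acceptance criterion (hypothetical structure)."""
    criterion: str
    passed: bool
    evidence: str


def verify(
    criteria: List[str],
    preflight: Callable[[], bool],
    plan: Callable[[str], str],
    run_browser_check: Callable[[str], str],
    judge: Callable[[str, str], bool],
) -> List[Verdict]:
    """Run the four stages; every callable is a stand-in for an agent call."""
    # Stage 1: pre-flight checks. Stop early if the environment is not ready.
    if not preflight():
        raise RuntimeError("pre-flight checks failed")

    # Stage 2: the planner turns each acceptance criterion into a check plan.
    plans = [plan(criterion) for criterion in criteria]

    # Stage 3: one browser check per criterion, run in parallel.
    # pool.map preserves input order, so evidence lines up with criteria.
    with ThreadPoolExecutor(max_workers=max(1, len(plans))) as pool:
        evidence = list(pool.map(run_browser_check, plans))

    # Stage 4: the judge compares the evidence with the original criterion.
    return [
        Verdict(criterion, judge(criterion, observed), observed)
        for criterion, observed in zip(criteria, evidence)
    ]


def demo() -> List[Verdict]:
    """Exercise the pipeline with trivial stubs instead of real agents."""
    criteria = [
        "login form rejects empty password",
        "dashboard shows user name",
    ]
    return verify(
        criteria,
        preflight=lambda: True,
        plan=lambda c: f"steps for: {c}",
        # Stub: pretend only the login check produced a clean observation.
        run_browser_check=lambda p: "observed: ok" if "login" in p else "observed: error",
        judge=lambda c, e: e.endswith("ok"),
    )


if __name__ == "__main__":
    for verdict in demo():
        print("PASS" if verdict.passed else "FAIL", "-", verdict.criterion)
```

The point of the design is that the criteria are written before any code exists and are passed to the judge unchanged, so the verdict is measured against what was asked for rather than against what the coding agent believed it was asked for.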
Table of contents
The obvious answers don't work
The thing TDD got right
What this looks like in practice
How to build it