How DoorDash built an AI code reviewer engineers actually listen to

DoorDash built a custom AI code review agent that achieves a 60.2% acceptance rate on high and critical findings across 10,000+ weekly PRs. The system evolved through three versions, with the key innovation being a 'lead scout' that identifies suspicious areas before two deep reviewers investigate them — separating noticing from verifying. The architecture uses per-domain review profiles mined from historical PRs, Slack decisions, and incident history rather than generic AGENTS.md files. A precision-over-recall philosophy means the agent posts fewer but higher-quality comments, each anchored to specific lines with evidence. The system also includes a fixer agent that can apply suggested changes directly to PRs via remote VMs. Key engineering lessons include using per-agent soft/hard timeouts to handle stuck agents, measuring cost per successful review rather than token price, and building evals from real past incidents rather than synthetic puzzles.
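One of the lessons above, per-agent soft and hard timeouts, can be sketched in a few lines. This is an illustrative sketch only, not DoorDash's implementation: the summary does not say what each timeout does, so the code assumes the soft timeout merely fires a callback (for example, to log the delay or nudge the agent to wrap up) while the hard timeout cancels the agent and discards its result. The names `run_with_timeouts` and `on_soft` are hypothetical.

```python
import asyncio
from typing import Awaitable, Callable, Optional, TypeVar

T = TypeVar("T")


async def run_with_timeouts(
    agent: Awaitable[T],
    soft_s: float,
    hard_s: float,
    on_soft: Callable[[], None],
) -> Optional[T]:
    """Run one agent with a soft and a hard deadline (hypothetical helper).

    Assumed semantics: after soft_s seconds, call on_soft() and keep waiting;
    after hard_s seconds in total, cancel the agent and return None.
    """
    task = asyncio.ensure_future(agent)

    # Phase 1: wait up to the soft deadline.
    done, _ = await asyncio.wait({task}, timeout=soft_s)
    if done:
        return task.result()

    # Soft timeout reached: signal it, but let the agent continue.
    on_soft()

    # Phase 2: wait for the remaining time until the hard deadline.
    done, _ = await asyncio.wait({task}, timeout=max(hard_s - soft_s, 0.0))
    if done:
        return task.result()

    # Hard timeout reached: cancel the stuck agent and drop its output.
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass
    return None
```

Returning `None` on a hard timeout lets the caller post whatever the other agents found instead of blocking the whole review on one stuck agent; whether the real system behaves this way is an assumption.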

#ai-agents

May 11 • 16m read time • From careersatdoordash.com

Table of contents

The numbers
How we got here
The design principle: precision over recall
Focused context beats more context
Why we built this ourselves
Closing the loop from review to fix
What it's actually good at
Engineering lessons from production
Evals are the development loop
What's next
