Docker's Coding Agent Sandboxes team built a "Fleet" of seven AI agent roles that run autonomously in CI to test, triage, and fix code. Built on Claude Code skills (markdown role-description files), the Fleet includes a CLI tester with 52+ scenarios across 14 tiers, a project manager for deduplication and issue tracking, a product owner for daily release notes, performance and upgrade testers, and a software engineer that auto-fixes labeled issues and reduces tech debt weekly. The system uses a "Ralph-loop" pattern — a worker/reviewer iteration cycle — to generate and evaluate code changes, producing pull requests for human review. Key design principles: build skills as roles not scripts, develop locally before CI, compose skills like team members, and always separate generation from evaluation. The Fleet creates PRs but never merges them — merge authority stays with humans.
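The worker/reviewer cycle described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the Fleet's actual implementation: the names `ralph_loop`, `Review`, `toy_worker`, and `toy_reviewer`, the `max_rounds` cap, and the feedback-passing scheme are all hypothetical. The only ideas taken from the summary are that generation and evaluation are separate steps and that the loop's output is a proposed change for a human to review.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Review:
    approved: bool
    feedback: str


def ralph_loop(
    task: str,
    worker: Callable[[str, str], str],
    reviewer: Callable[[str, str], Review],
    max_rounds: int = 3,  # hypothetical cap; the source does not state one
) -> Optional[str]:
    """Return an approved change for `task`, or None if no round passes review."""
    feedback = ""
    for _ in range(max_rounds):
        change = worker(task, feedback)   # generation step
        review = reviewer(task, change)   # separate evaluation step
        if review.approved:
            return change                 # caller would open a PR; a human merges
        feedback = review.feedback        # carry the critique into the next round
    return None


# Toy stand-ins for the two agents (hypothetical, for illustration only).
def toy_worker(task: str, feedback: str) -> str:
    return f"patch for {task}" + (" with tests" if "tests" in feedback else "")


def toy_reviewer(task: str, change: str) -> Review:
    if "with tests" in change:
        return Review(True, "")
    return Review(False, "add tests")


result = ralph_loop("fix flaky login", toy_worker, toy_reviewer)
print(result)  # patch for fix flaky login with tests
```

In this sketch the worker never judges its own output, and the loop never merges anything: it only returns a candidate change, which mirrors the stated rule that merge authority stays with humans.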
Table of contents
Local First, CI Second
The Roster
Skills That Compose
The Ralph-Loop Is the Engine
What the Fleet Ships
What We Don’t Automate
What We Learnt Building the Fleet