Andrej Karpathy released AutoResearch, a 630-line Python script that autonomously ran 50 ML experiments overnight on a single GPU without human input. The core design rests on three primitives: a single editable asset (the training script), a scalar metric (validation bits per byte), and a time-boxed evaluation cycle. A key insight is that a Markdown file called program.md serves as the human-agent interface, encoding search strategy, constraints, and stopping criteria in structured prose. This pattern generalizes beyond ML training to database query optimization, support ticket routing, and RAG pipeline tuning. The human role shifts from running experiments to writing experimental protocols, with the quality of the program.md document becoming the binding constraint on autonomous loop quality. Harrison Chase of LangChain has already adapted the pattern for agent optimization.

10m read timeFrom thenewstack.io
Post cover image
Table of contents
What is the Karpathy Loop?Markdown is the human-agent interfaceThe Pattern That Goes Beyond ML ExperimentsThe human role shifts to experimental designWhat’s next

Sort: