Andrej Karpathy’s 630-line Python script ran 50 experiments overnight without any human input

Andrej Karpathy released AutoResearch, a 630-line Python script that autonomously ran 50 ML experiments overnight on a single GPU without human input. The core design rests on three primitives: a single editable asset (the training script), a scalar metric (validation bits per byte), and a time-boxed evaluation cycle. A key insight is that a Markdown file called program.md serves as the human-agent interface, encoding search strategy, constraints, and stopping criteria in structured prose. This pattern generalizes beyond ML training to database query optimization, support ticket routing, and RAG pipeline tuning. The human role shifts from running experiments to writing experimental protocols, with the quality of the program.md document becoming the binding constraint on autonomous loop quality. Harrison Chase of LangChain has already adapted the pattern for agent optimization.

#ai-agents

#llm

#machine-learning

#python

Mar 14•10m read time•From thenewstack.io

Table of contents

What is the Karpathy Loop?Markdown is the human-agent interface The Pattern That Goes Beyond ML Experiments The human role shifts to experimental design What’s next

Comment

Bookmark

Copy

Sort: