Karpathy's autoresearch project lets a coding agent autonomously improve a neural network training script by running experiments in a loop. This post scales that setup by giving Claude Code access to 16 GPUs (H100s and H200s) on a Kubernetes cluster via SkyPilot. Over 8 hours, the agent ran ~910 experiments in parallel waves of 10-13, achieving a 9x throughput increase over single-GPU sequential search. Key findings: parallelism enabled factorial grid search instead of greedy hill-climbing, allowing the agent to discover that scaling model width (aspect ratio 96) outperformed all hyperparameter tuning combined. The agent also autonomously developed a two-tier hardware strategy — screening ideas on cheaper H100s and validating winners on H200s — without being prompted. Total cost was under $300 in GPU compute plus ~$9 in Claude API fees. The full setup is available as an open-source example in the SkyPilot repo.
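The "giving the agent cloud GPUs" pattern the post describes can be sketched with a SkyPilot task definition plus parallel `sky launch` calls. This is a minimal illustration, not the repo's actual config: the filenames (`train_task.yaml`, `train.py`), the `LR` environment variable, and the learning-rate sweep are all hypothetical stand-ins.

```yaml
# train_task.yaml -- one experiment per GPU (file/script names are hypothetical)
resources:
  accelerators: H100:1   # screen ideas on H100s; swap to H200:1 to validate winners
workdir: .
envs:
  LR: 3e-4               # default; overridden per-experiment via --env
run: |
  python train.py --lr $LR
```

A wave of parallel experiments can then be launched by giving each run its own cluster:

```shell
# launch one detached cluster per hyperparameter setting (sweep values are illustrative)
for lr in 1e-3 3e-4 1e-4; do
  sky launch -c "exp-lr-${lr}" --env LR="${lr}" -d train_task.yaml
done
```

In the post's setup the agent itself issues launches like these in waves of 10-13, rather than a human running the loop.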

12 min read · From blog.skypilot.co
Table of contents

- How autoresearch works
- The bottleneck: one GPU, one experiment
- Giving the agent cloud GPUs
- Results: ~910 experiments, ~8 hours, 16 GPUs
- How parallelism changed the agent's research strategy
- Emergent research strategies: exploiting heterogeneous hardware
- Cost
- Scale Autoresearch on your own GPU cluster