The Datadog Database Monitoring (DBM) team used Karpathy's autoresearch tool to run 23 experiments autonomously overnight, improving the precision of a SQL query optimization AI agent from 0.54 to 0.86 (a 59% relative gain). The process ran in three phases: optimizing prompts and tool chains on a large model (Claude Sonnet), compressing to a smaller model (Haiku) via knowledge distillation, and finally implementing a two-pass detector-verifier architecture to break through a performance ceiling. Datadog's LLM Observability Experiments platform served as the backbone for tracking hypotheses, per-case traces, and evaluation metrics across all runs, which kept the rapid iteration sustainable and reproducible.
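To make the two-pass idea and the reported gain concrete, here is a minimal sketch in Python. It is not the team's implementation: `detect` and `verify` are hypothetical rule-based stand-ins for the detector and verifier model calls, and the queries, issue label, and ground truth are invented toy data. Only the 0.54 and 0.86 precision figures come from the summary above.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Finding:
    query_id: str
    sql: str
    issue: str


def detect(query_id: str, sql: str) -> list[Finding]:
    """Pass 1 (hypothetical stand-in for the detector model): flag broadly."""
    if "SELECT *" in sql.upper():
        return [Finding(query_id, sql, "select_star")]
    return []


def verify(finding: Finding) -> bool:
    """Pass 2 (hypothetical stand-in for the verifier model): reject weak findings."""
    return "LIMIT" not in finding.sql.upper()


def two_pass(queries: dict[str, str]) -> list[Finding]:
    """Run the detector on every query, then keep only verified findings."""
    candidates = [f for qid, sql in queries.items() for f in detect(qid, sql)]
    return [f for f in candidates if verify(f)]


def precision(predicted: Iterable[str], relevant: set[str]) -> float:
    """Fraction of predicted items that are truly relevant (TP / (TP + FP))."""
    predicted_set = set(predicted)
    if not predicted_set:
        return 0.0
    return len(predicted_set & relevant) / len(predicted_set)


def relative_gain(before: float, after: float) -> float:
    """Relative improvement of `after` over `before`."""
    return (after - before) / before


# Toy data (assumed for illustration only).
QUERIES = {
    "q1": "SELECT * FROM orders",
    "q2": "SELECT id FROM users WHERE id = 1",
    "q3": "SELECT * FROM logs LIMIT 10",
}
RELEVANT = {"q1"}  # queries that truly need a recommendation

detector_only = [f.query_id for qid, sql in QUERIES.items() for f in detect(qid, sql)]
verified = [f.query_id for f in two_pass(QUERIES)]

p_detector = precision(detector_only, RELEVANT)
p_two_pass = precision(verified, RELEVANT)
reported_gain_pct = round(relative_gain(0.54, 0.86) * 100)
```

In this toy setup the verifier trades nothing in recall for higher precision, which will not generally hold. The point is only the shape of the pipeline: a permissive first pass followed by a stricter second pass that filters false positives. The last line reproduces the arithmetic behind the 59% figure, (0.86 - 0.54) / 0.54.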
Table of contents
Augmenting DBM’s query optimization recommender with agentic AI
Building the experiment
Running the experiment
Try it yourself