An AI agent autonomously ran 23 experiments in Datadog LLM Observability, raising the precision of a query optimization agent by 59 percent.

Datadog

The Datadog Database Monitoring team used Karpathy's autoresearch tool to run 23 experiments autonomously overnight, raising the precision of a SQL query optimization AI agent from 0.54 to 0.86 (a 59% relative gain). The process ran in three phases: optimizing prompts and tool chains on a larger model (Claude Sonnet), compressing the result into a smaller model (Haiku) via knowledge distillation, and finally adding a two-pass detector-verifier architecture to break through a performance ceiling. Datadog's LLM Observability Experiments platform tracked the hypotheses, per-case traces, and evaluation metrics for every run, which kept the rapid iteration sustainable and reproducible.
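
To make the two-pass detector-verifier idea and the 59% figure concrete, here is a minimal Python sketch. It is not Datadog's implementation: `toy_detect` and `toy_verify` are hypothetical rule-based stand-ins for the two model calls, and the queries, issue labels, and expected findings are invented for illustration. Only the 0.54 and 0.86 precision values come from the summary above.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Set, Tuple


@dataclass(frozen=True)
class Finding:
    """One flagged issue on one query (hypothetical structure)."""
    query_id: str
    issue: str


def two_pass(
    queries: Iterable[Tuple[str, str]],
    detect: Callable[[str], List[str]],
    verify: Callable[[str, str], bool],
) -> List[Finding]:
    """Pass 1 proposes candidate issues; pass 2 keeps only confirmed ones."""
    confirmed = []
    for query_id, sql in queries:
        for issue in detect(sql):
            if verify(sql, issue):
                confirmed.append(Finding(query_id, issue))
    return confirmed


def precision(predicted: Iterable[Finding], expected: Set[Finding]) -> float:
    """Share of predicted findings that are correct: TP / (TP + FP)."""
    predicted = set(predicted)
    if not predicted:
        return 0.0
    return len(predicted & expected) / len(predicted)


def relative_gain(before: float, after: float) -> float:
    """Relative improvement of `after` over `before`."""
    return (after - before) / before


# Toy stand-ins (assumptions): an over-eager detector and a stricter verifier.
def toy_detect(sql: str) -> List[str]:
    issues = []
    if "SELECT *" in sql:
        issues.append("select_star")
    if "WHERE" not in sql:
        issues.append("full_scan")
    return issues


def toy_verify(sql: str, issue: str) -> bool:
    # In this toy, a missing WHERE clause alone is not accepted as evidence.
    return issue == "select_star" and "SELECT *" in sql


queries = [
    ("q1", "SELECT * FROM orders"),
    ("q2", "SELECT id FROM users WHERE email = 'a@b.c'"),
    ("q3", "SELECT id FROM users"),
]
expected = {Finding("q1", "select_star")}

# Detector alone: accept every candidate. Two-pass: filter through the verifier.
detector_only = two_pass(queries, toy_detect, lambda sql, issue: True)
verified = two_pass(queries, toy_detect, toy_verify)

print(precision(detector_only, expected), precision(verified, expected))
print(round(relative_gain(0.54, 0.86) * 100))  # precision gain quoted above
```

In this toy data the verifier removes the detector's false positives, which is the mechanism by which a second pass can raise precision. The last line reproduces the headline arithmetic: (0.86 - 0.54) / 0.54 is roughly 0.59, a relative gain rather than a 59-point absolute one.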

How we made a SQL query optimization agent 59% more accurate using autoresearch and LLM Observability

Augmenting DBM’s query optimization recommender with agentic AI