Learn how Datadog uses LLM Observability internally to build and test a natural language querying agent for Cloud Cost Management. The agent lets teams ask a question in plain English and get back a Datadog query that connects metrics to costs.


Datadog

Datadog's Cloud Cost Management team built a natural language query agent that converts plain-English questions into metrics queries. They reduced debugging time from hours to minutes by creating a ground truth dataset from real user testing, implementing component-level evaluators (parsing, metric selection, roll-up, group-bys, filters) instead of binary pass/fail tests, and using LLM Observability with distributed tracing to inspect tool calls and intermediate outputs. Automated experiments run against the same dataset with every code change, surfacing regressions immediately and enabling objective model comparisons across accuracy, latency, and cost.
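To make the idea of component-level evaluators concrete, here is a minimal sketch in Python. It is not Datadog's implementation and does not use the LLM Observability SDK. The `ParsedQuery` structure, the `evaluate` and `pass_rates` helpers, and the metric name `cloud.cost.total` are all hypothetical, chosen only to show how one generated query can be scored per component (parsing, metric selection, roll-up, group-bys, filters) against a ground truth entry instead of receiving a single pass/fail verdict.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Optional

COMPONENTS = ("parsing", "metric", "rollup", "group_bys", "filters")


@dataclass(frozen=True)
class ParsedQuery:
    """Hypothetical structured form of a metrics query (an assumption)."""
    metric: str
    rollup: str
    group_bys: FrozenSet[str]
    filters: FrozenSet[str]


def evaluate(generated: Optional[ParsedQuery], expected: ParsedQuery) -> Dict[str, bool]:
    """Score one generated query against its ground truth, component by component.

    `generated` is None when the agent's output could not be parsed at all;
    in that case every component is marked as failed.
    """
    if generated is None:
        return {name: False for name in COMPONENTS}
    return {
        "parsing": True,
        "metric": generated.metric == expected.metric,
        "rollup": generated.rollup == expected.rollup,
        # Sets make the comparison independent of ordering.
        "group_bys": generated.group_bys == expected.group_bys,
        "filters": generated.filters == expected.filters,
    }


def pass_rates(results: List[Dict[str, bool]]) -> Dict[str, float]:
    """Aggregate per-component pass rates across a whole dataset run."""
    total = len(results)
    return {
        name: sum(r[name] for r in results) / total if total else 0.0
        for name in COMPONENTS
    }


# Hypothetical ground truth entry and agent output: only the roll-up differs.
expected = ParsedQuery(
    metric="cloud.cost.total",
    rollup="sum",
    group_bys=frozenset({"service"}),
    filters=frozenset({"env:prod"}),
)
generated = ParsedQuery(
    metric="cloud.cost.total",
    rollup="avg",
    group_bys=frozenset({"service"}),
    filters=frozenset({"env:prod"}),
)

scores = evaluate(generated, expected)
rates = pass_rates([scores, evaluate(None, expected)])
print(scores)
print(rates)
```

With scores broken out this way, a failing example points directly at the component that went wrong (here, the roll-up), and per-component pass rates computed over the same dataset on every code change make a regression in one component visible even when the others are stable.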

How we cut our NLQ agent debugging time from hours to minutes with LLM Observability

Building the ground truth dataset from user testing

Evaluating AI that doesn’t fail cleanly

Deconstructing correctness with evaluators

Automating scaled experimentation with every build

Debugging and iterating faster with evaluator-driven tracing

Build evaluation and tracing into your agent loop