DigitalOcean's engineering team shares how they built and shipped a production AI documentation assistant on their Gradient AI Platform. The post covers the full journey: architecture (an embedded JS snippet plus an internal proxy service), evaluation methodology using golden datasets with LLM-as-a-judge metrics (correctness and ground truth adherence), and a CI/CD pipeline that gates deployments on metric thresholds. Key lessons include using Terraform for agent provisioning, building golden datasets before iterating on prompts, tuning retrieval parameters (reducing k from 10 to 5), adding keyword-to-product mappings to reduce ambiguity, and running automated evaluations on every PR. The team set release bars of 80% ground truth adherence and 95% correctness, and reached those numbers through prompt changes, switching the retrieval method from query rewriting to sub-queries, and dataset cleanup.
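The post does not include the team's evaluation code, but the CI gate it describes boils down to scoring every golden-dataset example and blocking the deployment if either release bar is missed. The sketch below assumes golden records shaped as question/reference-answer pairs and hypothetical `ask_agent` and `judge` callables standing in for the deployed assistant and the LLM judge; only the thresholds (95% correctness, 80% ground truth adherence) come from the post.

```python
import sys
from typing import Callable, Dict, Iterable


def evaluate_golden_set(
    records: Iterable[Dict[str, str]],
    ask_agent: Callable[[str], str],
    judge: Callable[[str, str, str], Dict[str, bool]],
    min_correctness: float = 0.95,   # release bar from the post
    min_adherence: float = 0.80,     # release bar from the post
) -> bool:
    """Score every golden example and return True only if both release bars pass."""
    correct = adheres = total = 0
    for rec in records:
        answer = ask_agent(rec["question"])
        verdict = judge(rec["question"], answer, rec["reference_answer"])
        correct += int(verdict["correct"])
        adheres += int(verdict["adheres"])
        total += 1
    correctness, adherence = correct / total, adheres / total
    print(f"correctness={correctness:.2%} adherence={adherence:.2%} (n={total})")
    return correctness >= min_correctness and adherence >= min_adherence


if __name__ == "__main__":
    # Stubs stand in for the deployed agent and the LLM judge; in CI these would
    # be real calls, and a non-zero exit code blocks the deployment.
    golden = [{"question": "How do I resize a Droplet?",
               "reference_answer": "Use the Resize option in the control panel or API."}]
    ok = evaluate_golden_set(
        golden,
        ask_agent=lambda q: "stub answer",
        judge=lambda q, a, ref: {"correct": True, "adheres": True},
    )
    sys.exit(0 if ok else 1)
```

Run on every PR, a gate like this turns the prompt and retrieval tweaks described in the post into measurable, pass/fail changes rather than ad hoc judgment calls.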

17 min read · From digitalocean.com
Table of contents
Architecture
Inference Infrastructure
Data Driven Approach to Validation
Agent Configuration Decisions
Top 3 Must-Dos When Creating an AI Agent for Production
Build and scale AI applications on DigitalOcean
