Workato's AI Research Lab partnered with DigitalOcean to optimize LLM inference for agentic workloads at scale. By deploying NVIDIA Dynamo with vLLM on DigitalOcean Kubernetes Service (DOKS) using NVIDIA H200 GPUs, the team achieved 67% higher throughput per GPU, 79% lower end-to-end latency, and 77% lower time-to-first-token.
Table of contents
- How LLMs Process Requests and Why It Gets Expensive at Scale
- How KV-Aware Routing Addresses the Problem
- NVIDIA Dynamo with DOKS: The Orchestration Brain for KV-Aware Routing
- Inference Stack Architecture
- The Two Configurations Tested
- Tuning
- Conclusion