NVIDIA Dynamo 1.0, announced at GTC, is now available to DigitalOcean customers. It delivers up to a 7x inference performance improvement on NVIDIA GB200 NVL systems through three key features: KV-aware routing, disaggregated prefill/decode serving, and memory offloading via a KV Block Manager. Pairing Dynamo with DigitalOcean's Agentic Inference Cloud and Managed Kubernetes, Workato achieved 67% higher GPU throughput, 79% lower latency, and 67% lower model cost while using half the GPUs. Customers can deploy Dynamo 1.0 as a container on Droplets or via DigitalOcean Kubernetes, paired with inference runtimes such as vLLM, SGLang, or TensorRT-LLM.
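As a minimal sketch of the Droplet deployment path mentioned above: the container image tag, port, and model flag below are illustrative assumptions, not confirmed NVIDIA or DigitalOcean values; consult the official Dynamo release notes for the actual artifact names and launch commands.

```shell
# Hypothetical example: run a Dynamo 1.0 container on a GPU Droplet.
# Image name and flags are placeholders for illustration only.
docker run --gpus all -p 8000:8000 \
  nvcr.io/nvidia/dynamo:1.0 \
  --model meta-llama/Llama-3-8B-Instruct
```

On DigitalOcean Kubernetes, the same container would instead be declared in a Deployment manifest with a GPU resource request, letting the cluster schedule it onto GB200-class nodes.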
Table of contents
- What is NVIDIA Dynamo 1.0?
- How DigitalOcean optimizes inference workloads with Dynamo to improve throughput and latency
- The future of inference optimization with NVIDIA and DigitalOcean