NVIDIA Dynamo 1.0, announced at GTC, is now available to DigitalOcean customers. It delivers up to a 7x improvement in inference performance on NVIDIA GB200 NVL systems through three key features: KV-aware routing, disaggregated prefill/decode serving, and memory offloading via a KV Block Manager. Running Dynamo on DigitalOcean's Agentic Inference Cloud and Managed Kubernetes, Workato achieved 67% higher GPU throughput, 79% lower latency, and 67% lower model cost while using half the GPUs. Customers can deploy Dynamo 1.0 as a container on Droplets or through DigitalOcean Kubernetes, with inference runtimes such as vLLM, SGLang, or TensorRT-LLM.

From digitalocean.com (4 min read)
Table of contents
What is NVIDIA Dynamo 1.0?
How DigitalOcean optimizes inference workloads with Dynamo to improve throughput and latency
The future of inference optimization with NVIDIA and DigitalOcean
