How We Built DigitalOcean Inference Router

DigitalOcean's Inference Router automatically routes LLM requests to the best-fit model based on task type, cost, and latency, eliminating the need for hardcoded routing logic in application code. Built on Plano, an open-source AI-native proxy, it uses purpose-built small language models (Arch-Router 1.5B and Plano-Orchestrator, up to 30B MoE) to classify intent from conversation context in ~200ms. The ranking engine uses live cost and latency data from DigitalOcean's pricing API and Prometheus to order candidate models dynamically. The architecture layers Envoy for connection handling, a Rust-based WASM filter for provider format translation, and a native Rust binary (Brightstaff) for routing logic. Key lessons: purpose-built routing models outperform frontier models on narrow tasks, task description quality is critical for routing accuracy, and provider latency varies 2-3x throughout the day, so routing decisions need live metrics. The router is available as a managed service on DigitalOcean or self-hosted via the open-source Plano project.
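To make the ranking idea concrete, here is a minimal sketch in Rust of ordering candidate models by a weighted blend of cost and latency. It is not the router's actual implementation: the `Candidate` and `Weights` types, the `rank` and `candidate` functions, the field names, the units, and the scoring formula are all assumptions made for illustration. In the real system the inputs would come from the pricing API and Prometheus; here they are plain values passed in by the caller.

```rust
/// Hypothetical snapshot of one candidate model's live metrics.
/// Field names and units are illustrative, not taken from Plano.
#[derive(Debug, Clone, PartialEq)]
struct Candidate {
    name: String,
    cost_per_mtok_usd: f64, // assumed unit: USD per million tokens
    p50_latency_ms: f64,    // assumed unit: median latency in milliseconds
}

/// Illustrative weights controlling how much cost and latency matter.
struct Weights {
    cost: f64,
    latency: f64,
}

/// Convenience constructor for the sketch.
fn candidate(name: &str, cost_per_mtok_usd: f64, p50_latency_ms: f64) -> Candidate {
    Candidate {
        name: name.to_string(),
        cost_per_mtok_usd,
        p50_latency_ms,
    }
}

/// Returns the candidates ordered best-first (lowest score first).
/// Each metric is normalized by the largest value among the candidates,
/// so cost and latency are comparable before the weights are applied.
fn rank(candidates: &[Candidate], w: &Weights) -> Vec<Candidate> {
    // Start from a tiny positive value so the divisions below never divide by zero.
    let max_cost = candidates
        .iter()
        .map(|c| c.cost_per_mtok_usd)
        .fold(f64::MIN_POSITIVE, f64::max);
    let max_latency = candidates
        .iter()
        .map(|c| c.p50_latency_ms)
        .fold(f64::MIN_POSITIVE, f64::max);

    let score = |c: &Candidate| {
        w.cost * (c.cost_per_mtok_usd / max_cost) + w.latency * (c.p50_latency_ms / max_latency)
    };

    let mut ranked = candidates.to_vec();
    // total_cmp gives a total order over f64, so the sort cannot panic on NaN.
    ranked.sort_by(|a, b| score(a).total_cmp(&score(b)));
    ranked
}
```

Calling `rank` with a cost weight of 1.0 and a latency weight of 0.0 orders models purely by price, and swapping the weights orders them purely by speed. Because the metrics are passed in on every call, a provider whose latency drifts during the day would move up or down the ranking as soon as fresh numbers arrive.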

May 20•16m read time•From digitalocean.com

Table of contents

DigitalOcean’s Inference Router
How It Works: Plano Under the Hood
The Routing Model
The Ranking Engine: Live Cost and Latency Data
Under the Hood: Envoy, WASM, and Async Rust
Getting Started
What We Learned
What We’re Exploring Next