Kthena is a new open-source sub-project of Volcano designed for LLM inference orchestration on Kubernetes. It addresses production challenges such as low GPU/NPU utilization, latency–throughput tradeoffs, and multi-model management through intelligent routing, KV Cache-aware scheduling, and Prefill–Decode disaggregation. The system comprises a high-performance router and a controller manager that support topology-aware scheduling, gang scheduling, autoscaling, and multiple inference engines (vLLM, SGLang, Triton). Benchmarks show a 2.73x throughput improvement and a 73.5% TTFT reduction compared to random routing. Kthena is backed by Huawei Cloud, China Telecom, DaoCloud, and other industry partners.
Table of contents
- The “Last Mile” Challenge of LLM Serving
- Kthena: The Intelligent Brain for Cloud Native Inference
- Core Features and Advantages
- Performance Benchmarks
- Community & Industry Support
- Start Exploring Kthena Today