Kthena is a new open-source sub-project of Volcano designed for LLM inference orchestration on Kubernetes. It addresses production challenges like low GPU/NPU utilization, latency-throughput tradeoffs, and multi-model management through intelligent routing, KV Cache-aware scheduling, and Prefill-Decode disaggregation. The system includes a high-performance router and controller manager that support topology-aware scheduling, gang scheduling, autoscaling, and multiple inference engines (vLLM, SGLang, Triton). Benchmarks show 2.73x throughput improvement and 73.5% TTFT reduction compared to random routing. Backed by Huawei Cloud, China Telecom, DaoCloud, and other industry partners.
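The summary mentions KV Cache-aware routing as one of Kthena's key mechanisms. A minimal sketch of the idea, under stated assumptions: route each request to the replica whose cached prompt prefix overlaps the incoming prompt the most, penalized by current load. All names, weights, and data structures here are illustrative assumptions, not Kthena's actual API.

```python
# Hypothetical sketch of KV Cache-aware routing: prefer the replica whose
# cached prefix overlaps most with the incoming prompt, breaking ties by
# current load. Weights and field names are illustrative assumptions.
from dataclasses import dataclass, field

@dataclass
class Replica:
    name: str
    cached_prefixes: list = field(default_factory=list)  # token-ID prefixes in KV cache
    active_requests: int = 0

def prefix_overlap(prompt, prefix):
    """Length of the common leading run between the prompt and a cached prefix."""
    n = 0
    for a, b in zip(prompt, prefix):
        if a != b:
            break
        n += 1
    return n

def route(prompt, replicas, cache_weight=1.0, load_weight=0.1):
    """Score = cache-hit benefit minus a load penalty; highest score wins."""
    def score(r):
        best_hit = max((prefix_overlap(prompt, p) for p in r.cached_prefixes),
                       default=0)
        return cache_weight * best_hit - load_weight * r.active_requests
    return max(replicas, key=score)
```

Reusing a warm KV cache skips prefill work for the shared prefix, which is where routing-dependent TTFT gains of the kind reported in the benchmarks would come from.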

8 min read · From cncf.io
Table of contents
The “Last Mile” Challenge of LLM Serving
Kthena: The Intelligent Brain for Cloud Native Inference
Core Features and Advantages
Performance Benchmarks
Community & Industry Support
Start Exploring Kthena Today
