Kthena is a new open-source sub-project of Volcano designed for LLM inference orchestration on Kubernetes. It addresses production challenges like low GPU/NPU utilization, latency-throughput tradeoffs, and multi-model management through intelligent routing, KV Cache-aware scheduling, and Prefill-Decode disaggregation. The system includes a high-performance router and controller manager that support topology-aware scheduling, gang scheduling, autoscaling, and multiple inference engines (vLLM, SGLang, Triton). Benchmarks show 2.73x throughput improvement and 73.5% TTFT reduction compared to random routing. Backed by Huawei Cloud, China Telecom, DaoCloud, and other industry partners.
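The summary mentions KV Cache-aware routing as one of Kthena's key mechanisms. A minimal sketch of the idea, under stated assumptions: route each request to the replica whose cached prompt prefix overlaps the incoming prompt the most, penalized by current load. All names, weights, and data structures here are illustrative assumptions, not Kthena's actual API.

```python
# Hypothetical sketch of KV Cache-aware routing: prefer the replica whose
# cached prefix overlaps most with the incoming prompt, breaking ties by
# current load. Weights and field names are illustrative assumptions.
from dataclasses import dataclass, field

@dataclass
class Replica:
    name: str
    cached_prefixes: list = field(default_factory=list)  # token-ID prefixes in KV cache
    active_requests: int = 0

def prefix_overlap(prompt, prefix):
    """Length of the common leading run between the prompt and a cached prefix."""
    n = 0
    for a, b in zip(prompt, prefix):
        if a != b:
            break
        n += 1
    return n

def route(prompt, replicas, cache_weight=1.0, load_weight=0.1):
    """Score = cache-hit benefit minus a load penalty; highest score wins."""
    def score(r):
        best_hit = max((prefix_overlap(prompt, p) for p in r.cached_prefixes),
                       default=0)
        return cache_weight * best_hit - load_weight * r.active_requests
    return max(replicas, key=score)
```

Reusing a warm KV cache skips prefill work for the shared prefix, which is where routing-dependent TTFT gains of the kind reported in the benchmarks would come from.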

8 min read · From cncf.io
Table of contents
The “Last Mile” Challenge of LLM Serving
Kthena: The Intelligent Brain for Cloud Native Inference
Core Features and Advantages
Performance Benchmarks
Community & Industry Support
Start Exploring Kthena Today
