Kthena is a new open-source sub-project of Volcano designed for LLM inference orchestration on Kubernetes. It addresses production challenges such as low GPU/NPU utilization, latency-throughput tradeoffs, and multi-model management through intelligent routing, KV Cache-aware scheduling, and Prefill-Decode disaggregation.
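To make the idea of KV Cache-aware routing concrete, here is a minimal, hypothetical sketch (not Kthena's actual API or algorithm): a router picks the inference replica whose cached prompts share the longest prefix with the incoming request, so previously computed KV cache entries can be reused instead of recomputed.

```python
# Hypothetical sketch of KV Cache-aware routing, assuming each replica
# tracks the prompts whose KV caches it currently holds. This is an
# illustration of the general technique, not Kthena's implementation.

def shared_prefix_len(a: str, b: str) -> int:
    """Length of the common character prefix of two prompts."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

def route(prompt: str, replicas: dict[str, list[str]]) -> str:
    """Pick the replica with the best cached-prefix match for `prompt`.

    `replicas` maps replica name -> prompts cached on that replica.
    Ties are broken by fewer cached entries, a rough stand-in for load.
    """
    def score(name: str) -> tuple[int, int]:
        cached = replicas[name]
        best = max((shared_prefix_len(prompt, c) for c in cached), default=0)
        return (best, -len(cached))
    return max(replicas, key=score)

replicas = {
    "pod-a": ["You are a helpful assistant. Summarize:"],
    "pod-b": ["Translate to French:"],
}
print(route("You are a helpful assistant. Summarize: LLM serving", replicas))
# -> pod-a, whose cached prefix overlaps the request's prompt
```

A production router would match on token IDs rather than characters and combine cache affinity with live load signals, but the scoring structure is the same.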

8 min read · From cncf.io
Table of contents

- The "Last Mile" Challenge of LLM Serving
- Kthena: The Intelligent Brain for Cloud Native Inference
- Core Features and Advantages
- Performance Benchmarks
- Community & Industry Support
- Start Exploring Kthena Today
