Kthena is a new open-source sub-project of Volcano designed for LLM inference orchestration on Kubernetes. It addresses production challenges such as low GPU/NPU utilization, latency-throughput trade-offs, and multi-model management through intelligent routing, KV Cache-aware scheduling, and Prefill-Decode disaggregation.
Table of contents
- The "Last Mile" Challenge of LLM Serving
- Kthena: The Intelligent Brain for Cloud Native Inference
- Core Features and Advantages
- Performance Benchmarks
- Community & Industry Support
- Start Exploring Kthena Today