Kthena Router now supports the Kubernetes Gateway API and Gateway API Inference Extension for routing AI/ML inference workloads. The post explains why these APIs matter — resolving global modelName conflicts in multitenant environments, enabling industry-standard interoperability, and supporting standardized inference routing via InferencePool and InferenceObjective resources. Step-by-step configuration examples cover enabling Gateway API via Helm, creating Gateways on different ports to isolate ModelRoutes with the same modelName, and deploying InferencePool resources with HTTPRoute for the Inference Extension. The post also contrasts these standard APIs with Kthena's native ModelRoute/ModelServer CRDs, which offer advanced features like prefill-decode disaggregation and weighted routing for production workloads.

From cloudnativenow.com · 11-minute read
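For a concrete sense of what the post walks through, the manifests look roughly like the sketch below. This is a minimal illustration assembled from the upstream Gateway API and Gateway API Inference Extension examples, not from Kthena's own documentation: the gatewayClassName `kthena-router`, all resource names, and the port numbers are assumptions, and the InferencePool fields follow the extension's v1alpha2 schema, which may differ in newer releases.

```yaml
# A dedicated Gateway listening on its own port. ModelRoutes bound to
# different Gateways can then reuse the same modelName without conflict;
# a second tenant's Gateway would simply listen on another port (e.g. 8082).
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: team-a-gateway
spec:
  gatewayClassName: kthena-router   # hypothetical class name
  listeners:
  - name: http
    protocol: HTTP
    port: 8081
---
# Inference Extension wiring: an InferencePool groups the model server
# pods and delegates endpoint selection to an endpoint-picker extension.
apiVersion: inference.networking.x-k8s.io/v1alpha2
kind: InferencePool
metadata:
  name: llama-pool
spec:
  targetPortNumber: 8000            # port the model server pods listen on
  selector:
    app: llama-server               # labels of the model server pods
  extensionRef:
    name: llama-endpoint-picker     # endpoint-picker service (assumed name)
---
# An HTTPRoute attaches to the Gateway and sends traffic to the
# InferencePool instead of a plain Service backend.
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: llama-route
spec:
  parentRefs:
  - name: team-a-gateway
  rules:
  - backendRefs:
    - group: inference.networking.x-k8s.io
      kind: InferencePool
      name: llama-pool
```

Binding each tenant's routes to their own Gateway is what resolves the global modelName conflict the post describes: the model name only has to be unique within a Gateway, not across the cluster.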
Table of contents
- Gateway API and Gateway API Inference Extension: What Are They?
- Why Support Gateway API and Inference Extension?
- Enabling Gateway API Support
  - Step 1: Deploy Mock Model Servers
  - Step 2: Create a New Gateway
  - Step 3: Create ModelRoutes Bound to Different Gateways
- Using Gateway API With Inference Extension
- Native ModelRoute/ModelServer: Advanced Features
  - Prefill-Decode Disaggregation
  - Weight-Based Routing
- Conclusion
