Kthena Router now supports the Kubernetes Gateway API and Gateway API Inference Extension for routing AI/ML inference workloads. The post explains why these APIs matter — resolving global modelName conflicts in multitenant environments, enabling industry-standard interoperability, and supporting standardized inference routing via InferencePool and InferenceObjective resources. Step-by-step configuration examples cover enabling Gateway API via Helm, creating Gateways on different ports to isolate ModelRoutes with the same modelName, and deploying InferencePool resources with HTTPRoute for the Inference Extension. The post also contrasts these standard APIs with Kthena's native ModelRoute/ModelServer CRDs, which offer advanced features like prefill-decode disaggregation and weighted routing for production workloads.

From cloudnativenow.com · 11-minute read
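For a concrete sense of what the post walks through, the manifests look roughly like the sketch below. This is a minimal illustration assembled from the upstream Gateway API and Gateway API Inference Extension examples, not from Kthena's own documentation: the gatewayClassName `kthena-router`, all resource names, and the port numbers are assumptions, and the InferencePool fields follow the extension's v1alpha2 schema, which may differ in newer releases.

```yaml
# A dedicated Gateway listening on its own port. ModelRoutes bound to
# different Gateways can then reuse the same modelName without conflict;
# a second tenant's Gateway would simply listen on another port (e.g. 8082).
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: team-a-gateway
spec:
  gatewayClassName: kthena-router   # hypothetical class name
  listeners:
  - name: http
    protocol: HTTP
    port: 8081
---
# Inference Extension wiring: an InferencePool groups the model server
# pods and delegates endpoint selection to an endpoint-picker extension.
apiVersion: inference.networking.x-k8s.io/v1alpha2
kind: InferencePool
metadata:
  name: llama-pool
spec:
  targetPortNumber: 8000            # port the model server pods listen on
  selector:
    app: llama-server               # labels of the model server pods
  extensionRef:
    name: llama-endpoint-picker     # endpoint-picker service (assumed name)
---
# An HTTPRoute attaches to the Gateway and sends traffic to the
# InferencePool instead of a plain Service backend.
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: llama-route
spec:
  parentRefs:
  - name: team-a-gateway
  rules:
  - backendRefs:
    - group: inference.networking.x-k8s.io
      kind: InferencePool
      name: llama-pool
```

Binding each tenant's routes to their own Gateway is what resolves the global modelName conflict the post describes: the model name only has to be unique within a Gateway, not across the cluster.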
Table of contents
- Gateway API and Gateway API Inference Extension: What Are They?
- Why Support Gateway API and Inference Extension?
- Enabling Gateway API Support
  - Step 1: Deploy Mock Model Servers
  - Step 2: Create a New Gateway
  - Step 3: Create ModelRoutes Bound to Different Gateways
- Using Gateway API With Inference Extension
- Native ModelRoute/ModelServer: Advanced Features
  - Prefill-Decode Disaggregation
  - Weight-Based Routing
- Conclusion
