When AI applications call model provider APIs directly, switching models requires code changes and creates tight coupling. An LLM gateway solves this by acting as a unified routing layer between applications and multiple providers. Using Red Hat OpenShift AI 3.4's Models-as-a-Service (MaaS) capability alongside LiteLLM Proxy, teams can expose a single OpenAI-compatible endpoint that routes requests to OpenAI, Gemini, or self-hosted Llama models running via vLLM on OpenShift. The tutorial covers deploying LiteLLM as a Kubernetes deployment with a ConfigMap-based model list, deploying a Llama-3.1-8B model using KServe InferenceService and vLLM ServingRuntime, and testing all three backends through the same API endpoint with simple curl commands.

Table of contents
The challenge of switching between model providersModel portabilityDeploy an LLM gateway on OpenShift AIDeploy the self-hosted model on OpenShift AITest the setup in-clusterConclusionSort: