A step-by-step guide to deploying multiple LLMs behind a single OpenAI-compatible endpoint on OpenShift using a Model-as-a-Service (MaaS) pattern. The architecture combines llm-d, the Gateway API Inference Extension (GAIE), and agentgateway to route inference requests based on the `model` field in the request body. The guide covers deploying the stack, verifying the deployment, and adding a new model.
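Since all routing hinges on the `model` field of a standard OpenAI-style request body, it helps to see what the gateway actually inspects. A minimal sketch of such a request payload (the model name here is a placeholder example, not one from this guide):

```python
import json

# Minimal OpenAI-compatible chat completion request body.
# The gateway reads the "model" field to choose which backend
# LLM serves the request; everything else passes through unchanged.
request_body = {
    "model": "llama-3-8b-instruct",  # routing key (placeholder model name)
    "messages": [
        {"role": "user", "content": "Hello!"},
    ],
}

# Serialize as it would appear on the wire, e.g. in a
# POST to the gateway's /v1/chat/completions endpoint.
payload = json.dumps(request_body)
print(payload)
```

Because the schema is the stock OpenAI one, existing OpenAI client SDKs work against the gateway unmodified; only the base URL changes.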

Table of contents
- The components
- Understanding the LLM routing traffic flow
- Before you begin
- Deploying the stack
- Verify the deployment
- Add a new model
- What's next
- Alternative gateway providers
- Learn more