A step-by-step guide to deploying multiple LLMs behind a single OpenAI-compatible endpoint on OpenShift using a Model-as-a-Service (MaaS) pattern. The architecture combines llm-d, the Gateway API Inference Extension (GAIE), and agentgateway to route inference requests based on the `model` field in the request body. The guide covers the components, the LLM routing traffic flow, deploying and verifying the stack, and adding a new model.
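The routing idea described above can be sketched in a few lines: the gateway parses the JSON request body and uses the `model` field to choose a backend pool. This is a minimal illustration only; the model names and pool names below are hypothetical, and the real routing is performed by agentgateway/GAIE, not application code.

```python
import json

# Hypothetical OpenAI-compatible chat completion request body.
# An OpenAI-compatible gateway inspects the "model" field to pick a backend.
request_body = json.dumps({
    "model": "llama-3-8b-instruct",  # hypothetical model name
    "messages": [{"role": "user", "content": "Hello"}],
})

# Illustrative mapping of model names to backend inference pools.
routes = {
    "llama-3-8b-instruct": "llama-pool",
    "mistral-7b": "mistral-pool",
}

def select_backend(body: str) -> str:
    """Return the backend pool for the model named in the request body."""
    model = json.loads(body).get("model", "")
    return routes.get(model, "default-pool")

print(select_backend(request_body))  # → llama-pool
```

Because every model sits behind one endpoint, clients only change the `model` string; the gateway handles the fan-out.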

12-minute read · From developers.redhat.com
Table of contents

- The components
- Understanding the LLM routing traffic flow
- Before you begin
- Deploying the stack
- Verify the deployment
- Add a new model
- What's next
- Alternative gateway providers
- Learn more
