vLLM Semantic Router (vLLM-SR) v0.1 introduces a Mixture-of-Models (MoM) architecture for intelligent routing across multiple specialized LLMs. Unlike Mixture-of-Experts (MoE), which routes at the token level within a single model, MoM orchestrates independent models at the request level using configurable signals. A live demo runs on AMD GPUs (MI300X/MI355X); see below.
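To make the request-level distinction concrete, here is a minimal, hypothetical sketch of MoM-style routing: a signal is extracted once per request and used to select a whole model, rather than selecting experts per token as MoE does. The model names, the keyword-based signal extractor, and the routing table below are illustrative assumptions, not the vLLM-SR implementation.

```python
# Hypothetical sketch of request-level Mixture-of-Models (MoM) routing.
# Model names, the keyword-based signal, and the routing table are
# illustrative only; vLLM-SR derives signals from configurable classifiers.

from dataclasses import dataclass


@dataclass
class Route:
    signal: str  # detected task category of the request (assumed signal values)
    model: str   # specialized model that should serve that category


# Assumed routing table: one entry per signal value.
ROUTES = [
    Route(signal="code", model="specialist-coder-7b"),
    Route(signal="math", model="specialist-math-7b"),
    Route(signal="general", model="generalist-chat-8b"),
]


def classify(prompt: str) -> str:
    """Toy keyword-based signal extractor, standing in for a real classifier."""
    lowered = prompt.lower()
    if "def " in lowered or "compile" in lowered:
        return "code"
    if any(tok in lowered for tok in ("integral", "prove", "solve")):
        return "math"
    return "general"


def route(prompt: str) -> str:
    """Pick one whole model per request (MoM), not per-token experts (MoE)."""
    signal = classify(prompt)
    for r in ROUTES:
        if r.signal == signal:
            return r.model
    return ROUTES[-1].model  # fall back to the generalist


if __name__ == "__main__":
    # Routes to the assumed math specialist under this toy signal.
    print(route("Prove that the sum of two even numbers is even."))
```

The key design point is that routing happens once, outside any single model, so each candidate model can be deployed, scaled, and swapped independently.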
Table of Contents

- Why System Intelligence for LLMs?
- Mixture-of-Models vs Mixture-of-Experts
- The MoM Design Philosophy
- Live Demo on AMD GPUs
- Signal-Based Routing
- How to run it on AMD GPU (MI300X/MI355X)
- What's Next
- Resources
- Acknowledgements
- Join Us