vLLM Semantic Router is the System Level Intelligence for Mixture-of-Models (MoM), bringing Collective Intelligence into LLM systems. It sits between users and models and captures signals from requests, responses, and context to make routing decisions, including model selection, safety filtering (jailbreak and PII detection), semantic caching, and hallucination detection. For more background, see our initial announcement blog post.


vLLM Semantic Router v0.1 (Iris) introduces a production-ready intelligent routing platform for LLM systems. The release features a Signal-Decision Plugin Chain Architecture that extracts six signal types (domain, keyword, embedding, factual, feedback, preference) and uses them to make routing decisions. Key improvements include a modular LoRA architecture for performance optimization, the HaluGate hallucination detection pipeline, one-command installation, ecosystem integration with inference frameworks and API gateways, a specialized MoM model family, OpenAI Responses API support, and intelligent tool selection. Sitting between users and models, the platform provides model selection, safety filtering, semantic caching, and hallucination detection. The roadmap includes enhanced signal types, ML-based model selection algorithms, additional plugins, multi-turn conversation improvements, and safety enhancements.
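To make the signal-then-decision flow concrete, here is a minimal Python sketch of the idea. It is not the project's actual API: every name in it (`extract_signals`, `decide`, `route`, the model names, and the plugin names) is hypothetical, and only two of the six signal types are mocked with trivial string rules.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical illustration only: none of these names come from
# vLLM Semantic Router itself.

MATH_KEYWORDS = ("prove", "derive", "integral")


def keyword_signal(prompt: str) -> bool:
    """Keyword signal: True if the prompt contains a math keyword."""
    text = prompt.lower()
    return any(word in text for word in MATH_KEYWORDS)


def domain_signal(prompt: str) -> str:
    """Domain signal: a stand-in for a real domain classifier."""
    return "math" if keyword_signal(prompt) else "general"


# Registry of signal extractors, keyed by signal type.
EXTRACTORS: Dict[str, Callable[[str], object]] = {
    "keyword": keyword_signal,
    "domain": domain_signal,
}


@dataclass
class Decision:
    model: str                                         # model chosen for the request
    plugins: List[str] = field(default_factory=list)   # plugins to run, in order


def extract_signals(prompt: str) -> Dict[str, object]:
    """Run every registered extractor over the prompt."""
    return {name: fn(prompt) for name, fn in EXTRACTORS.items()}


def decide(signals: Dict[str, object]) -> Decision:
    """Map the extracted signals to a model and a plugin chain."""
    if signals["domain"] == "math":
        return Decision(model="math-model", plugins=["hallucination_check"])
    return Decision(model="general-model", plugins=["semantic_cache"])


def route(prompt: str) -> Decision:
    """Signals first, then a decision that names a model and its plugins."""
    return decide(extract_signals(prompt))
```

For example, `route("Please derive the integral of x^2")` selects `math-model` with the `hallucination_check` plugin, while `route("Tell me a joke")` falls through to `general-model` with `semantic_cache`.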

vLLM Semantic Router v0.1 Iris: The First Major Release