vLLM Semantic Router v0.1 (Iris) introduces a production-ready intelligent routing platform for LLM systems. The release features a Signal-Decision Plugin Chain Architecture that extracts six signal types (domain, keyword, embedding, factual, feedback, preference) and uses them to make routing decisions. Key improvements include a modular LoRA architecture for performance optimization, the HaluGate hallucination detection pipeline, one-command installation, ecosystem integration with inference frameworks and API gateways, a specialized MoM model family, OpenAI Responses API support, and intelligent tool selection. Sitting between users and models, the platform handles model selection, safety filtering, semantic caching, and hallucination detection. The v0.2 roadmap includes enhanced signal types, ML-based model selection algorithms, additional plugins, multi-turn conversation improvements, and safety enhancements.
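
To make the signal-then-decision idea concrete, here is a minimal Python sketch. It is not the Iris implementation or its API: the `Decision` class, the `extract_signals` and `route` functions, the model names, and the plugin names are all hypothetical, and the heuristics stand in for whatever classifiers the real router uses. Only the six signal type names come from the summary above, and only three of them are populated here.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# The six signal types named in the release summary.
SIGNAL_TYPES = ("domain", "keyword", "embedding", "factual", "feedback", "preference")

Signals = Dict[str, object]


@dataclass
class Decision:
    """Hypothetical routing rule: a predicate over signals, a target model, a plugin chain."""
    name: str
    model: str
    predicate: Callable[[Signals], bool]
    plugins: List[str] = field(default_factory=list)


def extract_signals(prompt: str) -> Signals:
    """Toy extractor; real signals would come from classifiers and embeddings."""
    text = prompt.lower()
    signals: Signals = {name: None for name in SIGNAL_TYPES}
    signals["keyword"] = [w for w in ("urgent", "code") if w in text]
    math_terms = ("integral", "prove", "equation")
    signals["domain"] = "math" if any(t in text for t in math_terms) else "general"
    # Crude stand-in for a factual-question detector.
    signals["factual"] = text.rstrip().endswith("?")
    return signals


def route(prompt: str, decisions: List[Decision], default_model: str) -> Decision:
    """Return the first decision whose predicate matches, else a default decision."""
    signals = extract_signals(prompt)
    for decision in decisions:
        if decision.predicate(signals):
            return decision
    return Decision("default", default_model, lambda s: True)


# Hypothetical decisions; model and plugin names are placeholders.
DECISIONS = [
    Decision("math", "hypothetical-math-model",
             lambda s: s["domain"] == "math", ["semantic_cache"]),
    Decision("factual", "hypothetical-general-model",
             lambda s: bool(s["factual"]), ["hallucination_check"]),
]

if __name__ == "__main__":
    chosen = route("Who wrote Hamlet?", DECISIONS, "hypothetical-small-model")
    print(chosen.name, chosen.model, chosen.plugins)
```

Decisions are checked in order and the first match wins, so in this sketch list position encodes priority; attaching the plugin list to the decision mirrors the idea that the chosen route determines which plugins (caching, hallucination checks) run.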

9 min read. From blog.vllm.ai
Table of contents
Why Iris?
What’s New in v0.1 Iris?
Looking Ahead: v0.2 Roadmap
Acknowledgments
Get Started
Join the Community
