vllm-project/semantic-router: Intelligent Mixture-of-Models Router for Efficient LLM Inference
A Mixture-of-Models router that directs OpenAI API requests to the most suitable model based on a semantic understanding of each request's intent. A BERT classifier analyzes each request's complexity, task type, and required tools, and the router uses that classification to pick a model suited to the task, which improves inference accuracy. Additional features include tool selection optimization, PII detection, jailbreak prompt filtering, and semantic caching. Implementations are available in both Golang and Python.
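The routing idea can be sketched as "classify the request, then rewrite its `model` field". The sketch below is illustrative only and is not the project's code or API. It assumes an OpenAI-style chat request dict, uses hypothetical category and model names, and replaces the BERT classifier with a simple keyword scorer so that it runs without any dependencies.

```python
# Hypothetical category -> model table. These names are placeholders,
# not the project's real configuration.
ROUTES = {
    "math": "math-specialist-model",
    "code": "code-specialist-model",
    "general": "general-model",
}

# Stand-in for the BERT classifier: a naive keyword scorer.
# This is an assumption for illustration, not the project's method.
KEYWORDS = {
    "math": ("integral", "equation", "solve", "derivative"),
    "code": ("python", "function", "bug", "compile"),
}


def classify(prompt: str) -> str:
    """Return the category whose keywords appear most often, else 'general'."""
    text = prompt.lower()
    scores = {cat: sum(word in text for word in words) for cat, words in KEYWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "general"


def route(request: dict) -> dict:
    """Return a copy of an OpenAI-style chat request with 'model' rewritten."""
    # Classify the most recent user message.
    user_msgs = [m["content"] for m in request.get("messages", []) if m.get("role") == "user"]
    prompt = user_msgs[-1] if user_msgs else ""
    routed = dict(request)  # shallow copy so the caller's request is left untouched
    routed["model"] = ROUTES[classify(prompt)]
    return routed
```

For example, a request whose last user message is "Find the derivative of x^3" would be forwarded to `math-specialist-model`. In the real router, a trained classifier takes the place of the keyword table, and the same decision point is where checks such as PII detection or jailbreak filtering could reject a request before it is forwarded.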