vLLM Semantic Router v0.2 Athena is a major release that rebuilds the model stack around mmBERT-Embed-32K-2D-Matryoshka and a new multimodal embedding model that maps text, image, and audio into a shared 384-dimensional space. Key additions include: ClawOS, an experimental orchestration layer for managing multiple OpenClaw multi-agent systems through routing, memory, and chat-driven team management; 13 model selection algorithms (KNN, SVM, MLP, Elo, AutoMix, Thompson Sampling, and others) as first-class routing primitives; hybrid memory search combining vector similarity, BM25, and n-gram matching; NLP-based prompt compression for long-context signal extraction; AMD ROCm as a canonical deployment path, with CK Flash Attention achieving 3.3x speedups and enabling 32K-token inference that previously caused OOM errors; a programmable neural-symbolic DSL for routing policy; and a zero-config, dashboard-first onboarding flow. Benchmarks on AMD MI300X show ONNX+GPU end-to-end latency of 22 ms versus 853 ms on CPU for ~500-token requests, roughly 39x lower.
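To make the hybrid memory search idea concrete, here is a minimal sketch of how vector similarity, BM25, and n-gram matching could be fused into one ranking score. It is an illustration under stated assumptions, not the router's actual implementation: the function names (`hybrid_rank`, `bm25_scores`, `ngram_jaccard`), the fusion weights, the character-trigram choice, and the toy 3-dimensional vectors are all hypothetical.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Okapi BM25 score of each document for a whitespace-tokenized query."""
    if not docs:
        return []
    tokenized = [d.lower().split() for d in docs]
    n = len(tokenized)
    avgdl = sum(len(t) for t in tokenized) / n
    # Document frequency: number of documents containing each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        s = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avgdl)
            s += idf * tf[term] * (k1 + 1) / norm
        scores.append(s)
    return scores

def char_ngrams(text, n=3):
    """Set of character n-grams of the lowercased text."""
    t = text.lower()
    return {t[i:i + n] for i in range(len(t) - n + 1)}

def ngram_jaccard(a, b, n=3):
    """Jaccard overlap of character n-gram sets (assumed matching scheme)."""
    ga, gb = char_ngrams(a, n), char_ngrams(b, n)
    union = ga | gb
    return len(ga & gb) / len(union) if union else 0.0

def hybrid_rank(query, query_vec, docs, doc_vecs, weights=(0.5, 0.3, 0.2)):
    """Rank documents by a weighted sum of the three signals.

    The weights are illustrative assumptions, not values from the release.
    Returns (score, doc_index) pairs, best first.
    """
    bm25 = bm25_scores(query, docs)
    top = max(bm25, default=0.0) or 1.0  # scale BM25 so its best score is 1
    w_vec, w_bm25, w_ngram = weights
    scored = []
    for i, doc in enumerate(docs):
        score = (w_vec * cosine(query_vec, doc_vecs[i])
                 + w_bm25 * bm25[i] / top
                 + w_ngram * ngram_jaccard(query, doc))
        scored.append((score, i))
    return sorted(scored, reverse=True)

# Toy memory store; the vectors stand in for real embedding output.
memories = [
    "user prefers ROCm deployment on MI300X",
    "router selects models with Thompson Sampling",
    "memory search uses BM25 and vectors",
]
memory_vecs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

ranking = hybrid_rank("which deployment does the user prefer",
                      [0.9, 0.1, 0.0], memories, memory_vecs)
print(memories[ranking[0][1]])
```

A weighted sum is only one way to combine the signals; rank-based fusion is a common alternative when the three scores are not on comparable scales.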
Table of contents
Why Athena?
What's New in v0.2 Athena?
Looking Ahead: Beyond Athena
Acknowledgments
Get Started