vLLM
vLLM Triton Attention Backend Deep Dive
Beyond Porting: How vLLM Orchestrates High-Performance Inference on AMD ROCm
DeepSeek-V3.2 on GB300: Performance Breakthrough
Driving vLLM WideEP and Large-Scale Serving Toward Maturity on Blackwell (Part I)
GPT-OSS Performance Optimizations on NVIDIA Blackwell: Pushing the Pareto Frontier
Building Mixture-of-Models on AMD GPUs with vLLM-SR
Inside vLLM’s New KV Offloading Connector: Smarter Memory Transfer for Maximizing Inference Throughput
vLLM Semantic Router v0.1 Iris: The First Major Release
Introducing vLLM Playground: A Modern Web Interface for Managing and Interacting with vLLM Servers
Announcing vllm.ai Website and Some Community Updates
All posts from vLLM