vLLM

GLM-4.5 and GLM-4.5V are new foundation models designed for intelligent agents, featuring hybrid reasoning with thinking and non-thinking modes. GLM-4.5 has 355B total parameters with 32B active, while the smaller GLM-4.5-Air has 106B total with 12B active. GLM-4.5V adds vision capabilities, including object grounding with bounding-box detection. These models are now supported in vLLM for accelerated inference on NVIDIA GPUs, with specific installation steps and configuration options for optimal performance.

GLM-4.5 Meets vLLM: Built for Intelligent Agents