GLM-4.5 and GLM-4.5V are new foundation models designed for intelligent agents, featuring hybrid reasoning capabilities with thinking and non-thinking modes. GLM-4.5 has 355B total parameters with 32B active, while GLM-4.5-Air uses 106B total with 12B active. GLM-4.5V adds vision capabilities including object grounding with bounding box detection. Both models are now supported in vLLM for accelerated inference on NVIDIA GPUs, with specific installation steps and configuration options for optimal performance.

4m read timeFrom blog.vllm.ai
Post cover image
Table of contents
IntroductionInstallationUsageCooperation with vLLM and GLM TeamAcknowledgement

Sort: