LLM Compressor v0.10 introduces Distributed Data Parallel (DDP) support for faster compression, along with improved memory management and advanced quantization formats, making compression workflows for large language models more efficient.

Red Hat Developer

LLM Compressor v0.10 introduces distributed GPTQ quantization with multi-GPU support, delivering up to 3.8x speedup on 4 GPUs. A numerical fix to Hessian calculation also yields a +4% accuracy improvement on GSM8K benchmarks. The release replaces Hugging Face Accelerate with a custom compressed-tensors offloading system supporting device, CPU, and disk offloading for models exceeding available memory. FP4 microscale quantization support (NVFP4 and MXFP4) is also added. Setup requires initializing a distributed context, using the new offload context manager, partitioning calibration data across ranks, and launching with torchrun.
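The "partitioning calibration data across ranks" step can be illustrated with a minimal sketch in plain Python. It assumes only that the script is launched with torchrun, which exports `RANK` and `WORLD_SIZE` to each worker process. The helper name `shard_indices` and the sample count of 512 are hypothetical and are not part of the LLM Compressor API; the library's own utilities for this step may look different.

```python
import os


def shard_indices(num_samples, rank, world_size):
    """Return the calibration sample indices assigned to one rank.

    Hypothetical helper: uses a strided split so every sample goes to
    exactly one rank and shard sizes differ by at most one.
    """
    if world_size < 1 or not 0 <= rank < world_size:
        raise ValueError("rank must satisfy 0 <= rank < world_size")
    return list(range(rank, num_samples, world_size))


# torchrun sets RANK and WORLD_SIZE for each worker; fall back to a
# single-process run when the script is started without it.
rank = int(os.environ.get("RANK", "0"))
world_size = int(os.environ.get("WORLD_SIZE", "1"))

# 512 is an illustrative calibration set size, not a recommended value.
local_indices = shard_indices(512, rank, world_size)
print(f"rank {rank} of {world_size} calibrates on {len(local_indices)} samples")
```

Launched with, for example, `torchrun --nproc_per_node=4 script.py` (where `script.py` is a placeholder name), each of the four workers would select a disjoint quarter of the calibration indices.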

LLM Compressor v0.10: Faster compression with distributed GPTQ

Distributed GPTQ: Parallelize compression across multiple GPUs