LLM Compressor v0.10 introduces distributed GPTQ quantization with multi-GPU support, delivering up to a 3.8x speedup on 4 GPUs. A numerical fix to the Hessian calculation also yields a +4% accuracy improvement on the GSM8K benchmark. The release replaces Hugging Face Accelerate with a custom compressed-tensors offloading system that supports device, CPU, and disk offloading for models exceeding available memory, and it adds FP4 microscale quantization support (NVFP4 and MXFP4). Setting up distributed GPTQ takes four steps: initializing a distributed context, using the new offload context manager, partitioning calibration data across ranks, and launching with torchrun.
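To make one of those steps concrete, partitioning calibration data across ranks, here is a minimal Python sketch. It assumes the script is launched with torchrun, which exports `RANK` and `WORLD_SIZE` to every worker it starts. The helper names `get_rank_and_world_size` and `partition_for_rank` are hypothetical and are not part of LLM Compressor's API. The sketch leaves out the distributed-context initialization and the offload context manager, since their exact LLM Compressor calls are not spelled out in this summary.

```python
import os
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def get_rank_and_world_size() -> Tuple[int, int]:
    """Read this process's rank and the total number of processes.

    torchrun exports RANK and WORLD_SIZE for each worker it launches;
    fall back to a single-process run when they are absent.
    """
    rank = int(os.environ.get("RANK", "0"))
    world_size = int(os.environ.get("WORLD_SIZE", "1"))
    if not 0 <= rank < world_size:
        raise ValueError(f"invalid rank {rank} for world size {world_size}")
    return rank, world_size


def partition_for_rank(samples: Sequence[T], rank: int, world_size: int) -> List[T]:
    """Return the contiguous slice of calibration samples owned by `rank`.

    Shard sizes differ by at most one sample, and together the shards
    cover every sample exactly once.
    """
    base, extra = divmod(len(samples), world_size)
    # The first `extra` ranks each take one additional sample.
    start = rank * base + min(rank, extra)
    end = start + base + (1 if rank < extra else 0)
    return list(samples[start:end])


if __name__ == "__main__":
    rank, world_size = get_rank_and_world_size()
    # Placeholder data standing in for a real calibration dataset.
    calibration_set = [f"sample-{i}" for i in range(10)]
    shard = partition_for_rank(calibration_set, rank, world_size)
    print(f"rank {rank}/{world_size}: {len(shard)} samples")
```

Launched with, for example, `torchrun --nproc_per_node=4 script.py`, each of the four workers sees a different `RANK` and takes a disjoint, nearly equal share of the samples. Contiguous slices are one choice among several; a strided split works just as well, as long as every sample lands on exactly one rank.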

Table of contents
Distributed GPTQ: Parallelize compression across multiple GPUs
Custom compressed-tensors offloading
GPTQ FP4 microscale support
Conclusion