LLM Compressor 0.9.0 introduces major enhancements for model quantization, including refactored attention and KV cache quantization supporting arbitrary schemes (FP4, INT8, FP8), a new model_free_ptq pathway for quantizing models without Transformers model definitions, and integration of Intel's AutoRound algorithm. The release also adds experimental MXFP4 support, batched calibration support, and AWQ updates, along with other improvements.
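To ground the features listed above, here is a minimal sketch of an FP8 quantization flow using llmcompressor's established oneshot API with a QuantizationModifier recipe. The model ID and output directory are illustrative placeholders, and this uses the standard recipe pathway rather than the new model_free_ptq entry point, which is covered later in this post.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

from llmcompressor import oneshot
from llmcompressor.modifiers.quantization import QuantizationModifier

# Placeholder model; any Transformers causal LM works here
MODEL_ID = "meta-llama/Llama-3.1-8B-Instruct"
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

# FP8 dynamic quantization of all Linear layers, skipping the output head
recipe = QuantizationModifier(
    targets="Linear",
    scheme="FP8_DYNAMIC",
    ignore=["lm_head"],
)

# Apply the recipe in one shot (FP8 dynamic needs no calibration data)
oneshot(model=model, recipe=recipe)

# Save the compressed model in a vLLM-loadable format
SAVE_DIR = MODEL_ID.split("/")[-1] + "-FP8-dynamic"
model.save_pretrained(SAVE_DIR)
tokenizer.save_pretrained(SAVE_DIR)
```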

Table of contents

Refactored and expanded attention and KV cache quantization
Quantize any model to FP8 using model_free_ptq
Introducing the AutoRoundModifier
Experimental MXFP4 support
Batched calibration support
AWQ updates and other improvements
Other updates and improvements
Conclusion