LLM Compressor 0.9.0 introduces major enhancements for model quantization, including refactored attention and KV cache quantization supporting arbitrary schemes (FP4, INT8, FP8), a new model_free_ptq pathway for quantizing models without Transformers model definitions, and integration of Intel's AutoRound algorithm. The release also adds experimental MXFP4 support, batched calibration support, and AWQ updates, along with other improvements.
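To ground the features listed above, here is a minimal sketch of an FP8 quantization flow using llmcompressor's established oneshot API with a QuantizationModifier recipe. The model ID and output directory are illustrative placeholders, and this uses the standard recipe pathway rather than the new model_free_ptq entry point, which is covered later in this post.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

from llmcompressor import oneshot
from llmcompressor.modifiers.quantization import QuantizationModifier

# Placeholder model; any Transformers causal LM works here
MODEL_ID = "meta-llama/Llama-3.1-8B-Instruct"
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

# FP8 dynamic quantization of all Linear layers, skipping the output head
recipe = QuantizationModifier(
    targets="Linear",
    scheme="FP8_DYNAMIC",
    ignore=["lm_head"],
)

# Apply the recipe in one shot (FP8 dynamic needs no calibration data)
oneshot(model=model, recipe=recipe)

# Save the compressed model in a vLLM-loadable format
SAVE_DIR = MODEL_ID.split("/")[-1] + "-FP8-dynamic"
model.save_pretrained(SAVE_DIR)
tokenizer.save_pretrained(SAVE_DIR)
```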

Table of contents

Refactored and expanded attention and KV cache quantization
Quantize any model to FP8 using model_free_ptq
Introducing the AutoRoundModifier
Experimental MXFP4 support
Batched calibration support
AWQ updates and other improvements
Other updates and improvements
Conclusion