AutoRound is Intel's post-training quantization tool for large language models and vision-language models. It reduces model size and inference latency while preserving accuracy. AutoRound uses signed gradient descent to optimize weight rounding and clipping ranges, enabling low-bit quantization (for example, INT2 to INT8) with minimal accuracy loss. It supports a variety of model architectures and devices, and quantization is fast because it needs only a small calibration dataset. AutoRound is compatible with popular export formats and offers flexible quantization configurations.
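The signed gradient descent idea can be illustrated on a toy problem. The following is a minimal sketch, not AutoRound's implementation. It assumes a single-output linear layer, symmetric per-tensor INT4 quantization, and a straight-through estimator for `round()`. It tunes only per-weight rounding offsets `v` in [-0.5, 0.5] and omits clipping-range tuning. All function names (`fake_quantize`, `output_mse`, `signed_gd_rounding`) are hypothetical. Each step moves every offset by `lr * sign(gradient)` of the output MSE on the calibration inputs. The sketch also keeps the best iterate, so its result is never worse than plain round-to-nearest (RTN) on that data.

```python
import random


def sign(x):
    """Return -1, 0 or 1 depending on the sign of x."""
    return (x > 0) - (x < 0)


def fake_quantize(w, scale, v, qmin, qmax):
    """Quantize-dequantize: clamp(round(w/scale + v)) * scale, element-wise."""
    return [min(max(round(wi / scale + vi), qmin), qmax) * scale
            for wi, vi in zip(w, v)]


def output_mse(w, w_hat, xs):
    """Mean squared error between float and quantized outputs over calibration inputs."""
    total = 0.0
    for x in xs:
        y = sum(wi * xi for wi, xi in zip(w, x))
        y_hat = sum(wi * xi for wi, xi in zip(w_hat, x))
        total += (y_hat - y) ** 2
    return total / len(xs)


def signed_gd_rounding(w, xs, bits=4, steps=200, lr=0.005):
    """Tune per-weight rounding offsets v in [-0.5, 0.5] with signed gradient descent."""
    qmax = 2 ** (bits - 1) - 1
    qmin = -qmax - 1
    scale = max(abs(wi) for wi in w) / qmax  # symmetric scale; clipping is not tuned here
    v = [0.0] * len(w)
    # v = 0 is plain round-to-nearest (RTN); keep the best iterate seen so far.
    best_v = list(v)
    best_loss = output_mse(w, fake_quantize(w, scale, v, qmin, qmax), xs)
    for _ in range(steps):
        w_hat = fake_quantize(w, scale, v, qmin, qmax)
        grad = [0.0] * len(w)
        for x in xs:
            err = sum((wh - wi) * xi for wh, wi, xi in zip(w_hat, w, x))  # y_hat - y
            for i, xi in enumerate(x):
                # Straight-through estimator: treat round() and clamp as identity,
                # so d(w_hat_i)/d(v_i) = scale.
                grad[i] += 2.0 * err * xi * scale / len(xs)
        # Signed update: each offset moves by at most lr against its gradient sign.
        v = [min(max(vi - lr * sign(gi), -0.5), 0.5) for vi, gi in zip(v, grad)]
        loss = output_mse(w, fake_quantize(w, scale, v, qmin, qmax), xs)
        if loss < best_loss:
            best_v, best_loss = list(v), loss
    return best_v, best_loss, scale


if __name__ == "__main__":
    rng = random.Random(0)
    w = [rng.uniform(-1.0, 1.0) for _ in range(16)]
    xs = [[rng.gauss(0.0, 1.0) for _ in range(16)] for _ in range(32)]
    v, loss, scale = signed_gd_rounding(w, xs)
    rtn = output_mse(w, fake_quantize(w, scale, [0.0] * 16, -8, 7), xs)
    print(f"RTN MSE: {rtn:.6f}  signed-GD MSE: {loss:.6f}")
```

Because only the gradient's sign is used, every offset takes the same fixed-size step regardless of gradient magnitude. With `lr = 0.005` and 200 steps, an offset can traverse its whole [-0.5, 0.5] range, which suits the small, bounded search space of rounding decisions.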
Table of contents
1. Superior Accuracy at Low Bit Widths
2. Broad Compatibility
Installation
Quantization and Serialization
Inference