AutoRound is Intel's post-training quantization tool for large language models and vision-language models, designed to reduce model size and inference latency while keeping accuracy high. It uses signed gradient descent to optimize weight rounding and clipping ranges for low-bit quantization (e.g., INT2 to INT8) with minimal accuracy loss. The tool supports a variety of model architectures and devices, and quantization is fast, requiring only a small calibration dataset. AutoRound is compatible with popular export formats and offers flexible quantization configurations.
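
To make the idea of tuning weight rounding with signed gradient descent more concrete, here is a minimal pure-Python sketch for a single weight vector. It is not AutoRound's implementation or API: the function names (`quantize`, `tune_rounding`), the symmetric fixed scale, the straight-through gradient, the step count, and the learning rate of 1/steps are all assumptions chosen for illustration, and the clipping range is held fixed rather than learned.

```python
import random

def quantize(w, scale, v, qmin, qmax):
    # Add a per-weight rounding offset v, round, clip to the integer grid, dequantize.
    return [scale * max(qmin, min(qmax, round(wi / scale + vi)))
            for wi, vi in zip(w, v)]

def sign(g):
    # Returns -1, 0, or 1.
    return (g > 0) - (g < 0)

def tune_rounding(w, xs, bits=4, steps=200):
    # Hypothetical helper: learns offsets v in [-0.5, 0.5] so that the quantized
    # layer output w_q . x stays close to the full-precision output w . x.
    qmax, qmin = 2 ** (bits - 1) - 1, -(2 ** (bits - 1))
    scale = max(abs(wi) for wi in w) / qmax   # fixed symmetric scale (assumption)
    lr = 1.0 / steps                          # illustrative step size (assumption)
    v = [0.0] * len(w)                        # v = 0 is plain round-to-nearest
    rtn_loss, best_loss, best_v = None, None, v[:]
    for _ in range(steps):
        wq = quantize(w, scale, v, qmin, qmax)
        # Output error on each calibration sample.
        errs = [sum((q - f) * x for q, f, x in zip(wq, w, row)) for row in xs]
        loss = sum(e * e for e in errs)
        if rtn_loss is None:
            rtn_loss = loss                   # loss of round-to-nearest baseline
        if best_loss is None or loss < best_loss:
            best_loss, best_v = loss, v[:]
        # Straight-through estimator: treat round() as identity when differentiating.
        grads = [scale * sum(2 * e * row[j] for e, row in zip(errs, xs))
                 for j in range(len(w))]
        # Signed gradient step: only the sign of each gradient is used.
        v = [max(-0.5, min(0.5, vj - lr * sign(g))) for vj, g in zip(v, grads)]
    return scale, best_v, rtn_loss, best_loss

random.seed(0)
weights = [random.uniform(-1, 1) for _ in range(16)]
calib = [[random.gauss(0, 1) for _ in range(16)] for _ in range(32)]
scale, offsets, rtn_loss, best_loss = tune_rounding(weights, calib)
print("round-to-nearest loss:", rtn_loss, "tuned loss:", best_loss)
```

Because the sketch keeps the best offsets seen so far and starts from zero offsets, the tuned loss can never be worse than the round-to-nearest baseline on the calibration samples; how much it improves depends on the data.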

6m read time
From huggingface.co
Table of contents
Superior Accuracy at Low Bit Widths
Broad Compatibility
Installation
Quantization and Serialization
Inference
