AutoRound is Intel's post-training quantization tool for large language models and vision-language models, designed to reduce model size and inference latency while maintaining high accuracy. It uses signed gradient descent to optimize weight rounding and clipping ranges for low-bit quantization (e.g., INT2 to INT8) with
Table of contents
1. Superior Accuracy at Low Bit Widths
2. Broad Compatibility
Installation
Quantization and Serialization
Inference
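
To make the idea in the introduction more concrete, here is a minimal NumPy sketch of learning a rounding offset with signed gradient descent. It is an illustration under stated assumptions, not AutoRound's actual implementation: the function names (`fake_quantize`, `signsgd_rounding`), the symmetric per-output-channel scale, the mean-squared output error, the straight-through gradient, and the step size are all chosen here for the example. The clipping range is held fixed for brevity, so only the rounding part is tuned.

```python
import numpy as np


def fake_quantize(W, V, scale, qmin, qmax):
    """Quantize W to integers with a rounding offset V, then map back to floats."""
    q = np.clip(np.round(W / scale + V), qmin, qmax)
    return q * scale


def signsgd_rounding(W, X, bits=4, steps=200, lr=None):
    """Tune a per-weight rounding offset V in [-0.5, 0.5] with sign-based updates.

    W: (in_features, out_features) weights; X: (samples, in_features) calibration inputs.
    Hypothetical helper for illustration only.
    """
    qmax = 2 ** (bits - 1) - 1
    qmin = -qmax - 1
    # Assumption: symmetric scale per output channel, kept fixed (no learned clipping).
    scale = np.maximum(np.abs(W).max(axis=0, keepdims=True) / qmax, 1e-12)
    lr = 1.0 / steps if lr is None else lr  # assumed step size for this sketch
    reference = X @ W
    V = np.zeros_like(W)  # V = 0 is plain round-to-nearest
    best_V, best_loss = V.copy(), np.inf

    for _ in range(steps):
        Wq = fake_quantize(W, V, scale, qmin, qmax)
        err = X @ Wq - reference
        loss = float(np.mean(err ** 2))
        if loss < best_loss:  # sign updates are not monotone, so keep the best offset seen
            best_loss, best_V = loss, V.copy()
        # Straight-through estimator: treat round() as identity inside the clip range.
        pre_round = W / scale + V
        inside = (pre_round >= qmin) & (pre_round <= qmax)
        grad_V = (X.T @ err) * scale * inside
        # Signed gradient descent: only the sign of the gradient is used.
        V = np.clip(V - lr * np.sign(grad_V), -0.5, 0.5)

    return fake_quantize(W, best_V, scale, qmin, qmax), best_loss


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W = rng.normal(size=(16, 8))
    X = rng.normal(size=(64, 16))
    _, tuned_loss = signsgd_rounding(W, X, bits=3, steps=100)
    print("output MSE after tuning:", tuned_loss)
```

Because the first iteration evaluates `V = 0`, the returned loss is never worse than plain round-to-nearest on the calibration inputs. Whether it is meaningfully better depends on the data and bit width.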