AutoRound is Intel's advanced post-training quantization tool for large language and vision-language models, designed to reduce model size and inference latency while maintaining high accuracy. It uses signed gradient descent to optimize weight rounding and clipping ranges for low-bit quantization (e.g., INT2 to INT8) with

6m read time
From huggingface.co
Table of contents
Superior Accuracy at Low Bit Widths
Broad Compatibility
Installation
Quantization and Serialization
Inference
