The article explores different quantization methods for Large Language Models (LLMs), including loading models with HuggingFace, sharding, and quantization with Bitsandbytes. It also introduces pre-quantized LLM formats such as GPTQ, GGUF, and AWQ. Each method is explained, with examples showing how to load and use the models.
Table of contents
1. HuggingFace
2. Sharding
3. Quantize with Bitsandbytes
4. Pre-Quantization (GPTQ vs. AWQ vs. GGUF)
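Before diving into the methods above, it helps to see what quantization does at its core: mapping floating-point weights onto a small integer range using a scale factor. The sketch below is a minimal, illustrative absmax INT8 quantizer in NumPy (the function names and the toy weight values are my own, not from any of the libraries discussed); real libraries like Bitsandbytes add block-wise scaling, outlier handling, and optimized kernels on top of this idea.

```python
import numpy as np

def absmax_quantize(x: np.ndarray) -> tuple[np.ndarray, float]:
    """Map float weights to int8 by scaling the largest magnitude to 127."""
    scale = 127.0 / np.max(np.abs(x))
    q = np.round(x * scale).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate float weights from the int8 values."""
    return q.astype(np.float32) / scale

# Toy weight tensor (illustrative values only)
weights = np.array([0.5, -1.2, 0.03, 2.4], dtype=np.float32)
q, scale = absmax_quantize(weights)
recon = dequantize(q, scale)
# q is stored in 1 byte per weight instead of 4; recon is close to weights
```

Each weight now occupies a single byte instead of four, at the cost of a small reconstruction error bounded by half a quantization step (0.5 / scale).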