Microsoft's BitNet b1.58 paper proposes an LLM architecture in which every model weight is constrained to ternary values (-1, 0, or +1), requiring only about 1.58 bits (log2 3) per weight. Unlike post-training quantization, the model is trained from scratch with a BitLinear layer that applies absolute-mean (absmean) quantization to its weights. Because the weights are ternary, matrix multiplication no longer needs multiplications: each product reduces to adding, subtracting, or skipping activation values.
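A minimal sketch of the absmean weight quantization step, assuming PyTorch; the function name `absmean_quantize` and the epsilon value are illustrative, not taken from Microsoft's released code:

```python
import torch

def absmean_quantize(w: torch.Tensor, eps: float = 1e-5):
    """Quantize a full-precision weight tensor to ternary {-1, 0, +1}.

    Follows the paper's absmean scheme: scale by the mean absolute
    value of the weights, then round and clip to the ternary set.
    """
    gamma = w.abs().mean()                            # per-tensor scale: mean |W|
    w_q = (w / (gamma + eps)).round().clamp_(-1, 1)   # RoundClip to {-1, 0, +1}
    return w_q, gamma                                 # gamma is kept to rescale outputs

# Example: quantize a random weight matrix
w = torch.randn(4, 8)
w_q, gamma = absmean_quantize(w)
print(w_q.unique())  # values drawn from {-1., 0., 1.}
```

In the actual BitLinear forward pass this quantization is applied on the fly during training, with gradients flowing to the latent full-precision weights via a straight-through estimator; that detail is omitted from the sketch above.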