BiLLM is a novel post-training binary quantization method for compressing pre-trained LLMs. It reaches ultra-low bit-widths without significant loss of accuracy, enabling deployment in edge scenarios and on resource-constrained devices.