Introducing Atom, a low-bit quantization technique for efficient and accurate Large Language Model (LLM) serving. Atom maximizes serving throughput by using low-bit operators to speed up computation and low-bit quantization to reduce memory usage, without sacrificing accuracy. It improves end-to-end throughput by up to 7.73× over the 16-bit floating-point (FP16) baseline and by up to 2.53× over 8-bit integer (INT8) quantization.
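To make the idea concrete, here is a minimal sketch of symmetric per-group INT4 quantization, the general technique that low-bit serving schemes like Atom build on. This is an illustration, not Atom's actual implementation: the function names, the group size of 128, and the symmetric rounding scheme are all assumptions for the example.

```python
# Illustrative sketch of symmetric low-bit (INT4) quantization with
# per-group scales. Names and parameters are hypothetical, not Atom's API.
import torch

def quantize_int4(x: torch.Tensor, group_size: int = 128):
    """Quantize a 1-D tensor to 4-bit integers with one scale per group."""
    x = x.reshape(-1, group_size)  # split the tensor into fixed-size groups
    # Symmetric INT4 covers [-8, 7]; scale maps each group's max magnitude to 7.
    scale = x.abs().max(dim=1, keepdim=True).values / 7
    q = torch.clamp(torch.round(x / scale), -8, 7).to(torch.int8)
    return q, scale

def dequantize_int4(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    """Recover an approximation of the original tensor from codes and scales."""
    return (q.float() * scale).reshape(-1)

# Usage: quantize a weight vector, then check the reconstruction error.
w = torch.randn(1024)
q, s = quantize_int4(w)
w_hat = dequantize_int4(q, s)
print(f"mean abs error: {(w - w_hat).abs().mean():.4f}")
```

Storing 4-bit codes plus a small number of per-group scales is what cuts memory traffic; the throughput gains quoted above additionally rely on low-bit GPU kernels that compute directly on the quantized values.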