Google Research has unveiled TurboQuant, a quantization algorithm that compresses LLM Key-Value caches by up to 6x using a two-step approach: a randomized Hadamard transform to normalize value distributions, followed by the Quantized Johnson-Lindenstrauss (QJL) transform to remove bias. At 3.5-bit compression, it matches 16-bit accuracy.
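The first step is worth seeing concretely. The sketch below is a hypothetical illustration, not Google's implementation: it applies a randomized Hadamard transform (random sign flips followed by an orthonormal Hadamard rotation) so that a vector with wildly uneven coordinate scales becomes roughly Gaussian-distributed, then applies a plain low-bit uniform quantizer in place of the QJL step. Because the transform is orthonormal, it can be inverted exactly after dequantization.

```python
import numpy as np
from scipy.linalg import hadamard

rng = np.random.default_rng(0)
d = 64  # head dimension; must be a power of two for the Hadamard matrix
# A toy "KV vector" with very uneven per-coordinate scales
x = rng.standard_normal(d) * np.linspace(0.1, 3.0, d)

# Randomized Hadamard transform: y = H @ (D @ x) / sqrt(d),
# where D is a diagonal of random signs. The map is orthonormal,
# so norms are preserved and it is exactly invertible.
D = rng.choice([-1.0, 1.0], size=d)
H = hadamard(d).astype(np.float64)
y = H @ (D * x) / np.sqrt(d)

def quantize(v, bits):
    """Uniform quantizer over the vector's dynamic range (stand-in
    for the QJL step, which the article describes but we omit)."""
    levels = 2 ** bits - 1
    lo, hi = v.min(), v.max()
    step = (hi - lo) / levels
    return np.round((v - lo) / step) * step + lo

x_hat_direct = quantize(x, 4)              # quantize raw coordinates
y_hat = quantize(y, 4)                     # quantize transformed ones
x_hat_rht = D * (H.T @ y_hat) / np.sqrt(d) # invert the transform

err_direct = np.linalg.norm(x - x_hat_direct) / np.linalg.norm(x)
err_rht = np.linalg.norm(x - x_hat_rht) / np.linalg.norm(x)
print(f"relative error, direct 4-bit: {err_direct:.3f}")
print(f"relative error, RHT + 4-bit:  {err_rht:.3f}")
```

The rotation spreads every coordinate's energy evenly, so no single outlier dictates the quantizer's dynamic range; that is the intuition behind using the transform before a low-bit quantizer.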

From infoq.com (4-minute read)