Google Research introduces TurboQuant, a theoretically grounded quantization algorithm for compressing large language model KV caches and vector search indices. TurboQuant combines two sub-algorithms: PolarQuant, which converts vectors to polar coordinates to eliminate quantization overhead, and QJL (Quantized

7m read timeFrom research.google
Post cover image

Sort: