Google Research introduces TurboQuant, a theoretically grounded quantization algorithm for compressing large language model KV caches and vector search indices. TurboQuant combines two sub-algorithms: PolarQuant, which converts vectors to polar coordinates to eliminate quantization overhead, and QJL (Quantized
Sort: