Google Research introduces TurboQuant, a theoretically grounded quantization algorithm for compressing large language model KV caches and vector search indices. TurboQuant combines two sub-algorithms: PolarQuant, which converts vectors to polar coordinates to eliminate quantization overhead, and QJL (Quantized Johnson-Lindenstrauss), a 1-bit error-correction step with zero memory overhead. Together they achieve a more than 6x reduction in KV-cache memory with no accuracy loss and no fine-tuning, plus up to an 8x speedup in attention computation over unquantized 32-bit keys on H100 GPUs. Benchmarks on Gemma and Mistral across LongBench, Needle In A Haystack, and other long-context tasks show near-lossless performance. TurboQuant also outperforms state-of-the-art vector search baselines (PQ, RaBitQ) in recall without dataset-specific tuning.
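The QJL component builds on 1-bit random-projection sketches in the Johnson-Lindenstrauss family. The sketch below is not the paper's algorithm, only a minimal illustration of the underlying idea (names and parameters are hypothetical): project vectors with a random Gaussian matrix, keep just the sign bits, and recover the angle between two vectors from the sign-mismatch rate, using the classic identity P[sign mismatch] = angle / pi.

```python
import numpy as np

rng = np.random.default_rng(0)

d, m = 64, 4096                      # original dimension, sketch dimension (illustrative)
S = rng.standard_normal((m, d))      # random Gaussian projection matrix

def one_bit_sketch(v):
    """Quantize a vector to 1 bit per projected coordinate: keep only signs."""
    return np.sign(S @ v)            # m sign bits instead of d float32 values

def estimate_angle(bits_a, bits_b):
    """Estimate the angle between the original vectors from sign disagreement,
    via P[sign mismatch] = angle / pi for a random Gaussian hyperplane."""
    mismatch_rate = np.mean(bits_a != bits_b)
    return mismatch_rate * np.pi

a = rng.standard_normal(d)
b = rng.standard_normal(d)
true_angle = np.arccos(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
approx_angle = estimate_angle(one_bit_sketch(a), one_bit_sketch(b))
print(true_angle, approx_angle)      # the two values should be close for large m
```

With m = 4096 bits the mismatch-rate estimator concentrates tightly around the true angle; the paper's contribution is an unbiased, zero-memory-overhead correction on top of a coarser base quantizer, which this toy sketch does not attempt to reproduce.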

7m read time · From research.google