Google has introduced TurboQuant, a new quantization method targeting two major memory bottlenecks in AI systems: the key-value (KV) cache used during LLM inference and vector search operations. In tests on Gemma and Mistral models running on Nvidia H100 hardware, Google reported a 6x reduction in memory usage and an 8x speedup.