Turbopuffer's ANN v3 achieves 200ms p99 query latency at 1k QPS over 100 billion vectors (200 TiB) through hierarchical clustering and binary quantization. The system balances bandwidth across the memory hierarchy by storing quantized vectors in L3 cache and DRAM while keeping full-precision vectors on NVMe SSDs.
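
To make the quantized/full-precision split concrete, here is a minimal sketch of binary quantization with a full-precision rerank. It is an illustration under stated assumptions, not turbopuffer's implementation: the function names, the sign-based 1-bit encoding, the Hamming-distance scan, and the `shortlist` parameter are all hypothetical choices made for this example, and the hierarchical clustering step is left out entirely.

```python
import numpy as np


def binary_quantize(vectors):
    """Keep one bit per dimension (1 if the component is positive), packed 8 per byte.

    Assumption: a plain sign-based encoding, chosen here for illustration only.
    """
    return np.packbits(vectors > 0, axis=-1)


def hamming_distances(query_code, codes):
    """Count the differing bits between one packed query code and each packed code."""
    xor = np.bitwise_xor(codes, query_code)
    return np.unpackbits(xor, axis=-1).sum(axis=-1)


def search(query, vectors, codes, k=10, shortlist=100):
    """Two-stage search: scan the compact codes, then rerank a shortlist exactly.

    Assumption: `codes` stands in for the small, fast tier (cache/DRAM) and
    `vectors` for the large, slow tier (SSD); here both are in-memory arrays.
    """
    # Stage 1: cheap approximate scan over the 1-bit codes.
    approx = hamming_distances(binary_quantize(query[None, :])[0], codes)
    shortlist = min(shortlist, len(codes))
    candidates = np.argpartition(approx, shortlist - 1)[:shortlist]

    # Stage 2: exact Euclidean distances, computed only for the shortlist.
    exact = np.linalg.norm(vectors[candidates] - query, axis=1)
    return candidates[np.argsort(exact)[:k]]


rng = np.random.default_rng(0)
vectors = rng.standard_normal((1000, 64)).astype(np.float32)
codes = binary_quantize(vectors)  # shape (1000, 8): 64 bits per vector
top = search(vectors[42], vectors, codes, k=5)
```

In this sketch each vector's code takes 1 bit per dimension, so the codes are 32 times smaller than the float32 originals. That size gap is what lets a compact copy sit in the fast tiers while full-precision reads are limited to a short list of candidates.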

17m read time From turbopuffer.com
Table of contents
Billion-scale ANN search
Putting it all together...
...to end up compute-bound...
...in a distributed cluster
Conclusion
turbopuffer
