Our latest ANN release supports scales of 100+ billion vectors in a single search index, with 200ms p99 query latency at 1k QPS and 92% recall.

Hacker News is a community-driven platform for sharing and discussing technology news, startups, and programming-related topics. Through user submissions and comments, Hacker News offers insights into emerging technology trends, industry developments, and entrepreneurial ventures. Readers can participate in discussions, share their insights, and stay informed about the latest advancements in technology and innovation.

Hacker News

Turbopuffer's ANN v3 achieves 200ms p99 query latency at 1k QPS over 100 billion vectors (200TiB) through hierarchical clustering and binary quantization. The system balances bandwidth across the memory hierarchy by storing quantized vectors in L3 cache and DRAM while keeping full-precision vectors on NVMe SSDs. Binary quantization with RaBitQ provides 16-32x compression while maintaining 92% recall through selective reranking of <1% of candidates. The architecture distributes data across storage-dense VMs, with each machine optimized to maximize single-core compute efficiency using AVX-512 instructions. This approach scales linearly while keeping costs production-viable.

ANN v3: 200ms p99 query latency over 100 billion vectors