How Databricks separated storage from compute for 20x faster indexing and 7x lower serving costs.

databricks

Databricks redesigned its vector search infrastructure to handle billion-scale datasets by decoupling storage from compute. The new Storage Optimized endpoints combine three components: an IVF (Inverted File Index) index in place of HNSW (Hierarchical Navigable Small World graphs); distributed K-means clustering and Product Quantization, implemented on PySpark with JAX; and a Rust-based query engine with two runtimes, one for async I/O and one for CPU-bound computation. Key results: billion-vector indexes built in under 8 hours (20x faster), up to 7x lower serving costs, and 90%+ recall at 1 billion vectors. Query latency is roughly 300–500 ms, versus 20–50 ms for the memory-resident Standard endpoints; this is a deliberate trade-off that favors scale and cost over ultra-low latency. The architecture rests on three interdependent bets: storage-compute separation, distributed indexing with a compatible index format, and aggressive compression via Product Quantization.
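To make the IVF and Product Quantization pieces concrete, here is a minimal single-machine sketch in plain Python (standard library only) of how the two fit together at build and query time. It is not Databricks' implementation: the distributed PySpark/JAX training, the Rust dual-runtime engine and the storage layer are all omitted. The class name `IVFPQ`, the parameters `nlist`, `m`, `ksub` and `nprobe`, and the choice to quantize residuals (vector minus its partition centroid) are illustrative assumptions based on a common IVF-PQ variant, not details confirmed by the source.

```python
import random

def sq_dist(a, b):
    # Squared Euclidean distance between two equal-length vectors.
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(v, centroids):
    # Index of the centroid closest to v.
    return min(range(len(centroids)), key=lambda c: sq_dist(v, centroids[c]))

def kmeans(points, k, iters=10, seed=0):
    # Plain Lloyd's K-means on one machine (the article's version is distributed).
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        buckets = [[] for _ in range(k)]
        for p in points:
            buckets[nearest(p, centroids)].append(p)
        for j, b in enumerate(buckets):
            if b:  # keep the previous centroid if a cluster goes empty
                centroids[j] = [sum(col) / len(b) for col in zip(*b)]
    return centroids

class IVFPQ:
    """Hypothetical IVF-PQ index: coarse partitions plus PQ-coded residuals."""

    def __init__(self, nlist, m, ksub, seed=0):
        self.nlist, self.m, self.ksub, self.seed = nlist, m, ksub, seed

    def _sub(self, v, s):
        # Slice out subspace s of a vector.
        return v[s * self.dsub:(s + 1) * self.dsub]

    def train_and_add(self, vectors):
        dim = len(vectors[0])
        assert dim % self.m == 0, "dim must split evenly into m subspaces"
        self.dsub = dim // self.m
        # 1. Coarse quantizer: K-means centroids define the IVF partitions.
        self.coarse = kmeans(vectors, self.nlist, seed=self.seed)
        assign = [nearest(v, self.coarse) for v in vectors]
        residuals = [[x - c for x, c in zip(v, self.coarse[a])]
                     for v, a in zip(vectors, assign)]
        # 2. One small K-means codebook per subspace, trained on residuals.
        self.codebooks = [
            kmeans([self._sub(r, s) for r in residuals], self.ksub,
                   seed=self.seed + s + 1)
            for s in range(self.m)]
        # 3. Inverted lists hold (id, PQ code) pairs only, never raw vectors.
        self.lists = {j: [] for j in range(self.nlist)}
        for i, (r, a) in enumerate(zip(residuals, assign)):
            code = tuple(nearest(self._sub(r, s), self.codebooks[s])
                         for s in range(self.m))
            self.lists[a].append((i, code))

    def search(self, q, k=5, nprobe=2):
        # Visit only the nprobe partitions whose centroids are closest to q.
        probes = sorted(range(self.nlist),
                        key=lambda c: sq_dist(q, self.coarse[c]))[:nprobe]
        scored = []
        for c in probes:
            r = [x - y for x, y in zip(q, self.coarse[c])]
            # Asymmetric distance: precompute query-to-codeword tables per subspace.
            tables = [[sq_dist(self._sub(r, s), cw) for cw in self.codebooks[s]]
                      for s in range(self.m)]
            for i, code in self.lists[c]:
                dist = sum(tables[s][code[s]] for s in range(self.m))
                scored.append((dist, i))
        return [i for _, i in sorted(scored)[:k]]
```

Usage: `index = IVFPQ(nlist=4, m=4, ksub=16)`, then `index.train_and_add(vectors)` and `index.search(query, k=5, nprobe=2)`. The sketch shows the two levers the summary alludes to: each stored vector shrinks to `m` small codeword indices instead of `dim` floats, and a query scans only `nprobe` of the `nlist` partitions, so raising `nprobe` buys recall at the cost of latency.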

Decoupled by Design: Billion-Scale Vector Search

The Problem with Traditional Vector Databases