Databricks redesigned its vector search infrastructure to handle billion-scale datasets by decoupling storage from compute. The new Storage Optimized endpoints use IVF (Inverted File Index) instead of HNSW, distributed K-means and Product Quantization built on PySpark with JAX, and a Rust-based dual-runtime query engine separating async I/O from CPU-bound computation. Key results: billion-vector indexes built in under 8 hours (20x faster), up to 7x lower serving costs, and 90%+ recall at 1 billion vectors. Query latency is ~300–500ms versus 20–50ms for the memory-resident Standard endpoints — a deliberate trade-off favoring scale and cost over ultra-low latency. The architecture relies on three interdependent bets: storage-compute separation, distributed indexing with a compatible index format, and aggressive compression via Product Quantization.
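To illustrate why an IVF index suits a storage-backed design, the sketch below shows the basic idea in pure Python: vectors are grouped into inverted lists by nearest centroid, and a query scans only the `nprobe` closest lists instead of the whole dataset. This is a toy under stated assumptions, not Databricks' implementation. The function names (`build_ivf`, `search`), the hand-picked centroids, and the tiny 2-D dataset are all hypothetical. Everything is held in memory here for simplicity, and Product Quantization is left out.

```python
import math

def nearest(point, centroids):
    """Index of the centroid closest to point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))

def build_ivf(vectors, centroids):
    """Assign every vector id to the inverted list of its nearest centroid."""
    lists = {i: [] for i in range(len(centroids))}
    for vid, vec in enumerate(vectors):
        lists[nearest(vec, centroids)].append(vid)
    return lists

def search(query, vectors, centroids, lists, nprobe=1, k=1):
    """Scan only the nprobe closest inverted lists and return the top-k ids."""
    order = sorted(range(len(centroids)), key=lambda i: math.dist(query, centroids[i]))
    candidates = [vid for c in order[:nprobe] for vid in lists[c]]
    candidates.sort(key=lambda vid: math.dist(query, vectors[vid]))
    return candidates[:k]

# Toy data: two well-separated clusters. The centroids are fixed by hand
# (an assumption for illustration); a real build would learn them with k-means.
vectors = [(0.0, 0.1), (0.2, 0.0), (5.0, 5.1), (5.2, 4.9)]
centroids = [(0.1, 0.05), (5.1, 5.0)]

lists = build_ivf(vectors, centroids)
result = search((5.0, 5.0), vectors, centroids, lists, nprobe=1, k=2)
print(lists)   # inverted lists keyed by centroid index
print(result)  # ids of the nearest vectors found in the probed list
```

Raising `nprobe` scans more lists per query, which generally improves recall at the cost of more data read. That is the same kind of scale-versus-latency trade-off the summary describes.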

15m read time. From databricks.com
Table of contents
Introduction
The Problem with Traditional Vector Databases
Decoupled by Design
Distributed Vector Indexing on Spark
