Vector databases can store 100 million embeddings on a single machine through Product Quantization (PQ), which compresses 768-dim float32 vectors from 307GB to ~10GB by splitting vectors into subspaces and storing codebook indices instead of raw floats. The system uses a multi-stage retrieval pipeline: IVF partitioning narrows candidates, PQ enables fast approximate distance calculations via table lookups, and optional refinement with original vectors recovers precision. A hot/cold storage pattern keeps compressed codes and indices in RAM (~15GB total) while original vectors live on SSD, making 100M-scale search feasible on commodity hardware with 64GB RAM.
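The per-subspace encoding and table-lookup distance described above can be sketched in NumPy. The parameters here (96 subspaces of 8 dims each, 256 centroids per subspace, so one `uint8` code per subspace) are one plausible configuration that yields the quoted numbers: 96 bytes per vector instead of 3072, i.e. ~9.6GB instead of ~307GB for 100M vectors. A real index learns the codebooks with per-subspace k-means (e.g. via Faiss); random codebooks stand in for them below.

```python
import numpy as np

# Illustrative parameters: 768-dim vectors, 96 subspaces of 8 dims,
# 256 centroids per subspace -> one uint8 code per subspace.
DIM, M, K = 768, 96, 256
SUB = DIM // M  # 8 dims per subspace

rng = np.random.default_rng(0)
# Stand-in codebooks; a real system learns these with k-means per subspace.
codebooks = rng.standard_normal((M, K, SUB)).astype(np.float32)

def pq_encode(v):
    """Compress a float32 vector to M uint8 codebook indices (96 bytes)."""
    codes = np.empty(M, dtype=np.uint8)
    for m in range(M):
        sub = v[m * SUB:(m + 1) * SUB]
        # Index of the nearest centroid in this subspace.
        codes[m] = np.argmin(((codebooks[m] - sub) ** 2).sum(axis=1))
    return codes

def pq_distance(query, codes):
    """Approximate squared L2 distance via per-subspace table lookups.

    The (M, K) table is computed once per query, then reused for every
    candidate's codes -- this is what makes PQ scans fast.
    """
    tables = np.stack([
        ((codebooks[m] - query[m * SUB:(m + 1) * SUB]) ** 2).sum(axis=1)
        for m in range(M)
    ])  # shape (M, K): distance from each query subvector to each centroid
    return tables[np.arange(M), codes].sum()

v = rng.standard_normal(DIM).astype(np.float32)
codes = pq_encode(v)
# 96 bytes of codes vs 3072 bytes of raw float32: a 32x reduction.
approx_dist = pq_distance(v, codes)
```

In the full pipeline, `pq_distance` is only evaluated for candidates inside the IVF partitions nearest the query, and the top results can then be re-ranked with exact distances against the original vectors fetched from SSD.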