Vector databases can store 100 million embeddings on a single machine through Product Quantization (PQ), which compresses 768-dim float32 vectors from 307GB to ~10GB by splitting vectors into subspaces and storing codebook indices instead of raw floats. The system uses a multi-stage retrieval pipeline: IVF partitioning narrows candidates, PQ enables fast approximate distance calculations via table lookups, and optional refinement with original vectors recovers precision. A hot/cold storage pattern keeps compressed codes and indices in RAM (~15GB total) while original vectors live on SSD, making 100M-scale search feasible on commodity hardware with 64GB RAM.
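The per-subspace encoding and table-lookup distance described above can be sketched in NumPy. The parameters here (96 subspaces of 8 dims each, 256 centroids per subspace, so one `uint8` code per subspace) are one plausible configuration that yields the quoted numbers: 96 bytes per vector instead of 3072, i.e. ~9.6GB instead of ~307GB for 100M vectors. A real index learns the codebooks with per-subspace k-means (e.g. via Faiss); random codebooks stand in for them below.

```python
import numpy as np

# Illustrative parameters: 768-dim vectors, 96 subspaces of 8 dims,
# 256 centroids per subspace -> one uint8 code per subspace.
DIM, M, K = 768, 96, 256
SUB = DIM // M  # 8 dims per subspace

rng = np.random.default_rng(0)
# Stand-in codebooks; a real system learns these with k-means per subspace.
codebooks = rng.standard_normal((M, K, SUB)).astype(np.float32)

def pq_encode(v):
    """Compress a float32 vector to M uint8 codebook indices (96 bytes)."""
    codes = np.empty(M, dtype=np.uint8)
    for m in range(M):
        sub = v[m * SUB:(m + 1) * SUB]
        # Index of the nearest centroid in this subspace.
        codes[m] = np.argmin(((codebooks[m] - sub) ** 2).sum(axis=1))
    return codes

def pq_distance(query, codes):
    """Approximate squared L2 distance via per-subspace table lookups.

    The (M, K) table is computed once per query, then reused for every
    candidate's codes -- this is what makes PQ scans fast.
    """
    tables = np.stack([
        ((codebooks[m] - query[m * SUB:(m + 1) * SUB]) ** 2).sum(axis=1)
        for m in range(M)
    ])  # shape (M, K): distance from each query subvector to each centroid
    return tables[np.arange(M), codes].sum()

v = rng.standard_normal(DIM).astype(np.float32)
codes = pq_encode(v)
# 96 bytes of codes vs 3072 bytes of raw float32: a 32x reduction.
approx_dist = pq_distance(v, codes)
```

In the full pipeline, `pq_distance` is only evaluated for candidates inside the IVF partitions nearest the query, and the top results can then be re-ranked with exact distances against the original vectors fetched from SSD.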