Meilisearch's CTO details how they patched LMDB to support nested read transactions on uncommitted writes, eliminating full database scans in their HNSW-based vector store. The previous approach used a full scan to collect memory-mapped pointers into a HashMap for multi-threaded access. Two key optimizations were made: removing unnecessary mean-degree stat computation (cutting 559s to ~2-3s), and replacing the ImmutableItems/ImmutableLinks full-scan approach with a new FrozenReader that lazily fetches graph nodes via nested read transactions distributed across rayon threads using thread-local storage. The LMDB patch required implementing an atomically reference-counted system in C11 to allow multiple concurrent nested read-only transactions from a parent write transaction. The result: inserting embeddings went from ~6/sec to ~20/sec, a 3x+ improvement, while also improving disk cache efficiency by loading only necessary pages.

16m read timeFrom meilisearch.com
Post cover image
Table of contents
Reminder of the factsStop computing useless statsFull scans are harmfulBut what's the new trick?Results: over 3x faster

Sort: