Bloom filters are probabilistic data structures that efficiently test set membership using bit vectors and hash functions. They can definitively say an element is not in a set, but can only indicate an element might be present due to potential false positives. The article explains implementation details, optimal sizing formulas, hash function selection, and includes examples from production systems like Chromium, Apache Spark, and RocksDB.

6m read timeFrom llimllib.github.io
Post cover image
Table of contents
Hash FunctionsHow big should I make my Bloom filter?How many hash functions should I use?How fast and space efficient is a Bloom filter?What can I use them for?References

Sort: