Cursor's engineering team details how they built a local regex search index to replace slow ripgrep invocations in large monorepos. The post covers the evolution of text indexing techniques: classic trigram inverted indexes (as used in google/codesearch and zoekt), suffix arrays (livegrep), probabilistic Bloom-filter-augmented

23m read time. From cursor.com
Table of contents
- The classic algorithm
- Suffix Arrays: a detour
- Trigram Queries with Probabilistic Masks
- Sparse N-grams: Smarter Trigram Selection
- All this, in your machine
- Conclusions
