index4j is an open-source Java library from Dynatrace Research that implements the FM-Index data structure for fast, arbitrary substring search over compressed log data. The FM-Index combines compression and indexing into a single structure using suffix arrays, wavelet trees, and the Burrows–Wheeler transform. A practical walkthrough shows how to build an index over ~180 MB of Android logs (compressed to ~70 MB), run count and locate queries in microseconds, extract matching log lines, and serialize the index to disk. The library is available on Maven Central and suits static log datasets requiring frequent, unpredictable substring queries.
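The count and locate queries mentioned above both rest on the FM-Index's "backward search" over the Burrows–Wheeler transform. The sketch below is a toy illustration of that idea. It is **not** the index4j API: the class `ToyFmIndex` and its methods are hypothetical. It also makes simplifying assumptions. It builds the suffix array with a naive sort, stores plain occurrence tables where a real FM-Index would use wavelet trees, and applies no compression. It assumes the input never contains the `'\0'` sentinel character.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;

/**
 * Toy FM-Index for illustration only (hypothetical class, NOT index4j's API).
 * Uses a naive suffix array and full occurrence tables instead of the
 * wavelet trees and sampled structures a real, compressed FM-Index uses.
 */
public class ToyFmIndex {
    private final int[] suffixArray; // suffixArray[i] = start of the i-th smallest suffix
    private final char[] bwt;        // Burrows–Wheeler transform of text + sentinel
    private final Map<Character, Integer> firstRow = new TreeMap<>(); // C[c]: # of chars < c
    private final Map<Character, int[]> occ = new TreeMap<>();       // occ[c][i]: # of c in bwt[0..i)

    public ToyFmIndex(String input) {
        // Assumption: '\0' is absent from input; it sorts below every other char.
        String text = input + '\0';
        int n = text.length();

        // Naive suffix sort: fine for a demo, far too slow for ~180 MB of logs.
        Integer[] sa = new Integer[n];
        for (int i = 0; i < n; i++) sa[i] = i;
        Arrays.sort(sa, (a, b) -> text.substring(a).compareTo(text.substring(b)));

        suffixArray = new int[n];
        bwt = new char[n];
        for (int i = 0; i < n; i++) {
            suffixArray[i] = sa[i];
            // BWT character = the one preceding each sorted suffix (cyclically).
            bwt[i] = text.charAt((sa[i] + n - 1) % n);
        }

        // C array: for each character, how many characters in the text are smaller.
        Map<Character, Integer> counts = new TreeMap<>();
        for (char c : text.toCharArray()) counts.merge(c, 1, Integer::sum);
        int sum = 0;
        for (Map.Entry<Character, Integer> e : counts.entrySet()) {
            firstRow.put(e.getKey(), sum);
            sum += e.getValue();
        }

        // Occurrence (rank) tables; a wavelet tree answers the same queries compactly.
        for (char c : counts.keySet()) {
            int[] o = new int[n + 1];
            for (int i = 0; i < n; i++) o[i + 1] = o[i] + (bwt[i] == c ? 1 : 0);
            occ.put(c, o);
        }
    }

    /** Backward search: half-open suffix-array range [lo, hi) of suffixes starting with pattern. */
    private int[] range(String pattern) {
        int lo = 0, hi = bwt.length;
        for (int i = pattern.length() - 1; i >= 0 && lo < hi; i--) {
            char c = pattern.charAt(i);
            Integer base = firstRow.get(c);
            if (base == null) return new int[] {0, 0}; // character never occurs
            int[] o = occ.get(c);
            lo = base + o[lo];
            hi = base + o[hi];
        }
        return new int[] {lo, hi};
    }

    /** Number of occurrences of pattern in the text. */
    public int count(String pattern) {
        int[] r = range(pattern);
        return r[1] - r[0];
    }

    /** Sorted start offsets of every occurrence of pattern. */
    public int[] locate(String pattern) {
        int[] r = range(pattern);
        int[] pos = Arrays.copyOfRange(suffixArray, r[0], r[1]);
        Arrays.sort(pos);
        return pos;
    }

    public static void main(String[] args) {
        String log = "ERROR disk full\nINFO ok\nERROR net down\n";
        ToyFmIndex idx = new ToyFmIndex(log);
        System.out.println(idx.count("ERROR"));                   // 2
        System.out.println(Arrays.toString(idx.locate("ERROR"))); // [0, 24]
    }
}
```

Backward search performs one rank lookup per pattern character, so the work for a count query grows with the pattern length rather than with the size of the log. Locate then reads positions out of the matching suffix-array range. Production FM-Indexes usually keep only a sample of the suffix array to save space.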

From dynatrace.com (10-minute read)
Table of contents
When should you use the FM-Index?
What is the FM-Index?
Let’s get into it
How do I store my index?
Conclusion
