Most regex engines advertised as 'linear time' — including RE2, Go's regexp, and Rust's regex crate — only guarantee linear time for a single match. Finding all matches degrades to O(n²) due to the 'loop around a DFA' approach. The author presents RE#, a regex engine that solves this with a two-pass bidirectional algorithm that finds all leftmost-longest (POSIX) matches in true linear time without changing semantics. A 'hardened' mode using an O(n*S) forward scan is also available for adversarial inputs, trading 3-9x constant-factor slowdown for guaranteed linear behavior. Benchmarks show RE# is competitive with Rust's regex crate on literal-heavy patterns using AVX2/NEON intrinsics. Current limitations include no capture groups and no lazy quantifiers. The post also distinguishes RE# from REmatch (VLDB 2023), which solves the different problem of enumerating all overlapping matches.
Table of contents
true-linear all-matcheshardened modeskip accelerationstreaming semanticswhat RE# doesn't doa note on REmatchSort: