Okapi, or “What if ripgrep Could Edit?”


Okapi is a command-line tool built on top of ripgrep that enables bulk editing of OCR errors (scannos) across thousands of text files. The author built it while digitizing 150+ years of US Government employee records, where OCR tools like olmOCR still produce many character-level errors. Okapi aggregates regex matches from multiple files into a single editable buffer (inspired by git interactive rebase), allowing multi-select editing in Sublime Text. A companion Rust tool uses Tesseract bounding boxes and fuzzy trigram matching to display the original scanned image inline next to each matched line, providing visual ground truth for ambiguous corrections.
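The aggregate-then-edit round trip described above can be sketched roughly as follows. This is an illustration only, not Okapi's actual implementation: it uses Python's `re` module in place of ripgrep, and the `path:line:text` buffer layout and the function names `collect_matches` and `apply_buffer` are assumptions made for the example.

```python
import re
from pathlib import Path


def collect_matches(paths, pattern):
    """Gather every matching line into one buffer, one 'path:lineno:text' row per hit.

    Hypothetical buffer layout; assumes file paths contain no ':' character.
    """
    regex = re.compile(pattern)
    rows = []
    for path in paths:
        lines = Path(path).read_text(encoding="utf-8").split("\n")
        for lineno, text in enumerate(lines, start=1):
            if regex.search(text):
                rows.append(f"{path}:{lineno}:{text}")
    return "\n".join(rows)


def apply_buffer(buffer):
    """Write each (possibly edited) row back to its file and line.

    Returns the number of lines whose text actually changed.
    """
    # Group the rows by file so each file is read and written once.
    edits = {}
    for row in buffer.split("\n"):
        if not row.strip():
            continue
        path, lineno, text = row.split(":", 2)
        edits.setdefault(path, {})[int(lineno)] = text

    changed = 0
    for path, by_line in edits.items():
        target = Path(path)
        lines = target.read_text(encoding="utf-8").split("\n")
        file_changed = False
        for lineno, text in by_line.items():
            if lines[lineno - 1] != text:
                lines[lineno - 1] = text
                changed += 1
                file_changed = True
        if file_changed:
            # Splitting and joining on "\n" preserves a trailing newline exactly.
            target.write_text("\n".join(lines), encoding="utf-8")
    return changed
```

In use, the string returned by `collect_matches` would be opened in an editor, corrected there (for example with multi-select), and then passed to `apply_buffer`. Keeping the file path and line number on every row is what lets a single flat buffer map back onto thousands of files.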

From kocharhook.com (5 min read)
Table of contents
Double-U Double-U III
Text, Meet Image
Feedback welcome
