Tesseract is an open-source OCR (Optical Character Recognition) engine that extracts text from images. It supports over 100 languages, reads common image formats (PNG, JPEG, TIFF), and writes several output formats, including plain text, searchable PDF, and hOCR (HTML). Version 5 ships two recognition engines: a neural-network (LSTM) engine that recognizes whole text lines, and a legacy engine that recognizes character patterns. Originally developed at HP and later maintained with Google's backing, it is now community-maintained and offers both a command-line tool and C/C++ APIs for developers.
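To make the command-line side concrete, here is a minimal Python sketch that assembles a `tesseract` invocation and runs it only if the binary is found on the PATH. The helper names `build_tesseract_command` and `run_ocr` and the file name `scan.png` are illustrative assumptions, not part of Tesseract. The flags used (`-l` for language, `--oem` for engine mode, a trailing config name such as `pdf` or `hocr` for the output format) follow the documented CLI, but check `tesseract --help` against your installed version.

```python
import shutil
import subprocess
from typing import List, Optional


def build_tesseract_command(image_path: str, output_base: str = "stdout",
                            lang: str = "eng", oem: int = 1,
                            output_format: Optional[str] = None) -> List[str]:
    """Assemble the argument list for the tesseract CLI (hypothetical helper).

    oem=1 selects the LSTM engine; oem=0 selects the legacy engine, which
    assumes the installed traineddata files include legacy models.
    """
    cmd = ["tesseract", image_path, output_base, "-l", lang, "--oem", str(oem)]
    if output_format:
        # Config names such as "pdf" or "hocr" go last and pick the output type.
        cmd.append(output_format)
    return cmd


def run_ocr(image_path: str, lang: str = "eng") -> Optional[str]:
    """Return the recognized text, or None if tesseract is not installed."""
    if shutil.which("tesseract") is None:
        return None
    # With "stdout" as the output base, the text is written to standard output.
    result = subprocess.run(build_tesseract_command(image_path, lang=lang),
                            capture_output=True, text=True, check=True)
    return result.stdout


if __name__ == "__main__":
    # Show the command that would write a searchable PDF to scan.pdf.
    print(build_tesseract_command("scan.png", "scan", output_format="pdf"))
```

Keeping command construction separate from execution makes the arguments easy to inspect or log before anything runs. Calling `run_ocr("scan.png")` would return the recognized text when Tesseract and the English language data are installed.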

tesseract-ocr/tesseract: Tesseract Open Source OCR Engine (main repository)

