Kreuzberg is a Python library for extracting text from a range of document formats, including PDFs, images, and office documents. It emphasizes local processing and minimal dependencies, and it is designed for modern async applications. Key features include support for multiple document formats, both async and sync APIs, and efficient batch processing. Installation requires two system dependencies, Pandoc and Tesseract OCR. The library is open source and welcomes contributions.
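A library that offers both async and sync APIs plus batch processing often layers them as sketched below. This is a minimal illustration of that general pattern, not Kreuzberg's actual API: `extract_text_sync`, `extract_text`, `batch_extract`, and `batch_extract_sync` are hypothetical names, and the stand-in extractor only reads UTF-8 text files instead of handing PDFs or images to Pandoc or Tesseract.

```python
from __future__ import annotations

import asyncio
import tempfile
from pathlib import Path


def extract_text_sync(path: str | Path) -> str:
    """Hypothetical stand-in extractor: reads a UTF-8 text file.

    A real library would detect the file type and pass PDFs, images, or
    office documents to tools such as Pandoc or Tesseract; that is omitted.
    """
    return Path(path).read_text(encoding="utf-8")


async def extract_text(path: str | Path) -> str:
    """Async variant: runs the blocking extractor in a worker thread."""
    return await asyncio.to_thread(extract_text_sync, path)


async def batch_extract(paths: list[str | Path], max_concurrency: int = 4) -> list[str]:
    """Extract many files concurrently; results keep the input order."""
    semaphore = asyncio.Semaphore(max_concurrency)  # cap simultaneous extractions

    async def worker(path: str | Path) -> str:
        async with semaphore:
            return await extract_text(path)

    # gather returns results in the same order as its arguments
    return list(await asyncio.gather(*(worker(p) for p in paths)))


def batch_extract_sync(paths: list[str | Path], max_concurrency: int = 4) -> list[str]:
    """Sync entry point for callers that are not running an event loop."""
    return asyncio.run(batch_extract(paths, max_concurrency))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        files = []
        for i, text in enumerate(["first", "second", "third"]):
            file = Path(tmp) / f"doc{i}.txt"
            file.write_text(text, encoding="utf-8")
            files.append(file)
        print(batch_extract_sync(files))  # ['first', 'second', 'third']
```

Because `asyncio.run` cannot be called from inside a running event loop, async callers in this sketch would await `batch_extract` directly, while scripts and other sync code use `batch_extract_sync`.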

From github.com