In this talk, you'll discover a modular approach to document understanding that combines state-of-the-art models with Python tools. Learn how to convert PDFs to structured data, build custom information extraction pipelines, and apply OCR to extract text from images. Practical examples feature spaCy and the new Docling library.
From speakerdeck.com
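The modular pipeline idea described above can be illustrated with a minimal, dependency-free sketch. It assumes each stage is a plain function that takes and returns a shared document object. The `Doc` class, the component functions, and the regex patterns are all hypothetical stand-ins, not the spaCy or Docling APIs. In a real system the input text would come from a converter such as Docling (or from OCR for scanned pages), and the rule-based extractor could be replaced by a spaCy pipeline or a trained model.

```python
import re
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Doc:
    """Hypothetical document container passed between stages (not spaCy's Doc)."""
    text: str
    entities: list[tuple[str, str]] = field(default_factory=list)


# Every stage shares this signature, which is what makes the pipeline modular.
Component = Callable[[Doc], Doc]


def normalize_whitespace(doc: Doc) -> Doc:
    # Collapse line breaks and repeated spaces, as often left by PDF/OCR extraction.
    doc.text = re.sub(r"\s+", " ", doc.text).strip()
    return doc


def make_regex_extractor(label: str, pattern: str) -> Component:
    # Hypothetical rule-based extractor; a model-based component could replace it.
    compiled = re.compile(pattern)

    def extract(doc: Doc) -> Doc:
        for match in compiled.finditer(doc.text):
            doc.entities.append((label, match.group(0)))
        return doc

    return extract


def run_pipeline(text: str, components: list[Component]) -> Doc:
    # Apply each stage in order to the same Doc object.
    doc = Doc(text)
    for component in components:
        doc = component(doc)
    return doc


pipeline = [
    normalize_whitespace,
    make_regex_extractor("DATE", r"\d{4}-\d{2}-\d{2}"),
    make_regex_extractor("AMOUNT", r"\$\d+(?:\.\d{2})?"),
]

doc = run_pipeline("Invoice dated\n2024-03-15\ntotal  $120.50", pipeline)
print(doc.text)      # Invoice dated 2024-03-15 total $120.50
print(doc.entities)  # [('DATE', '2024-03-15'), ('AMOUNT', '$120.50')]
```

Because every stage has the same signature, stages can be added, removed, or swapped (for example, a model-based extractor in place of a regex one) without changing `run_pipeline`.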