In this talk, discover a modular approach to document understanding using state-of-the-art models and Python tools. Learn to convert PDFs to structured data, build custom information extraction pipelines, and use OCR for image-based text. Practical examples feature spaCy and the new Docling library.

From speakerdeck.com