Datalab's Marker and OCR models are now available on Replicate for document parsing and text extraction. Marker converts PDFs, DOCX, PPTX, and images into markdown or JSON. It handles tables, math, and code, and supports structured data extraction via JSON schemas. OCR detects text in 90 languages and returns reading order and table grids. Both models are reported to outperform established tools like Tesseract and GPT-4o, with Marker processing pages in 0.18 seconds and achieving 82.7% accuracy on olmOCR-Bench. Pricing ranges from $2 to $6 per 1,000 pages depending on mode and features.
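As a rough illustration of what the quoted figures imply for a batch job, here is a minimal sketch. It assumes the price scales linearly with page count, that the 0.18-second figure applies per page, and that pages are processed one after another. None of these assumptions is confirmed by the summary, and `estimate_batch` is a hypothetical helper, not part of any Replicate or Datalab API.

```python
def estimate_batch(pages: int, usd_per_1000_pages: float,
                   seconds_per_page: float = 0.18) -> tuple[float, float]:
    """Return (cost_usd, total_seconds) for a batch of pages.

    Assumptions (not confirmed by the source): linear pricing with no
    minimum charge, and 0.18 s taken as a per-page, serial figure.
    """
    if pages < 0:
        raise ValueError("pages must be non-negative")
    cost_usd = pages / 1000 * usd_per_1000_pages
    total_seconds = pages * seconds_per_page
    return cost_usd, total_seconds


# Bracket a 50,000-page job with the low and high ends of the quoted range.
low_cost, seconds = estimate_batch(50_000, 2.0)
high_cost, _ = estimate_batch(50_000, 6.0)
print(f"${low_cost:.2f} to ${high_cost:.2f}, "
      f"about {seconds / 60:.0f} minutes if processed serially")
```

Actual cost depends on the mode and features selected, so treat the two endpoints as bounds rather than a quote.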

3m read time · From replicate.com
Table of contents
Run Marker, Run OCR, Structured extraction, Performance, Pricing
