The best agents fail at enterprise documents not because they can't reason, but because they can't read. We're announcing Document Intelligence: state-of-the-art research that turns your unstructured enterprise documents into agent-ready data, at scale.

Databricks is announcing Document Intelligence, a platform capability designed to solve a core bottleneck for enterprise AI agents: the inability to accurately read real-world documents such as scanned PDFs, contracts, invoices, and medical notes. Even frontier models score below 50% on document reasoning benchmarks (OfficeQA), not because of poor reasoning but because of poor document parsing. The solution introduces composable AI Functions, namely ai_parse_document (GA), ai_classify, and ai_extract, that form a reusable pipeline. Benchmarks show a 16% average gain in agent performance when ai_parse_document is used as a preprocessing step, and 5–7x lower cost compared with VLM-based pipelines. The system runs on serverless batch infrastructure inside Databricks, replacing fragile multi-vendor OCR and extraction stacks with a single governed workflow.
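
To make the parse, classify, extract composition concrete, here is a minimal Python sketch of the pattern. It does not call Databricks: `parse_stage`, `classify_stage`, `extract_stage`, `Document`, and `run_pipeline` are hypothetical local stand-ins that only mimic the roles of ai_parse_document, ai_classify, and ai_extract, and the keyword and regex rules are placeholders for the real models.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Document:
    """Carries one document through the pipeline; each stage fills in one field."""
    raw: bytes
    text: str = ""
    label: str = ""
    fields: Dict[str, str] = field(default_factory=dict)


# A stage is any function that takes a Document and returns a Document.
Stage = Callable[[Document], Document]


def parse_stage(doc: Document) -> Document:
    # Hypothetical stand-in for the parsing role (ai_parse_document in the article).
    # It only decodes bytes; a real parser would handle scans, tables, and layout.
    doc.text = doc.raw.decode("utf-8", errors="replace")
    return doc


def classify_stage(doc: Document) -> Document:
    # Hypothetical stand-in for the classification role (ai_classify in the article).
    # A keyword rule replaces the model here.
    doc.label = "invoice" if "invoice" in doc.text.lower() else "other"
    return doc


def extract_stage(doc: Document) -> Document:
    # Hypothetical stand-in for the extraction role (ai_extract in the article).
    # A regex pulls one field, and only for documents labeled as invoices.
    if doc.label == "invoice":
        match = re.search(r"total:\s*([\d.]+)", doc.text, flags=re.IGNORECASE)
        if match:
            doc.fields["total"] = match.group(1)
    return doc


def run_pipeline(doc: Document, stages: List[Stage]) -> Document:
    # Each stage consumes the previous stage's output, so stages can be
    # swapped, reordered, or reused in other pipelines.
    for stage in stages:
        doc = stage(doc)
    return doc


PIPELINE: List[Stage] = [parse_stage, classify_stage, extract_stage]

result = run_pipeline(Document(raw=b"Invoice #17\nTotal: 120.50"), PIPELINE)
print(result.label, result.fields)
```

Because every stage shares one signature, a downstream agent only ever sees the structured `label` and `fields`, not the raw bytes. That is the sense in which parsing acts as a preprocessing step in the summary above.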

Why Your Agents Can’t Read Enterprise Documents — and How to Fix It

Improving agent quality on real-world, enterprise documents

Unlocking document intelligence at enterprise scale