GLM-OCR is an open-source multimodal OCR model that achieves state-of-the-art performance on document understanding benchmarks with only 0.9B parameters. Built on the GLM-V architecture with a Multi-Token Prediction loss, it excels at complex layouts, including tables, formulas, and code. The project provides a comprehensive SDK.

From github.com
