GLM-OCR is an open-source multimodal OCR model that achieves state-of-the-art performance on document understanding benchmarks with only 0.9B parameters. Built on the GLM-V architecture and trained with a Multi-Token Prediction loss, it excels at complex layouts, including tables, formulas, and code. The project provides a comprehensive SDK