The Allen Institute for AI (Ai2) has released MolmoWeb, an open-source visual web agent built on the Molmo 2 model family. Available in 4B and 8B parameter sizes, it can browse the web, fill forms, and complete tasks by interpreting screenshots — without relying on distillation from proprietary models. Its training data includes 30,000 human task trajectories (the largest publicly released dataset of its kind), synthetic accessibility-tree trajectories, and 2.2 million QA pairs. MolmoWeb outperforms GPT-4o and open-weight competitors such as Fara-7B on standard benchmarks, though proprietary models from Anthropic and Google still lead. The weights, training data, and evaluation tools are available on Hugging Face and GitHub.

From thenewstack.io · 4-minute read
Table of contents
- Benchmarks
- MolmoWeb's training data
- Availability