The Allen Institute for AI (Ai2) has released MolmoWeb, an open-source visual web agent built on the Molmo 2 model family. Available in 4B and 8B parameter sizes, it can browse the web, fill forms, and complete tasks by interpreting screenshots — without relying on proprietary model distillation. Training data includes 30,000 human task trajectories (the largest publicly released dataset of its kind), synthetic accessibility-tree trajectories, and 2.2 million QA pairs. MolmoWeb outperforms GPT-4o and open-weight competitors like Fara-7B on standard benchmarks, though proprietary models from Anthropic and Google still lead. Weights, training data, and evaluation tools are available on Hugging Face and GitHub.