OSWorld is a scalable, real computer environment for multimodal agents that supports task setup, execution-based evaluation, and interactive learning across various operating systems. It serves as a unified environment for evaluating open-ended computer tasks. Benchmarking state-of-the-art LLM/VLM-based agents on OSWorld revealed deficiencies in their ability to complete these tasks. Factors that influence the performance of VLMs on digital agent tasks include task attributes, input measurements, and UI layout.
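Execution-based evaluation means a task is judged by checking the state the agent leaves behind, not by comparing its actions to a reference sequence. The Python sketch below illustrates that idea under simplifying assumptions. The machine is modeled as a dictionary of file paths, and `Task`, `run_task`, and both agents are hypothetical names chosen for illustration. None of them is OSWorld's actual task format or API.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# Hypothetical toy model of a machine's state: file path -> file contents.
State = Dict[str, str]


@dataclass
class Task:
    instruction: str
    setup: Callable[[State], None]     # prepares the initial state (task setup)
    evaluate: Callable[[State], bool]  # inspects the final state, not the agent's actions


def run_task(task: Task, agent: Callable[[str, State], None]) -> bool:
    """Set up a fresh state, let the agent act, then score the resulting state."""
    state: State = {}
    task.setup(state)
    agent(task.instruction, state)
    return task.evaluate(state)


# Hypothetical example task: rename a file.
def setup_rename(state: State) -> None:
    state["/home/user/draft.txt"] = "hello"


def check_rename(state: State) -> bool:
    # Success only if the new file has the old contents and the old file is gone.
    return (state.get("/home/user/final.txt") == "hello"
            and "/home/user/draft.txt" not in state)


rename_task = Task("Rename draft.txt to final.txt", setup_rename, check_rename)


def good_agent(instruction: str, state: State) -> None:
    state["/home/user/final.txt"] = state.pop("/home/user/draft.txt")


def lazy_agent(instruction: str, state: State) -> None:
    pass  # takes no action, so the final state fails the check


print(run_task(rename_task, good_agent))  # True
print(run_task(rename_task, lazy_agent))  # False
```

Because the checker only looks at the outcome, any sequence of actions that produces the correct final state passes. This is what makes the approach suitable for open-ended tasks that can be solved in more than one way.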

From os-world.github.io