OSWorld is a scalable, real computer environment for multimodal agents that supports task setup, execution-based evaluation, and interactive learning across various operating systems. It serves as a unified environment for evaluating open-ended computer tasks. The benchmark on OSWorld revealed deficiencies in state-of-the-art