OSWorld is a scalable, real computer environment for multimodal agents that supports task setup, execution-based evaluation, and interactive learning across various operating systems. It serves as a unified environment for evaluating open-ended computer tasks. The benchmark on OSWorld revealed deficiencies in state-of-the-art

From os-world.github.io