ScreenSuite - The most comprehensive evaluation suite for GUI Agents!
ScreenSuite is a comprehensive evaluation framework for GUI agents that unifies 13 benchmarks across four capability areas: perception, grounding, single-step actions, and multi-step agent tasks. The suite evaluates vision-language models (VLMs) on their ability to interact with graphical interfaces using only visual input, without accessibility trees or DOM metadata. It includes Dockerized environments for Ubuntu and Android testing, supports both local and remote sandbox execution, and provides standardized evaluation of leading VLMs such as the Qwen-2.5-VL series, UI-TARS, and GPT-4o on GUI automation tasks.
Table of contents
WTF is a GUI Agent?
Introducing ScreenSuite 🥳
Ranking leading VLMs on ScreenSuite
Start your custom evaluation in 30s ⚡️
Next steps