UI-TARS Desktop, developed by ByteDance, is a GUI agent application that uses the UI-TARS vision-language model to let users control a computer with natural language. Its features include cross-platform support, screenshot-based visual recognition, real-time feedback, and privacy through local processing. A technical preview of a new desktop app, Agent TARS, has also been released; it integrates browser operations with the command line and file system.

From github.com
