Apple's Ferret-UI Lite is a 3B-parameter model optimized for mobile and desktop screens, designed to interpret screen images, understand UI elements such as icons and text, and interact with apps.


InfoQ

Apple researchers have introduced Ferret-UI Lite, a compact 3B-parameter multimodal model designed to run on-device and interact with graphical user interfaces across mobile, web, and desktop platforms. Unlike existing GUI agents that rely on large foundation models like GPT or Gemini, Ferret-UI Lite prioritizes low latency, privacy, and offline capability. The model uses screen image cropping, chain-of-thought reasoning, and a two-stage training pipeline combining supervised fine-tuning and reinforcement learning with verifiable rewards (RLVR). It achieves 91.6% on the ScreenSpot-V2 GUI grounding benchmark, competitive with much larger models. Limitations include struggles with long-horizon multi-step tasks and sensitivity to reward design. The model could enable Apple to reduce Siri's dependence on Google Cloud.
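
The RLVR stage depends on a reward that a program can check without human judgment. The summary does not spell out the reward used for Ferret-UI Lite, so the following is only a minimal sketch under a common assumption for GUI grounding: the model outputs a click point, and the reward is 1 when that point lands inside the target element's bounding box and 0 otherwise. The names `Box`, `grounding_reward`, and `grounding_accuracy` are hypothetical and chosen for illustration; averaging the same check over a dataset gives a click-accuracy score of the kind grounding benchmarks report.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box of a UI element, in pixels (hypothetical type)."""
    left: float
    top: float
    right: float
    bottom: float

    def contains(self, x: float, y: float) -> bool:
        # Points on the border count as inside.
        return self.left <= x <= self.right and self.top <= y <= self.bottom


def grounding_reward(pred_xy: Tuple[float, float], target: Box) -> float:
    """Assumed verifiable reward: 1.0 if the predicted click hits the target, else 0.0."""
    x, y = pred_xy
    return 1.0 if target.contains(x, y) else 0.0


def grounding_accuracy(
    preds: Sequence[Tuple[float, float]], targets: Sequence[Box]
) -> float:
    """Fraction of predicted clicks that fall inside their target boxes."""
    if len(preds) != len(targets):
        raise ValueError("preds and targets must have the same length")
    if not preds:
        return 0.0
    hits = sum(grounding_reward(p, t) for p, t in zip(preds, targets))
    return hits / len(preds)


if __name__ == "__main__":
    button = Box(left=40, top=10, right=60, bottom=30)
    print(grounding_reward((50, 20), button))   # click inside the button
    print(grounding_reward((100, 20), button))  # click outside the button
    print(grounding_accuracy([(50, 20), (100, 20)], [button, button]))
```

A binary reward like this is easy to verify but sparse, which is one reason reward design can matter as much as the summary suggests: a near miss and a wildly wrong click receive the same score.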

Apple Researchers Introduce Ferret-UI Lite, an On-Device AI Model for Seeing and Controlling UIs