Research, multimodal llm, gui agents, on device, synthetic data
Ferret-UI Lite Matches Larger GUI Agents
Relevance Score: 9.1
A recent Apple study introduces Ferret-UI Lite, a 3-billion-parameter multimodal model that matches or surpasses models up to 24 times larger on GUI-agent benchmarks. It uses inference-time cropping and zooming, supervised fine-tuning, reinforcement learning, and a multi-agent synthetic data pipeline to run on-device across Android, web, and desktop, though it is weaker on complex multi-step interactions.

