DragOn: A Benchmark and Dataset for Drag-Based GUI Interactions

GUI agents, vision-based models that control desktops, web browsers, and mobile devices through graphical user interfaces, promise to automate a wide range of digital tasks. While million-scale datasets have enabled substantial progress on click grounding, data for drag grounding (e.g., drag-and-drop, swipe, highlight) remains an order of magnitude smaller, and current models fall short on complex drag-based interactions. We introduce DragOn, a drag grounding benchmark and training dataset covering

Paper · arXiv

cs.AI

Authors: Nathan Bout, Maxime Langevin, Ronan Riochet
Published: 2026-06-04

via arXiv · 2606.06322