
InstructSAM: Segment Any Instance with Any Instructions

In this paper, we introduce InstructSAM, a unified and streamlined framework designed for multi-instance segmentation under arbitrary instructions. We formulate instruction-driven instance segmentation as a set-structured query prediction problem and propose an explicit reasoning-to-instance query interface that elegantly bridges a vision-language model (VLM) and SAM3. Specifically, a bank of learnable instance queries is injected into the VLM and contextualized with instruction and visual info
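The abstract combines two ideas: a bank of instance queries placed into the VLM's input so they are contextualized by the instruction and image, and a set-structured view of the output, in which each query either claims one target instance or nothing. The stdlib-only sketch below illustrates both under loudly stated assumptions. It does not model the VLM or SAM3. The query token names (`<Q0>`, `<Q1>`, etc.) and the helpers `inject_queries`, `mask_iou`, and `match_queries` are hypothetical. The one-to-one, IoU-based matching is a DETR-style assumption, since the abstract does not specify the matching cost or procedure.

```python
from itertools import permutations
from typing import FrozenSet, List, Sequence, Tuple


def inject_queries(prompt_tokens: List[str], num_queries: int) -> Tuple[List[str], List[int]]:
    """Append hypothetical query placeholder tokens after the image and instruction
    tokens. Returns the extended sequence and the positions of the query tokens,
    which is where a model would read out the contextualized query states."""
    start = len(prompt_tokens)
    queries = [f"<Q{i}>" for i in range(num_queries)]
    return prompt_tokens + queries, list(range(start, start + num_queries))


def mask_iou(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """IoU of two masks represented as sets of pixel indices."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def match_queries(pred_masks: Sequence[FrozenSet[int]],
                  gt_masks: Sequence[FrozenSet[int]]) -> List[Tuple[int, int]]:
    """One-to-one assignment of predicted query masks to ground-truth masks that
    minimizes the total (1 - IoU) cost. Brute force over permutations, so it only
    suits tiny sets. Assumes len(pred_masks) >= len(gt_masks). Returns
    (pred_index, gt_index) pairs. Unmatched queries are treated as "no object"."""
    best_cost = float("inf")
    best: List[Tuple[int, int]] = []
    # perm[g] is the predicted query index assigned to ground-truth instance g.
    for perm in permutations(range(len(pred_masks)), len(gt_masks)):
        cost = sum(1.0 - mask_iou(pred_masks[p], gt_masks[g]) for g, p in enumerate(perm))
        if cost < best_cost:
            best_cost = cost
            best = [(p, g) for g, p in enumerate(perm)]
    return best


if __name__ == "__main__":
    tokens, query_positions = inject_queries(["<img>", "segment", "every", "car"], 3)
    print(tokens, query_positions)

    preds = [frozenset({1, 2}), frozenset({7, 8, 9}), frozenset({20})]
    gts = [frozenset({7, 8}), frozenset({1, 2, 3})]
    print(match_queries(preds, gts))
```

With three queries and two target instances, query 1 is matched to instance 0, query 0 is matched to instance 1, and query 2 is left as "no object". Brute-force assignment is used here only to keep the example dependency-free. A practical implementation would use the Hungarian algorithm, for example `scipy.optimize.linear_sum_assignment`, on a cost matrix.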

Paper · arXiv

cs.CV

Authors: Yuqian Yuan, Wentong Li, Zhaocheng Li, Yutong Lin, Juncheng Li + 4 more
Published: 2026-05-25


via arXiv · 2605.26102