Formalizing the Binding Problem

Representations of the world, arguably, contain information about features (e.g. something is blue, something is a circle) but also information about which features are part of the same object (e.g. the circle is blue), which we call binding information. Any system with the ability to understand scenes with multiple objects must be able to solve the binding problem: it needs to know which features belong together. However, despite work showing that Vision Transformers (ViTs) know which patches b

Paper · arXiv

cs.CV

Authors: Lianghuan Huang, Yihao Li, Saeed Salehi, Yingshan Chang, Ansh Soni + 1 more
Published: 2026-06-02
Categories: cs.CV, cs.AI, cs.LG, q-bio.NC

via arXiv · 2606.03976