Forget Attention: Importance-Aware Attention Is All You Need

Combining attention's global retrieval with the sequential importance signal of state space models (SSMs) is the open challenge of hybrid language modeling. Transformers see everywhere but cannot prioritize; SSMs know what matters but cannot revisit. Existing hybrids -- Jamba (block level) and Hymba (head level) -- place the two in separate compartments, so neither informs the other during the attention computation itself. We propose SISA (SSM-Informed Softmax Attention), which adds an SSM-deriv

Paper · arXiv

cs.AI

Authors: Soohyeong Shin, Yeongwook Yang
Published: 2026-06-01
Categories: cs.AI, cs.CL, cs.LG

via arXiv · 2606.02332