MuCRASP: Multimodal Chain-of-thought Reasoning aware Structured Pruning

Vision-language models (VLMs) increasingly rely on chain-of-thought (CoT) reasoning to solve complex multimodal tasks, but their large parameter sizes make deployment expensive. Structured pruning offers a natural solution; however, existing methods fail to preserve CoT reasoning accuracy in VLMs. We identify two key reasons: (1) CoT consistency depends on sparse transition points (pivot tokens) in the generation trajectory, while existing pruning methods are CoT-agnostic; and (2) pruning method

Paper · arXiv

Authors: Aritra Dutta, Somak Aditya
Published: 2026-05-25
Categories: cs.AI, cs.CL

via arXiv · 2605.25842