Failed Reasoning Traces Tell You What Is Fixable (But Not by Reading Them)

When post-trained language models fail on reasoning problems, the common test-time-scaling response is to spend more compute on additional attempts, and the failed traces play no further role. We argue this discards a crucial signal: some failures come from unlucky sampling, where more rollouts help, while others are structural and resist resampling regardless of budget. We propose that failed traces encode recoverability structure: the inference-time signature of which test-time interventions c

Paper · arXiv

Authors: Nizar Islah, Istabrak Abbes, Irina Rish, Sarath Chandar, Eilif B. Muller
Published: 2026-06-03
Categories: cs.LG, cs.AI, cs.CL

via arXiv · 2606.05145