Unlocking the Working Memory of Large Language Models for Latent Reasoning

To improve the reasoning capabilities of large language models, test-time compute is typically scaled by generating intermediate tokens before the final answer. However, this couples reasoning to autoregressive generation and thereby conflates internal computation with external communication. In contrast, human cognition can use working memory to hold and manipulate information internally without the need to externalize intermediate thoughts. Drawing on this principle, we introduce Reasoning in

Paper · arXiv

cs.CL

Authors: Lukas Aichberger, Sepp Hochreiter
Published: 2026-05-28
Categories: cs.CL, cs.AI

via arXiv · 2605.30343