Forgetting in Language Models: Capacity, Optimization, and Self-Generated Replay

Models trained on a new task typically degrade on prior tasks, a phenomenon known as forgetting. Traditionally, mitigating forgetting has required replaying stored exemplars from prior tasks, which is often impractical. By contrast, language models can sample from their own training distribution, and we show that these self-generated samples serve as effective replay data, nearly eliminating forgetting. We find that forgetting nonetheless persists when the model has little remaining capacity: mo
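The replay idea described in the abstract can be sketched as a data-mixing step: before fine-tuning, draw samples from the model itself and interleave them with the new-task data. The snippet below is only an illustrative sketch, not the authors' implementation. The function name `build_replay_mixture`, the `sample_from_model` callable, and the `replay_fraction` parameter are all hypothetical, and the abstract does not state what mixing ratio the paper uses.

```python
import random
from typing import Callable, List, Sequence


def build_replay_mixture(
    new_task_examples: Sequence[str],
    sample_from_model: Callable[[], str],
    replay_fraction: float = 0.5,  # hypothetical knob; not taken from the paper
    seed: int = 0,
) -> List[str]:
    """Mix new-task examples with self-generated replay samples.

    `sample_from_model` is assumed to return one text sample drawn from the
    model as it was *before* fine-tuning on the new task.
    """
    if not 0.0 <= replay_fraction < 1.0:
        raise ValueError("replay_fraction must be in [0, 1)")

    n_new = len(new_task_examples)
    # Choose the replay count so replay makes up `replay_fraction` of the mixture.
    n_replay = round(n_new * replay_fraction / (1.0 - replay_fraction))

    replay = [sample_from_model() for _ in range(n_replay)]
    mixture = list(new_task_examples) + replay

    # Shuffle with a fixed seed so the ordering is reproducible.
    random.Random(seed).shuffle(mixture)
    return mixture
```

In practice `sample_from_model` would wrap whatever generation call the training stack provides; the key assumption is that the samples are drawn before the weights are updated on the new task, so they reflect the distribution the model is meant to retain.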

Paper · arXiv

cs.LG

Authors: Martin Marek, Dongkyu Cho, Shikai Qiu, Rumi Chunara, Pavel Izmailov + 1 more
Published: 2026-05-25

via arXiv · 2605.26097