Looped Diffusion Language Models

Masked diffusion models (MDMs) have emerged as a promising alternative to autoregressive models for language modeling, yet the effective design of transformer architectures for MDMs remains underexplored. In this paper, we show that selectively looping the early-middle transformer layers significantly improves both training efficiency and model performance in MDMs. We call this approach LoopMDM (Looped Masked Diffusion Model), which brings two key benefits: looping layers at training time yields

Paper · arXiv

cs.LG

Authors: Sanghyun Lee, Chunsan Hong, Seungryong Kim, Jonghyun Lee, Jongho Park + 1 more
Published: 2026-05-25

via arXiv · 2605.26106