$D^2$-Monitor: Dynamic Safety Monitoring for Diffusion LLMs via Hesitation-Aware Routing

Despite the emergence of diffusion large language models (D-LLMs) as an alternative to autoregressive large language models (AR-LLMs), safety monitoring for D-LLMs remains largely unexplored. Unlike AR-LLMs, D-LLMs generate text through a multi-step denoising process, exposing intermediate hidden representations that may contain safety-relevant information unavailable in standard single-step monitoring setups. Motivated by the suitability of lightweight probes for always-on monitoring, we analyz

Paper · arXiv

cs.AI

Authors: Aoxi Liu, Yupeng Chen, James Oldfield, Guanzhe Hong, Junchi Yu + 3 more
Published: 2026-05-25

via arXiv · 2605.25893