
Double Preconditioning (DoPr): Optimization for Test-Time Performance, not Validation Loss

Many modern applications of deep learning involve training a neural network via a one-step prediction loss (e.g., $L^2$ regression, cross-entropy), but deploy the network by rolling out along its own predictions. Key examples include autoregressive language modeling, flow-based generative modeling, and robot policy learning. It is well-documented that these settings induce a phenomenon we call test-time feedback (TTF): the mismatch between the training/validation loss and downstream metrics of i
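To make the train/deploy mismatch concrete, here is a minimal toy sketch. It is not the authors' DoPr method. It assumes a scalar linear system x_{t+1} = a x_t with a = 0.95 and two hypothetical learned models, a_hat = 1.0 and a_hat = 0.9. Both models have essentially the same one-step (teacher-forced) squared error on ground-truth states, yet they drift very differently once each is rolled out on its own predictions. The helper names `one_step_loss` and `rollout_error` are illustrative only.

```python
# Toy illustration (hypothetical, not the paper's method): two models with equal
# one-step validation loss can have very different rollout error.

def true_trajectory(a_true: float, x0: float, horizon: int) -> list[float]:
    """Ground-truth states x_0, ..., x_{horizon-1} of x_{t+1} = a_true * x_t."""
    states, x = [], x0
    for _ in range(horizon):
        states.append(x)
        x = a_true * x
    return states

def one_step_loss(a_hat: float, a_true: float, states: list[float]) -> float:
    """Mean squared one-step prediction error, always fed ground-truth inputs."""
    return sum((a_hat * x - a_true * x) ** 2 for x in states) / len(states)

def rollout_error(a_hat: float, a_true: float, x0: float, horizon: int) -> float:
    """Absolute gap after `horizon` steps when the model consumes its own outputs."""
    x_model, x_true = x0, x0
    for _ in range(horizon):
        x_model = a_hat * x_model  # prediction is fed back as the next input
        x_true = a_true * x_true
    return abs(x_model - x_true)

A_TRUE, X0, HORIZON = 0.95, 1.0, 20
states = true_trajectory(A_TRUE, X0, HORIZON)

# Both models are off by 0.05, so their one-step losses match.
loss_a = one_step_loss(1.0, A_TRUE, states)
loss_b = one_step_loss(0.9, A_TRUE, states)

# Rolling out exposes the difference: errors compound at different rates.
err_a = rollout_error(1.0, A_TRUE, X0, HORIZON)
err_b = rollout_error(0.9, A_TRUE, X0, HORIZON)

print(f"one-step loss: {loss_a:.6f} vs {loss_b:.6f}")
print(f"rollout error: {err_a:.3f} vs {err_b:.3f}")
```

Ranking the two models by one-step loss alone cannot separate them. The rollout metric, which is closer to how the model is deployed, can. This is the kind of gap the abstract calls test-time feedback.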

Paper · arXiv


Authors: Thomas T. Zhang, Alok Shah, Yifei Zhang, Vincent Zhang, Nikolai Matni + 1 more
Published: 2026-06-04
Categories: cs.LG, cs.AI, eess.SY


via arXiv · 2606.06418