Reasoning with Sampling: Cutting at Decision Points

Frontier reasoning models are produced by posttraining base language models with reinforcement learning. Recent work has challenged this by showing that sampling from a sharpened version of the base model's distribution, a so-called power distribution, elicits comparable reasoning without additional training, curated datasets, or verifiers. However, making this method practical requires efficiently sampling from the power distribution. A sampler needs to "mix" to the power distribution, which ne

Paper · arXiv

cs.LG

Authors: Felix Zhou, Anay Mehrotra, Quanquan C. Liu
Published: 2026-05-28
Categories: cs.LG, cs.AI, cs.CL, math.ST, stat.ML

via arXiv · 2605.30327