Using Reward Uncertainty to Induce Diverse Behaviour in Reinforcement Learning

Classical reinforcement learning (RL) typically seeks a deterministic policy that maximizes the expected sum of a scalar reward. Yet, modern applications such as language model fine-tuning or scientific discovery demand diversity. Existing remedies such as entropy regularization or diversity bonuses often require fragile trade-offs that sacrifice performance for stochasticity or rely on heuristic metrics that can misalign policy rankings. We argue that diversity is more naturally understood as t

Paper · arXiv

cs.LG

Authors: Anthony GX-Chen, Ankit Anand, Gheorghe Comanici, Zaheer Abbas, Eser Aygün + 5 more
Published: 2026-06-02
Categories: cs.LG, cs.AI

via arXiv · 2606.03962