How Neural Reward Models Learn Features for Policy Optimization: A Single-Index Analysis · Vinony