Entity

Global Convergence of Wasserstein Policy Gradient for Entropy-Regularized Reinforcement Learning

Wasserstein policy gradient (WPG) is a policy optimization method for reinforcement learning (RL) that exploits the optimal-transport geometry of action distributions. For the entropy-regularized RL objective, WPG evolves each state-conditional policy by transporting it along the action gradient of the soft Q-function together with a Langevin-type diffusion. Despite its appeal for continuous-control problems, its global convergence properties remain poorly understood. Standard Langevin analyses

Paper · arXiv

cs.LG

Authors: Zhaoyu Zhu, Rui Gao, Shuang Li
Published: 2026-05-25

Abstract ↗

via arXiv · 2605.26078