Nonstationary Generalized Linear Bandits with Discounted Online Mirror Descent

We study nonstationary generalized linear bandits (GLBs), where the expected reward is modeled through a nonlinear link function with an unknown time-varying parameter. This framework encompasses a broad class of reward models, including linear, Bernoulli, and binomial rewards. Existing approaches are predominantly based on maximum-likelihood estimation (MLE), using sliding-window, restart, or discounting mechanisms to handle nonstationarity. Although these methods achieve statistically efficien

Paper · arXiv

Authors: Joongkyu Lee, Min-hwan Oh
Published: 2026-05-25
Categories: stat.ML, cs.LG

via arXiv · 2605.2559