Feature Learning in Wide Neural Networks under $μ$P: Identifiability and Sparse-Dictionary Decomposition of the Mean-Field Limit

We establish four structural results for feature learning in wide two-layer neural networks under the Maximal Update Parametrization ($μ$P). First, we prove global existence and uniqueness of the mean-field limit of noisy gradient descent under $μ$P, identifying the maximal admissible weight $w^*$ on the moment sequence of the initialization as the reciprocal parameter-moment-growth boundary, and hence the largest weighted moment class propagated by the flow. The finite-particle approximation ha

Paper · arXiv

cs.LG

Authors: Akmal Xodarev
Published: 2026-05-23
Categories: cs.LG, math.PR, math.ST, stat.ML

via arXiv · 2605.2471