Neural PDE solvers have followed the scaling trajectory of vision and language, with recent foundation models reaching billions of parameters. We argue that scale is a poor substitute for architectural inductive bias in this domain: structured priors deliver outsized parameter efficiency, and the pattern of where they succeed and fail is itself informative about what they capture. We instantiate this argument in WaveLiT, an architecture combining a discrete wavelet transform for lossless multi-r
via arXiv · 2605.25949
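The excerpt says only that WaveLiT uses a discrete wavelet transform as a lossless multi-level decomposition. It does not say which wavelet family, how many levels, or how many spatial dimensions are used. The sketch below shows what "lossless" means in practice. It assumes the orthonormal Haar wavelet on a 1D signal, chosen for illustration and not confirmed as WaveLiT's choice, and the helper names `haar_dwt` and `haar_idwt` are hypothetical. Each level splits the current approximation into a coarser approximation and a detail band. The total coefficient count equals the input length, and the inverse recovers the signal exactly up to floating-point error.

```python
import math
from typing import List, Tuple

SQRT2 = math.sqrt(2.0)


def haar_step(x: List[float]) -> Tuple[List[float], List[float]]:
    """One level of the orthonormal Haar DWT (assumed wavelet, for illustration).

    Returns (approximation, detail): pairwise scaled sums (low-pass)
    and pairwise scaled differences (high-pass).
    """
    if len(x) % 2:
        raise ValueError("signal length must be even at every level")
    approx = [(x[i] + x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]
    return approx, detail


def haar_inverse_step(approx: List[float], detail: List[float]) -> List[float]:
    """Exact inverse of haar_step: interleave the reconstructed even/odd samples."""
    out: List[float] = []
    for a, d in zip(approx, detail):
        out.append((a + d) / SQRT2)
        out.append((a - d) / SQRT2)
    return out


def haar_dwt(x: List[float], levels: int) -> List[List[float]]:
    """Multi-level decomposition (hypothetical helper).

    Returns [coarsest approximation, coarsest detail, ..., finest detail].
    The input length must be divisible by 2**levels.
    """
    approx = list(x)
    details: List[List[float]] = []
    for _ in range(levels):
        approx, d = haar_step(approx)
        details.append(d)  # finest detail is appended first
    return [approx] + details[::-1]


def haar_idwt(coeffs: List[List[float]]) -> List[float]:
    """Invert haar_dwt by merging detail bands from coarsest to finest."""
    approx = coeffs[0]
    for d in coeffs[1:]:
        approx = haar_inverse_step(approx, d)
    return approx


if __name__ == "__main__":
    signal = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]
    coeffs = haar_dwt(signal, levels=3)
    recon = haar_idwt(coeffs)
    print([len(c) for c in coeffs])  # band sizes, coarse to fine
    print(max(abs(a - b) for a, b in zip(recon, signal)))  # reconstruction error
```

Because the Haar basis is orthonormal, the transform also preserves the signal's energy (sum of squares). Operating on wavelet coefficients instead of raw grid values therefore discards nothing. It only reorganizes the field by scale. Production code would more likely use a library such as PyWavelets and a 2D or 3D transform for PDE fields.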