On the Limits of Model Merging for Multilinguality in Pre-Training

Endowing models with consistent multilingual performance can be achieved either by mixing pre-training data or through post-training approaches such as language-specific model merging. In this work, we test whether merging can be applied to monolingually pre-trained models. We conduct a controlled study on the efficacy of mixed, merged, and monolingual pre-training setups. We find that while monolingual pre-training results in strong in-language performance, merging any combination of monolingual models leads

Paper · arXiv

cs.CL

Authors: Seth Aycock, Fedor Vitiugin, Aleksandr Umnov, Christof Monz, Khalil Sima'an
Published: 2026-05-25

via arXiv · 2605.25846