q0: Primitives for Hyper-Epoch Pretraining

Multi-epoch training is becoming the standard now that compute is growing faster than the supply of high-quality text. But pretraining a single model saturates within a few passes, long before the compute budget is exhausted. We argue this calls for a conceptual shift from training a single model toward exploring a population of models and aggregating their predictions. We introduce hyper-epoch pretraining (q0), which turns a multi-epoch budget into a population of diverse models whose combined

Paper · arXiv

Authors: Bishwas Mandal, Shmuel Berman, Akshay Vegesna, Samip Dahal
Published: 2026-06-02
Categories: cs.LG, cs.AI

via arXiv · 2606.03938