Causal methods for LLM development and evaluation

Large language model (LLM) development is currently driven by large-scale empirical iteration over data mixtures, reward models, routing strategies, and evaluation pipelines. Here, we argue that many central questions in LLM development and evaluation are inherently causal: What is the effect of adding a data domain during pretraining? How do annotator preferences change when LLMs generate text in a different style? Should a prompt be routed to a larger or smaller model given inference cost cons…

Paper · arXiv

cs.LG

Authors: Dennis Frauen, Marie Brockschmidt, Konstantin Hess, Haorui Ma, Yuchen Ma + 8 more
Published: 2026-05-25

via arXiv · 2605.25998