Measuring Massive Multitask Language Understanding (MMLU) is a popular benchmark for evaluating the capabilities of large language models. It inspired several other versions and spin-offs, such as MMLU-Pro, MMMLU and MMLU-Redux.
==Overview==
MMLU consists of 15,908 multiple-choice questions, 1,540 of which are used to select and assess optimal settings for models, such as temperature, batch size and learning rate. The questions span 57 subjects, ranging from highly complex STEM fields and international law to nutrition and religion. It was one of the most commonly used benchmarks for comparing the capabilities of large language models, with over 100 million downloads as of July 2024.