AI-Assisted Systematization for Evaluating GenAI Systems

Evaluating generative AI (GenAI) systems is challenging because many targets of evaluation are broad, contested concepts, such as "reasoning," "fairness," or "creativity." When these concepts are left underspecified, it becomes unclear what should be measured or how evaluation results should be interpreted. This problem reflects a missing step: systematization, that is, moving from a broad background concept to an explicit, structured account of the concept in measurable terms. To help address t

Paper · arXiv

Authors: Dhruv Agarwal, Emily Sheng, Chad Atalla, Jean Garcia-Gathright, Hussein Mozannar + 4 more
Published: 2026-05-25
Categories: cs.CL, cs.AI, cs.CY

via arXiv · 2605.26001