Entity

GPIC: A Giant Permissive Image Corpus for Visual Generation

Studying scalable methods for visual generative modeling requires large, accessible, and stable datasets. We introduce GPIC, a Giant Permissive Image Corpus of approximately 28 trillion pixels. GPIC comprises diverse internet images captioned by a state-of-the-art vision-language model, including 100M training, 200K validation, and 1M test examples. Moreover, all GPIC images are permissively licensed for both research and commercial use. GPIC is safety-filtered, deduplicated, and centrally hoste

Paper · arXiv

cs.CV

Authors: Keshigeyan Chandrasegaran, Kyle Sargent, Suchir Agarwal, Michael Jang, Michael Poli + 4 more
Published: 2026-05-28
Categories: cs.CVcs.AI

Abstract ↗

via arXiv · 2605.30341