Continual Visual and Verbal Learning Through a Child's Egocentric Input

Children learn the meanings of words from a continuous, temporally structured stream of egocentric experience. Recent work shows that neural networks can also learn word-referent mappings from a child's egocentric video recordings, but these models cycle through shuffled data for hundreds of epochs, in contrast to how children actually encounter their environment. We introduce BabyCL, a continual multimodal learning framework that processes the SAYCam dataset in a single chronological pass, combin

Paper · arXiv

cs.CV

Authors: Xiaoyang Jiang, Yanlai Yang, Kenneth A. Norman, Brenden Lake, Mengye Ren
Published: 2026-06-03
Categories: cs.CV, cs.AI, cs.CL

via arXiv · 2606.05115