Rethinking VLM Representation for VLA Initialization

Vision-Language-Action (VLA) models widely adopt pretrained Vision-Language Models (VLMs) as policy backbones, yet it remains unclear what kind of pretrained VLM representation is useful as a VLA initialization. In this paper, we study VLA initialization as a controlled representation-design problem along three axes: capability-level embodied VQA supervision, parameter-update strategy, and robot-data pretraining. Our experiments show that the original pretrained VLM representation is a key sourc

Paper · arXiv

cs.CV

Authors: Weifeng Lin, Siyuan Huang, Hao Li, Tingwei Chen, Ruichuan An + 3 more
Published: 2026-05-25

via arXiv · 2605.25802