From Human Videos to Robot Manipulation: A Survey on Scalable Vision-Language-Action Learning with Human-centric Data
IJCAI 2026
mimic-video: Video-Action Models for Generalizable Robot Control Beyond VLAs
RSS 2026
Training Strategies for Efficient Embodied Reasoning
CoRL 2025
FAST: Efficient Action Tokenization for Vision-Language-Action Models
RSS 2025
Beyond Sight: Finetuning Generalist Robot Policies with Heterogeneous Sensors via Language Grounding
ICRA 2025
GHIL-Glue: Hierarchical Control with Filtered Subgoal Images
ICRA 2025
Scaling Robot Policy Learning via Zero-Shot Labeling with Foundation Models
CoRL 2024
The Ingredients for Robotic Diffusion Transformers
ICRA 2025
LeLaN: Learning A Language-conditioned Navigation Policy from In-the-Wild Videos
CoRL 2024