Ego-Pi:

VLA Fine-Tuning for Ego-Centric Human and Robot Data


Summary video. Sound on!

Authors

Ji Woong (Brian) Kim¹*, Ke Wang¹*, Sirui Chen¹, Zipeng Fu¹, Cong Zhao², Jeff Lai², Chelsea Finn¹

¹Stanford University, ²Meta


Abstract

Robotics faces a fundamental challenge: data scarcity. Unlike language or vision research, robotic manipulation has no internet-scale dataset. A promising path forward is to leverage egocentric human data, which can be collected more easily, with greater breadth, and at larger scale. To this end, we investigate key design choices for learning across human and humanoid embodiments equipped with dexterous five-finger hands, using the Pi0.5 model as a foundation. Our results show that human data enables robots to learn new task semantics and to compose existing skills into novel behaviors, without corresponding robot data.

Autonomous Tasks

(policies co-trained with human data)

Sorting: 40 tomatoes in a row, 38 / 40 success rate.
4x speed

Packaging: 10 trials in a row, 9 / 10 success rate. Policy trained with subtask generation.
4x speed

Boxing: 15 trials in a row, 14 / 15 success rate. Policy trained with subtask generation.
4x speed

Baseline Performance

(policies trained only using robot data)

Sorting: 40 tomatoes in a row, 16 / 40 success rate. The robot has no concept of sorting and places the tomatoes randomly. The concept of sorting appears only in the human data, which this baseline model was not trained on.
4x speed

Packaging: 10 trials in a row, 1 / 10 success rate. The policy fails to learn the rule of placing the black box at the bottom first. This rule-based ordering appears only in the human data, which this baseline model was not trained on.
4x speed

Boxing: 15 trials in a row, 4 / 15 success rate. The policy often attempts to open the box and reach for the block at the same time, without waiting for the box to be fully open. The sequential ordering of these skills appears only in the human data, which this baseline model was not trained on.
4x speed
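For a side-by-side reading of the numbers above, the short sketch below converts the reported trial counts into success rates and the gap between the co-trained and robot-only policies. The counts are copied from this page; the names `RESULTS`, `success_rate`, and `summarize` are illustrative and not part of any released code.

```python
from fractions import Fraction

# (successes, trials) as reported on this page for each task.
# Dictionary layout and key names are assumptions made for this sketch.
RESULTS = {
    "Sorting":   {"co_trained": (38, 40), "robot_only": (16, 40)},
    "Packaging": {"co_trained": (9, 10),  "robot_only": (1, 10)},
    "Boxing":    {"co_trained": (14, 15), "robot_only": (4, 15)},
}


def success_rate(successes, trials):
    """Return the exact success rate as a fraction of trials."""
    return Fraction(successes, trials)


def summarize(results):
    """Return (task, co-trained rate, robot-only rate, difference) per task."""
    rows = []
    for task, counts in results.items():
        co = success_rate(*counts["co_trained"])
        base = success_rate(*counts["robot_only"])
        rows.append((task, float(co), float(base), float(co - base)))
    return rows


if __name__ == "__main__":
    for task, co, base, gain in summarize(RESULTS):
        print(
            f"{task:<10} co-trained {co:6.1%}  "
            f"robot-only {base:6.1%}  gain {gain * 100:5.1f} points"
        )
```

With these counts, the co-trained policies reach 95%, 90%, and roughly 93% success, against 40%, 10%, and roughly 27% for the robot-only baselines. The trial counts are small (10 to 40 per task), so the gaps are best read as indicative rather than as precise estimates.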
