Ji Woong (Brian) Kim1*, Ke Wang1*, Sirui Chen1, Zipeng Fu1, Cong Zhao2, Jeff Lai2, Chelsea Finn1
1Stanford University, 2Meta
Robotics faces a fundamental challenge of data scarcity. Unlike language or vision research, there is no internet-scale dataset for robotic manipulation. A promising path forward is to leverage egocentric human data, which can be collected more easily, with greater breadth, and at a larger scale. Towards this end, we investigate key design choices for learning across human and humanoid embodiments equipped with dexterous five-finger hands, using the Pi0.5 model as a foundation. Our results show that human data enables robots to learn new task semantics and compose existing skills into novel behaviors without corresponding robot data.
(policies co-trained with human data)
Sorting 40 tomatoes in a row, 38 / 40 success rate
4x speed
Packaging 10 trials in a row, 9 / 10 success rate, policy trained with subtask generation.
4x speed
Boxing 15 trials in a row, 14 / 15 success rate, policy trained with subtask generation.
4x speed
(policies trained only using robot data)
Sorting 40 tomatoes in a row, 16 / 40 success rate. The robot has no concept of sorting and places the tomatoes randomly.
The concept of sorting only appears in human data, which this baseline model was not trained on.
4x speed
Packaging 10 trials in a row, 1 / 10 success rate. The policy fails to learn the rule of placing the black box at the bottom first.
This rule-based ordering concept only appears in human data, which this baseline model was not trained on.
4x speed
Boxing 15 trials in a row, 14 / 15 became 4 / 15 without human data: 4 / 15 success rate. The policy often attempts to open the box and reach for the block at the same time, without waiting for the box to be fully open.
The sequential ordering of these skills only appears in human data, which this baseline model was not trained on.
4x speed
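For a side-by-side reading of the trial counts reported in the captions above, the following is a minimal sketch that converts them to success rates and computes the gap between the co-trained and robot-only policies. The dictionary layout and the helper names (`success_rate`, `summarize`) are illustrative assumptions and not part of the authors' code. Only the counts themselves come from this page.

```python
# (successes, trials) copied from the video captions above.
# The structure and names here are hypothetical, chosen for illustration.
results = {
    "Sorting":   {"co-trained": (38, 40), "robot-only": (16, 40)},
    "Packaging": {"co-trained": (9, 10),  "robot-only": (1, 10)},
    "Boxing":    {"co-trained": (14, 15), "robot-only": (4, 15)},
}


def success_rate(successes: int, trials: int) -> float:
    """Fraction of successful trials."""
    return successes / trials


def summarize(results: dict) -> list:
    """Return (task, co-trained rate, robot-only rate, gap) for each task."""
    rows = []
    for task, counts in results.items():
        co = success_rate(*counts["co-trained"])
        ro = success_rate(*counts["robot-only"])
        rows.append((task, co, ro, co - ro))
    return rows


if __name__ == "__main__":
    for task, co, ro, gap in summarize(results):
        print(f"{task:<10} co-trained {co:6.1%}  robot-only {ro:6.1%}  gap {gap:+.1%}")
```

With these counts, the co-trained policies succeed in 95% of sorting trials versus 40% for the robot-only baseline, 90% versus 10% for packaging, and roughly 93% versus 27% for boxing. The trial counts are small (10 to 40 per task), so the rates are best read as indicative rather than precise.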