Designing a multi-robot training data format

medium, subjective, system design

General

You are building the data infrastructure for a company that operates several different robot embodiments (a 6-DoF arm, a bimanual setup, and a mobile manipulator), each logging raw data as ROS2 bags. The ML team wants to train a single cross-embodiment policy. Explain how you would turn this heterogeneous raw data into a standardized training dataset. Address: (1) why raw ROS2 bags are not directly suitable as a training format, (2) what a standardized format like RLDS or LeRobotDataset gives you, and (3) at least two hard problems that arise when unifying data across different embodiments.
