Principle:Haosulab ManiSkill LeRobot Format Export
| Field | Value |
|---|---|
| Source Repository | haosulab/ManiSkill |
| Domains | Imitation_Learning, Robotics, Data_Processing, Interoperability |
| Last Updated | 2026-02-15 |
Overview
Description
LeRobot Format Export is the process of converting ManiSkill trajectory data from its native HDF5 format into the LeRobot v3.0 dataset format, enabling cross-framework compatibility between ManiSkill's simulation-based data collection and HuggingFace's LeRobot ecosystem for real robot learning. This conversion bridges the gap between simulation and real-world robotics workflows by standardizing the data representation.
The LeRobot v3.0 format is a structured directory layout designed for robot learning datasets. It organizes data into:
- Parquet data files: Tabular data containing actions, robot states, timestamps, frame indices, episode indices, and task labels, stored in chunked Parquet files for efficient access.
- Video chunks: Camera observations stored as MP4 video files, organized by camera name, chunk index, and episode index. This is more storage-efficient than storing raw image arrays.
- Metadata files: JSON and Parquet files containing dataset information (feature schemas, robot type, FPS, total episodes/frames), per-episode statistics, task descriptions, and global data statistics.
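Putting the three components together, an illustrative directory sketch looks like the following (the chunk and file naming shown here is an assumption for illustration; the authoritative layout is defined by the LeRobot v3.0 specification):

```text
my_dataset/
├── data/
│   └── chunk-000/
│       └── file-000.parquet          # actions, states, timestamps, indices
├── videos/
│   └── observation.images.base_camera/
│       └── chunk-000/
│           └── file-000.mp4          # encoded camera frames
└── meta/
    ├── info.json                     # feature schemas, robot type, FPS, totals
    ├── tasks.parquet                 # task descriptions
    ├── episodes/                     # per-episode metadata and statistics
    └── stats.json                    # global data statistics
```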
The conversion process reads the ManiSkill HDF5 trajectory file, extracts actions, robot joint states (qpos), and RGB camera images from each episode, writes tabular data to Parquet files, encodes camera frames as MP4 videos, computes per-episode and global statistics, and generates the required metadata files.
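The flattening step described above can be sketched as follows. This is a hypothetical illustration, not the converter's actual code: it reads each episode group from a ManiSkill HDF5 file and emits flat rows carrying episode and frame indices, as the LeRobot tabular layout expects. The `actions` and `obs/agent/qpos` paths follow ManiSkill's typical schema but are assumptions here.

```python
import h5py
import numpy as np

def episodes_to_rows(h5_path: str, fps: float = 30.0) -> list[dict]:
    """Flatten episode-keyed HDF5 trajectories into LeRobot-style rows."""
    rows = []
    with h5py.File(h5_path, "r") as f:
        # Sort numerically so traj_10 follows traj_9, not traj_1.
        keys = sorted(f.keys(), key=lambda k: int(k.split("_")[-1]))
        for ep_idx, key in enumerate(keys):
            ep = f[key]
            actions = np.asarray(ep["actions"])            # (T, action_dim)
            qpos = np.asarray(ep["obs"]["agent"]["qpos"])  # (T+1, state_dim)
            for t in range(len(actions)):
                rows.append({
                    "action": actions[t].tolist(),
                    "observation.state": qpos[t].tolist(),
                    "timestamp": t / fps,
                    "frame_index": t,
                    "episode_index": ep_idx,
                })
    return rows
```

Each row corresponds to one Parquet row; a real converter would batch these into chunked Parquet files (e.g. via pyarrow) rather than materialize Python dicts.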
Usage
LeRobot format export is used when transferring ManiSkill simulation data to the LeRobot ecosystem for:
- Training real-robot policies using LeRobot's training infrastructure (e.g., ACT, Diffusion Policy, TDMPC).
- Uploading datasets to HuggingFace Hub for sharing and reproducibility.
- Combining simulation demonstrations with real-world demonstrations in a unified format.
- Leveraging LeRobot's data visualization and analysis tools on ManiSkill data.
- Sim-to-real transfer workflows where simulation data augments real-world data.
This step is typically performed after trajectory replay/conversion to ensure the trajectory contains the desired observation modalities (especially RGBD images for vision-based policies).
Theoretical Basis
Dataset Standardization
A recurring challenge in robot learning is the lack of standardized data formats across different frameworks, simulators, and real robot setups. Each system typically uses its own data schema, making it difficult to share datasets, reproduce results, or combine data from multiple sources. LeRobot addresses this by defining a common format that accommodates the core data types needed for robot learning: actions, proprioceptive states, camera observations, and metadata.
Key design principles of the LeRobot v3.0 format:
- Chunked storage: Data is split into chunks (default 1000 episodes per chunk) for efficient partial loading and streaming.
- Video-based image storage: Camera observations are stored as MP4 videos rather than raw arrays, achieving significant compression while maintaining visual quality.
- Parquet for tabular data: Actions, states, and metadata use Apache Parquet format, which provides columnar storage with efficient compression and fast read access.
- Self-describing metadata: The `info.json` file contains complete feature schemas, dataset statistics, and configuration, making datasets self-documenting.
- Per-episode statistics: Each episode has computed min/max/mean/std statistics for all numerical fields, enabling normalization during training.
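The per-field statistics in that last design principle can be sketched as below, assuming each numerical field arrives as a `(T, D)` array for an episode (the function name is hypothetical; LeRobot defines the actual statistics schema):

```python
import numpy as np

def field_stats(values: np.ndarray) -> dict:
    """Compute per-dimension min/max/mean/std for one field of one episode."""
    v = np.asarray(values, dtype=np.float64)
    return {
        "min": v.min(axis=0).tolist(),
        "max": v.max(axis=0).tolist(),
        "mean": v.mean(axis=0).tolist(),
        "std": v.std(axis=0).tolist(),  # population std (ddof=0)
    }
```

Running this per episode and per field yields the per-episode statistics; aggregating over all episodes yields the global statistics, both of which are used for input normalization at training time.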
Cross-Framework Interoperability
Converting between ManiSkill's HDF5 format and LeRobot's directory format involves several representation transformations:
- Episode-keyed HDF5 to flat Parquet: ManiSkill stores each episode as a separate HDF5 group (`traj_0`, `traj_1`, etc.), while LeRobot uses flat Parquet tables with an `episode_index` column. The conversion must flatten the episode structure while preserving episode boundaries.
- Observation arrays to video files: ManiSkill stores RGB observations as NumPy arrays within HDF5 groups, while LeRobot stores them as compressed MP4 video files. The conversion involves encoding arrays as video frames with configurable resolution and FPS.
- Metadata mapping: ManiSkill's companion JSON metadata (env_id, env_kwargs, episode seeds, control modes) must be mapped to LeRobot's metadata schema (robot_type, task names, feature schemas, statistics).
- Automatic detection: The converter auto-detects RGB cameras, robot state dimensions, and action dimensions from the input data, and optionally auto-detects task names and robot types from the ManiSkill environment metadata.
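The auto-detection step can be illustrated with a minimal sketch. The group paths (`actions`, `obs/agent/qpos`, `obs/sensor_data/<camera>/rgb`) follow ManiSkill's typical observation layout but are assumptions here, not the converter's exact schema:

```python
import h5py

def detect_features(h5_path: str) -> dict:
    """Infer action/state dims and RGB camera names from the first episode."""
    with h5py.File(h5_path, "r") as f:
        ep = f[sorted(f.keys())[0]]
        cameras = [
            name for name, grp in ep["obs"]["sensor_data"].items()
            if "rgb" in grp  # keep only sensors that provide RGB frames
        ]
        return {
            "action_dim": ep["actions"].shape[-1],
            "state_dim": ep["obs"]["agent"]["qpos"].shape[-1],
            "rgb_cameras": cameras,
        }
```

Inferring these shapes from the data itself, rather than requiring the user to declare them, lets one converter handle arbitrary ManiSkill environments and camera configurations.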
Related Pages
- Implementation:Haosulab_ManiSkill_Convert_To_LeRobot_CLI -- The concrete CLI tool for performing the ManiSkill-to-LeRobot format conversion.
- Principle:Haosulab_ManiSkill_Trajectory_Replay_Conversion -- The step that prepares trajectories with the desired observations before export.
- Principle:Haosulab_ManiSkill_Demonstration_Data_Acquisition -- The first step: acquiring raw demonstration data.
- Principle:Haosulab_ManiSkill_Trajectory_Dataset_Loading -- An alternative path: loading data directly into PyTorch for ManiSkill-native training.