Principle:Haosulab ManiSkill Trajectory Merging
| Field | Value |
|---|---|
| Principle Name | Trajectory Merging |
| Domain | Motion_Planning |
| Overview | Merging multiple trajectory files into a single consolidated dataset |
| Date | 2026-02-15 |
| Repository | Haosulab/ManiSkill |
Overview
The Trajectory Merging principle describes how ManiSkill combines multiple HDF5 trajectory files (each potentially produced by a different process or recording session) into a single unified dataset file. This is the final step of the parallel trajectory generation pipeline and is also useful for combining datasets from different sources, seeds, or experiments.
Description
When trajectories are generated in parallel (see Principle:Haosulab_ManiSkill_Parallel_Trajectory_Generation), each process writes to its own HDF5 file with its own episode ID sequence. The merge step must:
- Consolidate HDF5 groups: Copy all `traj_N` groups from each input file into a single output HDF5 file.
- Renumber episode IDs: By default (`recompute_id=True`), episode IDs are renumbered consecutively starting from 0 to ensure a contiguous, conflict-free ID space. This is essential because multiple input files may each start their IDs at 0.
- Merge JSON metadata: The companion JSON files contain per-episode metadata (seeds, control modes, success flags) and global metadata (environment info, commit info, source descriptions). The merge preserves the first file's global metadata and logs warnings if there are conflicts between files.
- Conflict detection: If `recompute_id=False`, the merge asserts that no two input files share the same episode ID, preventing silent data overwrites.
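The renumbering and conflict-detection steps above can be sketched with a small stdlib-only analogue. This is not the actual ManiSkill implementation (which also copies the HDF5 groups); the helper name `merge_episode_metadata` is hypothetical, and the episode dicts stand in for the per-episode JSON metadata:

```python
# Hedged sketch of the merge bookkeeping: renumber episode IDs across
# input files, or assert uniqueness when recompute_id=False.

def merge_episode_metadata(per_file_episodes, recompute_id=True):
    """per_file_episodes: one list of episode dicts per input file."""
    merged = []
    next_id = 0
    seen_ids = set()
    for episodes in per_file_episodes:
        for ep in episodes:
            ep = dict(ep)  # copy so the inputs are left untouched
            if recompute_id:
                ep["episode_id"] = next_id  # contiguous IDs starting at 0
                next_id += 1
            else:
                # Conflict detection: refuse to merge duplicate IDs.
                assert ep["episode_id"] not in seen_ids, (
                    f"duplicate episode_id {ep['episode_id']}"
                )
                seen_ids.add(ep["episode_id"])
            merged.append(ep)
    return merged

# Two files whose IDs both start at 0, as in parallel generation:
file_a = [{"episode_id": 0, "seed": 1}, {"episode_id": 1, "seed": 2}]
file_b = [{"episode_id": 0, "seed": 3}]
merged = merge_episode_metadata([file_a, file_b])
print([ep["episode_id"] for ep in merged])  # → [0, 1, 2]
```

With `recompute_id=False` the same inputs would trip the assertion, since both files contain an episode with ID 0.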
Usage
Trajectory merging is invoked automatically at the end of parallel trajectory generation. It can also be used as a standalone tool:
```shell
python -m mani_skill.trajectory.merge_trajectory \
    -i demos/PickCube-v1/run1 demos/PickCube-v1/run2 \
    -o demos/PickCube-v1/merged/trajectory.h5 \
    -p "*.h5"
```
Or programmatically:
```python
from mani_skill.trajectory.merge_trajectory import merge_trajectories

merge_trajectories(
    output_path="demos/merged.h5",
    traj_paths=["demos/batch.0.h5", "demos/batch.1.h5", "demos/batch.2.h5"],
    recompute_id=True,
)
```
After merging, the output directory contains a single .h5 file and its companion .json file, ready for downstream consumption by replay or training scripts.
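As a downstream sanity check, one might verify that the merged companion JSON has the contiguous ID space the merge promises. This sketch assumes a JSON layout with an `episodes` list whose entries carry an `episode_id` field, in line with the per-episode metadata described earlier; the helper name `check_contiguous` is hypothetical:

```python
import json

def check_contiguous(meta_or_path):
    """Return True if episode IDs run 0, 1, 2, ... with no gaps."""
    if isinstance(meta_or_path, dict):
        meta = meta_or_path
    else:
        with open(meta_or_path) as f:
            meta = json.load(f)
    ids = [ep["episode_id"] for ep in meta["episodes"]]
    return ids == list(range(len(ids)))

# In-memory stand-in for a merged companion JSON:
merged_meta = {"episodes": [{"episode_id": i} for i in range(3)]}
print(check_contiguous(merged_meta))  # → True
```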
Theoretical Basis
- Data normalization: Renumbering episode IDs is analogous to re-indexing rows in a database merge, ensuring referential integrity between the HDF5 data and the JSON episode metadata.
- Idempotent merging: The merge operation is designed to be safe to re-run: the output file is created fresh (write mode), so repeated merges of the same inputs produce identical outputs.
- HDF5 group copying: The `h5py.File.copy()` method performs an efficient deep copy of HDF5 groups, including all datasets, attributes, and compression settings, preserving the original data fidelity.
- Provenance preservation: By retaining the global metadata (environment kwargs, commit info, source type) and logging conflicts, the merge maintains traceability from the merged dataset back to the original generation parameters.
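The idempotency property can be illustrated with a minimal stdlib analogue: because the output file is opened in write (truncating) mode, re-running the same "merge" over the same inputs yields byte-identical output. The `write_merged` helper below is hypothetical and merely stands in for the real HDF5 merge:

```python
import hashlib
import os
import tempfile

def write_merged(path, records):
    with open(path, "w") as f:  # "w" truncates: stale content never leaks
        for rec in records:
            f.write(rec + "\n")

records = ["traj_0", "traj_1"]
path = os.path.join(tempfile.mkdtemp(), "merged.txt")

write_merged(path, records)
first = hashlib.sha256(open(path, "rb").read()).hexdigest()
write_merged(path, records)  # re-run the "merge" over the same inputs
second = hashlib.sha256(open(path, "rb").read()).hexdigest()
print(first == second)  # → True
```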