Principle:Haosulab ManiSkill Trajectory Merging
| Field | Value |
|---|---|
| Principle Name | Trajectory Merging |
| Domain | Motion_Planning |
| Overview | Merging multiple trajectory files into a single consolidated dataset |
| Date | 2026-02-15 |
| Repository | Haosulab/ManiSkill |
Overview
The Trajectory Merging principle describes how ManiSkill combines multiple HDF5 trajectory files (each potentially produced by a different process or recording session) into a single unified dataset file. This is the final step of the parallel trajectory generation pipeline and is also useful for combining datasets from different sources, seeds, or experiments.
Description
When trajectories are generated in parallel (see Principle:Haosulab_ManiSkill_Parallel_Trajectory_Generation), each process writes to its own HDF5 file with its own episode ID sequence. The merge step must:
- Consolidate HDF5 groups: Copy all `traj_N` groups from each input file into a single output HDF5 file.
- Renumber episode IDs: By default (`recompute_id=True`), episode IDs are renumbered consecutively starting from 0 to ensure a contiguous, conflict-free ID space. This is essential because multiple input files may each start their IDs at 0.
- Merge JSON metadata: The companion JSON files contain per-episode metadata (seeds, control modes, success flags) and global metadata (environment info, commit info, source descriptions). The merge preserves the first file's global metadata and logs warnings if there are conflicts between files.
- Conflict detection: If `recompute_id=False`, the merge asserts that no two input files share the same episode ID, preventing silent data overwrites.
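The renumbering and conflict-detection steps above can be sketched with a small stdlib-only analogue. This is not the actual ManiSkill implementation (which also copies the HDF5 groups); the helper name `merge_episode_metadata` is hypothetical, and the episode dicts stand in for the per-episode JSON metadata:

```python
# Hedged sketch of the merge bookkeeping: renumber episode IDs across
# input files, or assert uniqueness when recompute_id=False.

def merge_episode_metadata(per_file_episodes, recompute_id=True):
    """per_file_episodes: one list of episode dicts per input file."""
    merged = []
    next_id = 0
    seen_ids = set()
    for episodes in per_file_episodes:
        for ep in episodes:
            ep = dict(ep)  # copy so the inputs are left untouched
            if recompute_id:
                ep["episode_id"] = next_id  # contiguous IDs starting at 0
                next_id += 1
            else:
                # Conflict detection: refuse to merge duplicate IDs.
                assert ep["episode_id"] not in seen_ids, (
                    f"duplicate episode_id {ep['episode_id']}"
                )
                seen_ids.add(ep["episode_id"])
            merged.append(ep)
    return merged

# Two files whose IDs both start at 0, as in parallel generation:
file_a = [{"episode_id": 0, "seed": 1}, {"episode_id": 1, "seed": 2}]
file_b = [{"episode_id": 0, "seed": 3}]
merged = merge_episode_metadata([file_a, file_b])
print([ep["episode_id"] for ep in merged])  # → [0, 1, 2]
```

With `recompute_id=False` the same inputs would trip the assertion, since both files contain an episode with ID 0.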
Usage
Trajectory merging is invoked automatically at the end of parallel trajectory generation. It can also be used as a standalone tool:
```shell
python -m mani_skill.trajectory.merge_trajectory \
    -i demos/PickCube-v1/run1 demos/PickCube-v1/run2 \
    -o demos/PickCube-v1/merged/trajectory.h5 \
    -p "*.h5"
```
Or programmatically:
```python
from mani_skill.trajectory.merge_trajectory import merge_trajectories

merge_trajectories(
    output_path="demos/merged.h5",
    traj_paths=["demos/batch.0.h5", "demos/batch.1.h5", "demos/batch.2.h5"],
    recompute_id=True,
)
```
After merging, the output directory contains a single .h5 file and its companion .json file, ready for downstream consumption by replay or training scripts.
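As a downstream sanity check, one might verify that the merged companion JSON has the contiguous ID space the merge promises. This sketch assumes a JSON layout with an `episodes` list whose entries carry an `episode_id` field, in line with the per-episode metadata described earlier; the helper name `check_contiguous` is hypothetical:

```python
import json

def check_contiguous(meta_or_path):
    """Return True if episode IDs run 0, 1, 2, ... with no gaps."""
    if isinstance(meta_or_path, dict):
        meta = meta_or_path
    else:
        with open(meta_or_path) as f:
            meta = json.load(f)
    ids = [ep["episode_id"] for ep in meta["episodes"]]
    return ids == list(range(len(ids)))

# In-memory stand-in for a merged companion JSON:
merged_meta = {"episodes": [{"episode_id": i} for i in range(3)]}
print(check_contiguous(merged_meta))  # → True
```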
Theoretical Basis
- Data normalization: Renumbering episode IDs is analogous to re-indexing rows in a database merge, ensuring referential integrity between the HDF5 data and the JSON episode metadata.
- Idempotent merging: The merge operation is designed to be safe to re-run: the output file is created fresh (write mode), so repeated merges of the same inputs produce identical outputs.
- HDF5 group copying: The `h5py.File.copy()` method performs an efficient deep copy of HDF5 groups, including all datasets, attributes, and compression settings, preserving the original data fidelity.
- Provenance preservation: By retaining the global metadata (environment kwargs, commit info, source type) and logging conflicts, the merge maintains traceability from the merged dataset back to the original generation parameters.
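The idempotency property can be illustrated with a minimal stdlib analogue: because the output file is opened in write (truncating) mode, re-running the same "merge" over the same inputs yields byte-identical output. The `write_merged` helper below is hypothetical and merely stands in for the real HDF5 merge:

```python
import hashlib
import os
import tempfile

def write_merged(path, records):
    with open(path, "w") as f:  # "w" truncates: stale content never leaks
        for rec in records:
            f.write(rec + "\n")

records = ["traj_0", "traj_1"]
path = os.path.join(tempfile.mkdtemp(), "merged.txt")

write_merged(path, records)
first = hashlib.sha256(open(path, "rb").read()).hexdigest()
write_merged(path, records)  # re-run the "merge" over the same inputs
second = hashlib.sha256(open(path, "rb").read()).hexdigest()
print(first == second)  # → True
```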