Implementation:Haosulab ManiSkill Merge Trajectories Func

Field	Value
Implementation Name	Merge Trajectories Func
Type	API Doc
Domain	Motion_Planning
Source File	`mani_skill/trajectory/merge_trajectory.py` (L9-75)
Date	2026-02-15
Repository	Haosulab/ManiSkill

Overview

The merge_trajectories function combines multiple HDF5 trajectory files and their companion JSON metadata files into a single consolidated dataset. This is the final step of the parallel trajectory generation pipeline and can also be used as a standalone tool for combining datasets from different recording sessions.

Description

The function iterates through each input trajectory file, copies all HDF5 episode groups to the output file, and merges the JSON episode metadata. By default, it renumbers episode IDs consecutively to produce a contiguous ID space. It also preserves global metadata (environment info, commit info, source descriptions) from the first input file, logging warnings if subsequent files have conflicting values.

Usage

from mani_skill.trajectory.merge_trajectory import merge_trajectories

merge_trajectories(
    output_path="demos/merged.h5",
    traj_paths=["demos/batch.0.h5", "demos/batch.1.h5"],
    recompute_id=True,
)

Code Reference

Function Signature

def merge_trajectories(
    output_path: str,
    traj_paths: list,
    recompute_id: bool = True,
) -> None:

Parameters

Parameter	Type	Default	Description
`output_path`	`str`	(required)	Path for the output HDF5 file. The merged JSON metadata is written alongside it, at the path obtained by replacing `.h5` with `.json` in `output_path`.
`traj_paths`	`list`	(required)	List of paths to input HDF5 trajectory files. Each must have a companion `.json` file.
`recompute_id`	`bool`	`True`	If True, renumber episode IDs consecutively starting from 0. If False, keep the original IDs; an assertion fails if two inputs contain the same `traj_<id>` group.
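
The effect of `recompute_id` can be illustrated without any HDF5 files. The following is a minimal sketch that mirrors the counter logic of the implementation on plain episode dictionaries; `assign_ids` and the two sample batches are hypothetical and not part of ManiSkill.

```python
def assign_ids(batches, recompute_id=True):
    """Return the merged episode list, mimicking the ID handling of merge_trajectories.

    `batches` is a list of episode lists, one per input file (a hypothetical
    stand-in for the "episodes" entry of each companion JSON file).
    """
    merged, seen, cnt = [], set(), 0
    for episodes in batches:
        for ep in episodes:
            new_id = cnt if recompute_id else ep["episode_id"]
            # Mirrors the assertion on duplicate group names in the merged HDF5 file
            assert new_id not in seen, f"traj_{new_id}"
            seen.add(new_id)
            merged.append({**ep, "episode_id": new_id})
            cnt += 1
    return merged


# Two batches recorded independently, so both start at episode 0
batch_a = [{"episode_id": 0}, {"episode_id": 1}]
batch_b = [{"episode_id": 0}, {"episode_id": 1}]

ids = [ep["episode_id"] for ep in assign_ids([batch_a, batch_b])]
print(ids)  # [0, 1, 2, 3]
```

Calling `assign_ids([batch_a, batch_b], recompute_id=False)` raises an `AssertionError` in this sketch, because both batches contain episode 0. That is the situation the default `recompute_id=True` avoids.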

Implementation (L9-75)

def merge_trajectories(output_path: str, traj_paths: list, recompute_id: bool = True):
    logger.info(f"Merging {output_path}")
    merged_h5_file = h5py.File(output_path, "w")
    merged_json_path = output_path.replace(".h5", ".json")
    merged_json_data = {"episodes": []}
    cnt = 0

    for traj_path in traj_paths:
        traj_path = str(traj_path)
        logger.info(f"Merging {traj_path}")

        with h5py.File(traj_path, "r") as h5_file:
            json_data = load_json(traj_path.replace(".h5", ".json"))

            # For keys other than episodes, keep the first data
            for key, value in json_data.items():
                if key == "episodes":
                    continue
                if key not in merged_json_data:
                    merged_json_data[key] = value
                else:
                    if merged_json_data[key] != value:
                        logger.warning(
                            f"Conflict detected for key {key} in {traj_path}"
                        )

            # Merge episodes
            for ep in json_data["episodes"]:
                episode_id = ep["episode_id"]
                traj_id = f"traj_{episode_id}"

                if recompute_id:
                    new_traj_id = f"traj_{cnt}"
                else:
                    new_traj_id = traj_id

                assert new_traj_id not in merged_h5_file, new_traj_id
                h5_file.copy(traj_id, merged_h5_file, new_traj_id)

                if recompute_id:
                    ep["episode_id"] = cnt
                merged_json_data["episodes"].append(ep)
                cnt += 1

    merged_h5_file.close()
    dump_json(merged_json_path, merged_json_data, indent=2)

CLI Interface (L78-97)

# Command-line usage:
# python -m mani_skill.trajectory.merge_trajectory \
#     -i dir1 dir2 -o output/merged.h5 -p "*.h5"

Argument	Description
`-i` / `--input-dirs`	Input directories to search for trajectory files.
`-o` / `--output-path`	Path for the merged output HDF5 file.
`-p` / `--pattern`	Glob pattern to match trajectory files (default: `trajectory.h5`).
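
The three arguments above could be wired up roughly as follows. This is a sketch under stated assumptions, not the actual CLI code: the function names `parse_args` and `collect_paths` are hypothetical, and the recursive, sorted search via `Path.rglob` is an assumption about how the pattern is applied to each input directory.

```python
import argparse
from pathlib import Path


def parse_args(argv=None):
    # Flags and the default pattern follow the argument table above
    parser = argparse.ArgumentParser(description="Merge trajectory files (sketch)")
    parser.add_argument("-i", "--input-dirs", nargs="+", required=True)
    parser.add_argument("-o", "--output-path", type=str, required=True)
    parser.add_argument("-p", "--pattern", type=str, default="trajectory.h5")
    return parser.parse_args(argv)


def collect_paths(input_dirs, pattern):
    # Assumption: each directory is searched recursively; sorting keeps the
    # merge order (and therefore the renumbered IDs) deterministic
    paths = []
    for input_dir in input_dirs:
        paths.extend(sorted(Path(input_dir).rglob(pattern)))
    return paths


args = parse_args(["-i", "dir1", "dir2", "-o", "output/merged.h5", "-p", "*.h5"])
print(args.input_dirs, args.output_path, args.pattern)
```

The list returned by `collect_paths` would then be passed as `traj_paths` to `merge_trajectories`, together with `args.output_path`.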

I/O Contract

Direction	Data	Format
Input	List of HDF5 trajectory files	Each containing `traj_0`, `traj_1`, ... groups
Input	Companion JSON files	Same basename as HDF5 with `.json` extension
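Output	Merged HDF5 file	Single file containing every input episode as a `traj_<id>` group; numbered consecutively (`traj_0`, `traj_1`, ...) when `recompute_id=True`
Output	Merged JSON file	Combined episode metadata; `episode_id` values are renumbered when `recompute_id=True`, otherwise kept as in the inputs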
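
The implementation derives each companion JSON path with `str.replace(".h5", ".json")`, which rewrites every occurrence of `.h5` in the path rather than only the file extension. The sketch below contrasts this with a suffix-only swap; `companion_json_path` is a hypothetical helper, not a ManiSkill function, and the paths are made up for illustration.

```python
from pathlib import PurePosixPath


def companion_json_path(h5_path):
    """Hypothetical helper: swap only the final extension for .json."""
    return str(PurePosixPath(h5_path).with_suffix(".json"))


simple = "demos/batch.0.h5"
tricky = "runs.h5/batch.0.h5"  # ".h5" also appears in a directory name

print(simple.replace(".h5", ".json"))  # demos/batch.0.json
print(companion_json_path(simple))     # demos/batch.0.json
print(tricky.replace(".h5", ".json"))  # runs.json/batch.0.json
print(companion_json_path(tricky))     # runs.h5/batch.0.json
```

For typical paths such as `demos/batch.0.h5` both approaches agree. They differ only when `.h5` also occurs elsewhere in the path, so avoiding that substring in directory names sidesteps the issue.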

Merge Behavior

JSON Key	Merge Strategy
`episodes`	Concatenated from all input files; IDs renumbered if `recompute_id=True`
`env_info`	Kept from the first file; warnings logged for conflicts
`commit_info`	Kept from the first file; warnings logged for conflicts
`source_type`	Kept from the first file; warnings logged for conflicts
`source_desc`	Kept from the first file; warnings logged for conflicts
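
The "first file wins" strategy in the table can be sketched on plain dictionaries. This is an illustration of the logic only: `merge_metadata` is a hypothetical function, the sample values are made up, and episode renumbering is deliberately left out.

```python
import logging

logger = logging.getLogger("merge_sketch")


def merge_metadata(json_datas):
    """Keep the first value seen for every non-episode key; warn on conflicts."""
    merged = {"episodes": []}
    conflicts = []
    for idx, data in enumerate(json_datas):
        for key, value in data.items():
            if key == "episodes":
                # Episodes are concatenated in input order
                merged["episodes"].extend(value)
            elif key not in merged:
                merged[key] = value
            elif merged[key] != value:
                # The first value is kept; the conflict is only reported
                logger.warning("Conflict detected for key %s in input %d", key, idx)
                conflicts.append((key, idx))
    return merged, conflicts


# Illustrative metadata; real files carry richer env_info and episode entries
first = {
    "env_info": {"env_id": "PickCube-v1"},
    "source_type": "motionplanning",
    "episodes": [{"episode_id": 0}],
}
second = {
    "env_info": {"env_id": "PickCube-v1"},
    "source_type": "teleoperation",
    "episodes": [{"episode_id": 0}],
}

merged, conflicts = merge_metadata([first, second])
print(merged["source_type"])  # motionplanning
print(conflicts)              # [('source_type', 1)]
```

Because conflicts produce only a warning, merging files recorded with different environments or sources succeeds but yields metadata that describes the first file alone.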

Usage Examples

# Merge trajectory files from a parallel generation run
from mani_skill.trajectory.merge_trajectory import merge_trajectories

traj_files = [
    "demos/PickCube-v1/motionplanning/20260215.0.h5",
    "demos/PickCube-v1/motionplanning/20260215.1.h5",
    "demos/PickCube-v1/motionplanning/20260215.2.h5",
    "demos/PickCube-v1/motionplanning/20260215.3.h5",
]
merge_trajectories(
    output_path="demos/PickCube-v1/motionplanning/20260215.h5",
    traj_paths=traj_files,
    recompute_id=True,
)
# Result: a single 20260215.h5 whose groups run from traj_0 to traj_{N-1},
# where N is the total number of episodes across the four input files

# CLI usage to merge all trajectory files from multiple directories
python -m mani_skill.trajectory.merge_trajectory \
    -i demos/run1 demos/run2 demos/run3 \
    -o demos/all_merged/trajectory.h5 \
    -p "*.h5"
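
After a merge with `recompute_id=True`, the merged JSON file can be given a quick sanity check. A minimal sketch, assuming only the `episodes` / `episode_id` layout described above; `episode_ids_are_contiguous` is a hypothetical helper, and the demo runs on a stand-in file rather than a real merge result.

```python
import json
import tempfile
from pathlib import Path


def episode_ids_are_contiguous(json_path):
    """Return True if episode IDs in the merged JSON are exactly 0..N-1, in order."""
    with open(json_path) as f:
        episodes = json.load(f)["episodes"]
    return [ep["episode_id"] for ep in episodes] == list(range(len(episodes)))


# Demo on a stand-in file; in practice, point this at the merged .json file
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "merged.json"
    path.write_text(json.dumps({"episodes": [{"episode_id": i} for i in range(3)]}))
    ok = episode_ids_are_contiguous(path)

print(ok)  # True
```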
