Implementation:Haosulab ManiSkill ManiSkillTrajectoryDataset
| Field | Value |
|---|---|
| Source Repository | haosulab/ManiSkill |
| Type | API Doc |
| Domains | Imitation_Learning, Robotics, Data_Processing, Machine_Learning |
| Last Updated | 2026-02-15 |
Overview
Description
ManiSkillTrajectoryDataset is a general-purpose PyTorch Dataset class for loading ManiSkill HDF5 trajectory data into memory as observation-action pairs for supervised learning. It reads a .h5 trajectory file and its companion .json metadata file, iterates over episodes, flattens them into a single indexable dataset, and optionally filters to only successful episodes. The class supports placing data on a specified device (CPU or GPU) at load time and handles uint16-to-int32 type conversion for observation data that uses compressed integer formats.
Each item returned by __getitem__ is a dictionary containing the observation, action, termination and truncation flags, and optionally reward, success, and failure indicators for that timestep.
Usage
This class is used after trajectory replay/conversion to prepare data for policy training. It provides a drop-in PyTorch Dataset compatible with DataLoader, BatchSampler, and other standard PyTorch data utilities.
Code Reference
Source Location
| Field | Value |
|---|---|
| Repository | haosulab/ManiSkill |
| File | mani_skill/trajectory/dataset.py |
| Lines | L23-155 |
| Class | ManiSkillTrajectoryDataset |
Signature
class ManiSkillTrajectoryDataset(Dataset):
    """
    A general torch Dataset you can drop in and use immediately with just about
    any trajectory .h5 data generated from ManiSkill.

    Args:
        dataset_file (str): path to the .h5 file containing the data you want to load
        load_count (int): the number of trajectories from the dataset to load into memory.
            If -1, will load all into memory
        success_only (bool): whether to skip trajectories that are not successful in the end.
            Default is False
        device: The location to save data to. If None will store as numpy (the default),
            otherwise will move data to that device
    """

    def __init__(
        self,
        dataset_file: str,
        load_count: int = -1,
        success_only: bool = False,
        device=None,
    ) -> None:
Constructor parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
| dataset_file | str | required | Path to the .h5 trajectory file. A companion .json file must exist at the same location. |
| load_count | int | -1 | Number of trajectories to load. -1 loads all trajectories. |
| success_only | bool | False | If True, skips episodes that do not have success=True in the episode metadata. |
| device | torch.device or None | None | Device to place tensors on. If None, data is stored as NumPy arrays. |
Instance attributes after initialization:
| Attribute | Type | Description |
|---|---|---|
| obs | dict of arrays or tensors | Observations indexed by timestep across all loaded episodes. Nested dict for multi-modal obs. |
| actions | ndarray or tensor | Shape (N, action_dim) -- all actions concatenated across episodes. |
| terminated | ndarray or tensor | Shape (N,) -- per-timestep termination flags. |
| truncated | ndarray or tensor | Shape (N,) -- per-timestep truncation flags. |
| rewards | ndarray, tensor, or None | Shape (N,) -- per-timestep rewards (if present in trajectory data). |
| success | ndarray, tensor, or None | Shape (N,) -- per-timestep success flags (if present). |
| fail | ndarray, tensor, or None | Shape (N,) -- per-timestep failure flags (if present). |
| env_id | str | The environment ID from the trajectory metadata. |
| env_kwargs | dict | The environment kwargs from the trajectory metadata. |
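Because actions is stored as a single (N, action_dim) array, dataset-level statistics for action normalization (a common preprocessing step before behavioral cloning) can be computed directly from it. A minimal sketch with a synthetic array standing in for dataset.actions:

```python
import numpy as np

# Synthetic stand-in for dataset.actions with N=100 timesteps, 7-dim actions
actions = np.random.default_rng(0).normal(size=(100, 7))

mean = actions.mean(axis=0)
std = actions.std(axis=0) + 1e-6  # small epsilon avoids division by zero
normalized = (actions - mean) / std
print(normalized.shape)  # (100, 7)
```

The same per-dimension mean and std would then be applied to policy outputs at deployment time to undo the normalization.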
Import
from mani_skill.trajectory.dataset import ManiSkillTrajectoryDataset
I/O Contract
Inputs:
| Input | Type | Description |
|---|---|---|
| dataset_file | str (file path) | Path to a ManiSkill .h5 trajectory file produced by trajectory replay/conversion. A companion .json metadata file must exist at the same path with a .json extension. |
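The companion metadata path follows directly from the .h5 path by swapping the extension. A small sketch of that convention (the helper name here is illustrative, not part of the ManiSkill API):

```python
def metadata_path(dataset_file: str) -> str:
    """Illustrative helper: derive the companion .json metadata path
    from a ManiSkill .h5 trajectory path by swapping the extension."""
    if not dataset_file.endswith(".h5"):
        raise ValueError("expected a .h5 trajectory file")
    return dataset_file[: -len(".h5")] + ".json"

print(metadata_path("demos/PickCube-v1/trajectory.state.pd_joint_delta_pos.physx_cpu.h5"))
# demos/PickCube-v1/trajectory.state.pd_joint_delta_pos.physx_cpu.json
```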
Outputs (__getitem__ return):
Each call to dataset[idx] returns a dictionary:
{
    "obs": obs,            # dict of tensors or single tensor, observation at timestep idx
    "action": action,      # tensor of shape (action_dim,), the expert action
    "terminated": bool,    # whether the episode terminated at this step
    "truncated": bool,     # whether the episode was truncated at this step
    "reward": float,       # (optional) reward at this step
    "success": bool,       # (optional) success flag at this step
    "fail": bool,          # (optional) failure flag at this step
}
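Because each sample is a plain dictionary, PyTorch's default collate function can batch such samples directly, stacking each key (including nested obs dicts) along a new leading batch dimension. An illustrative check with synthetic samples shaped like the dict above (values are made up):

```python
import torch
from torch.utils.data import default_collate

# Two synthetic samples mimicking the __getitem__ dict structure
samples = [
    {"obs": {"state": torch.zeros(4)}, "action": torch.zeros(2), "terminated": False},
    {"obs": {"state": torch.ones(4)}, "action": torch.ones(2), "terminated": True},
]
batch = default_collate(samples)
print(batch["obs"]["state"].shape)  # torch.Size([2, 4])
print(batch["action"].shape)        # torch.Size([2, 2])
```

This is exactly what DataLoader does internally when no custom collate_fn is supplied.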
Dataset length:
len(dataset) returns the total number of observation-action pairs across all loaded episodes (i.e., the sum of all episode lengths).
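The flattening scheme implied by this length can be sketched without ManiSkill at all: episode arrays are concatenated along the time axis, and a flat index maps back to an (episode, timestep) pair via cumulative episode lengths. The helper below is a standalone illustration, not ManiSkill code:

```python
import bisect
import itertools

episode_lengths = [3, 5, 2]  # made-up episode lengths
cumulative = list(itertools.accumulate(episode_lengths))  # [3, 8, 10]

def locate(idx: int) -> tuple:
    """Map a flat dataset index to (episode, timestep within episode)."""
    ep = bisect.bisect_right(cumulative, idx)
    start = cumulative[ep - 1] if ep > 0 else 0
    return ep, idx - start

total = cumulative[-1]  # len(dataset) == sum of episode lengths
print(total, locate(0), locate(4), locate(9))  # 10 (0, 0) (1, 1) (2, 1)
```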
Usage Examples
Example 1: Basic loading for behavioral cloning
from mani_skill.trajectory.dataset import ManiSkillTrajectoryDataset
from torch.utils.data import DataLoader
dataset = ManiSkillTrajectoryDataset(
    dataset_file="~/.maniskill/demos/PickCube-v1/trajectory.state.pd_joint_delta_pos.physx_cpu.h5"
)
dataloader = DataLoader(dataset, batch_size=256, shuffle=True)
for batch in dataloader:
    obs = batch["obs"]
    actions = batch["action"]
    # train policy...
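The "train policy" step can be filled in with a standard behavioral-cloning update: regress predicted actions onto the stored expert actions. A minimal sketch with a made-up state-based policy on a synthetic batch (the network and dimensions are illustrative, not from ManiSkill):

```python
import torch
import torch.nn as nn

obs_dim, act_dim = 32, 7  # illustrative dimensions
policy = nn.Sequential(nn.Linear(obs_dim, 256), nn.ReLU(), nn.Linear(256, act_dim))
optimizer = torch.optim.Adam(policy.parameters(), lr=3e-4)

# One BC update on a synthetic batch standing in for a DataLoader batch
obs = torch.randn(256, obs_dim)
actions = torch.randn(256, act_dim)
loss = nn.functional.mse_loss(policy(obs), actions)
optimizer.zero_grad()
loss.backward()
optimizer.step()
print(float(loss))
```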
Example 2: Load only successful episodes onto GPU
import torch
from mani_skill.trajectory.dataset import ManiSkillTrajectoryDataset
dataset = ManiSkillTrajectoryDataset(
    dataset_file="trajectory.state.pd_joint_delta_pos.physx_cpu.h5",
    load_count=50,
    success_only=True,
    device=torch.device("cuda"),
)
print(f"Loaded {len(dataset)} timesteps from successful episodes")
print(f"Environment: {dataset.env_id}")
print(f"Action shape: {dataset.actions.shape}")
Example 3: Inspect dataset structure
from mani_skill.trajectory.dataset import ManiSkillTrajectoryDataset
dataset = ManiSkillTrajectoryDataset(
    dataset_file="trajectory.rgbd.pd_joint_delta_pos.physx_cpu.h5",
    load_count=5,
)
sample = dataset[0]
print("Keys:", sample.keys())
print("Obs type:", type(sample["obs"]))
if isinstance(sample["obs"], dict):
    for k, v in sample["obs"].items():
        print(f"  obs[{k}]: shape={v.shape}, dtype={v.dtype}")
print("Action shape:", sample["action"].shape)
Example 4: Use with a batch sampler for fixed-iteration training
import torch
from mani_skill.trajectory.dataset import ManiSkillTrajectoryDataset
from torch.utils.data import DataLoader
from torch.utils.data.sampler import RandomSampler, BatchSampler
dataset = ManiSkillTrajectoryDataset(
    dataset_file="trajectory.state.pd_joint_delta_pos.physx_cpu.h5",
    device=torch.device("cuda"),
)
sampler = RandomSampler(dataset)
batch_sampler = BatchSampler(sampler, batch_size=1024, drop_last=True)
dataloader = DataLoader(dataset, batch_sampler=batch_sampler)
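ManiSkill's behavioral-cloning baseline scripts wrap a BatchSampler like the one above in an IterationBasedBatchSampler so training runs for a fixed number of gradient steps rather than a fixed number of epochs; that class lives in the baseline code rather than the core package, so the version below is a minimal sketch of the idea, not the exact ManiSkill implementation:

```python
from torch.utils.data.sampler import BatchSampler, RandomSampler, Sampler

class IterationBasedBatchSampler(Sampler):
    """Illustrative sketch: repeat an inner batch sampler until a fixed
    number of iterations have been yielded, regardless of epoch boundaries."""

    def __init__(self, batch_sampler: BatchSampler, num_iterations: int):
        self.batch_sampler = batch_sampler
        self.num_iterations = num_iterations

    def __iter__(self):
        iteration = 0
        while iteration < self.num_iterations:
            for batch in self.batch_sampler:
                yield batch
                iteration += 1
                if iteration >= self.num_iterations:
                    return

    def __len__(self):
        return self.num_iterations

# Usage on a toy index set: 10 items, batch size 4, run exactly 7 iterations
inner = BatchSampler(RandomSampler(range(10)), batch_size=4, drop_last=True)
batches = list(IterationBasedBatchSampler(inner, num_iterations=7))
print(len(batches))  # 7
```

Passing such a sampler as batch_sampler to DataLoader yields exactly num_iterations batches, which pairs naturally with learning-rate schedules defined in iterations.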
Related Pages
- Principle:Haosulab_ManiSkill_Trajectory_Dataset_Loading -- The principle describing trajectory dataset loading theory and design considerations.
- Implementation:Haosulab_ManiSkill_Replay_Trajectory_CLI -- The preceding step: converting trajectories to the desired format.
- Implementation:Haosulab_ManiSkill_BC_Diffusion_Training -- Training scripts that consume this dataset for policy learning.