Implementation:Haosulab ManiSkill ManiSkillTrajectoryDataset

From Leeroopedia
Field | Value
Source Repository | haosulab/ManiSkill
Type | API Doc
Domains | Imitation_Learning, Robotics, Data_Processing, Machine_Learning
Last Updated | 2026-02-15

Overview

Description

ManiSkillTrajectoryDataset is a general-purpose PyTorch Dataset class that loads ManiSkill HDF5 trajectory data into memory as observation-action pairs for supervised learning. It reads a .h5 trajectory file and its companion .json metadata file, iterates over the episodes, concatenates them into a single flat, indexable dataset, and optionally keeps only successful episodes. Data can be placed on a specified device (CPU or GPU) at load time. Observation arrays stored as uint16, a dtype used to save space, are cast to int32 so that they convert cleanly to PyTorch tensors.
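The uint16-to-int32 conversion described above can be sketched as a recursive cast over a possibly nested observation dictionary. This is a minimal illustration rather than the library's own code: the function name `cast_uint16_to_int32` and the observation keys (`state`, `sensor_data`, `depth`) are hypothetical.

```python
import numpy as np


def cast_uint16_to_int32(x):
    """Recursively cast uint16 arrays in a (possibly nested) dict to int32.

    Hypothetical helper for illustration; arrays of any other dtype are
    returned unchanged.
    """
    if isinstance(x, dict):
        return {key: cast_uint16_to_int32(value) for key, value in x.items()}
    if x.dtype == np.uint16:
        return x.astype(np.int32)
    return x


# Toy observations for 4 timesteps; the key names are assumptions.
obs = {
    "state": np.zeros((4, 8), dtype=np.float32),
    "sensor_data": {"depth": np.full((4, 2, 2, 1), 65535, dtype=np.uint16)},
}

converted = cast_uint16_to_int32(obs)
print(converted["sensor_data"]["depth"].dtype)  # int32
print(converted["state"].dtype)                 # float32
```

int32 can represent every uint16 value, so the cast loses no information; it only increases the memory used by those arrays.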

Each item returned by __getitem__ is a dictionary containing the observation, action, termination and truncation flags, and optionally reward, success, and failure indicators for that timestep.

Usage

This class is used after trajectory replay/conversion to prepare data for policy training. It provides a drop-in PyTorch Dataset compatible with DataLoader, BatchSampler, and other standard PyTorch data utilities.

Code Reference

Source Location

Field | Value
Repository | haosulab/ManiSkill
File | mani_skill/trajectory/dataset.py
Lines | L23-155
Class | ManiSkillTrajectoryDataset

Signature

class ManiSkillTrajectoryDataset(Dataset):
    """
    A general torch Dataset you can drop in and use immediately with just about
    any trajectory .h5 data generated from ManiSkill.

    Args:
        dataset_file (str): path to the .h5 file containing the data you want to load
        load_count (int): the number of trajectories from the dataset to load into memory.
            If -1, will load all into memory
        success_only (bool): whether to skip trajectories that are not successful in the end.
            Default is False
        device: The location to save data to. If None will store as numpy (the default),
            otherwise will move data to that device
    """

    def __init__(
        self,
        dataset_file: str,
        load_count: int = -1,
        success_only: bool = False,
        device=None,
    ) -> None:

Constructor parameters:

Parameter | Type | Default | Description
dataset_file | str | required | Path to the .h5 trajectory file. A companion .json file must exist at the same location.
load_count | int | -1 | Number of trajectories to load. -1 loads all trajectories.
success_only | bool | False | If True, skips episodes that do not have success=True in the episode metadata.
device | torch.device or None | None | Device to place tensors on. If None, data is stored as NumPy arrays.
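How load_count and success_only interact can be sketched with plain Python over a list of episode metadata entries. This is an assumption-laden illustration, not the class's actual implementation: the helper name `select_episodes` and the metadata keys `episode_id` and `success` are hypothetical, and the sketch assumes that load_count limits how many entries are examined before the success filter is applied.

```python
def select_episodes(episodes, load_count=-1, success_only=False):
    """Pick which episode metadata entries would be loaded.

    Assumption: load_count bounds the number of entries examined, so with
    success_only=True fewer than load_count episodes may be returned.
    """
    if load_count == -1:
        load_count = len(episodes)
    selected = []
    for eps in episodes[:load_count]:
        if success_only:
            if "success" not in eps:
                raise KeyError("episode metadata has no 'success' field")
            if not eps["success"]:
                continue  # skip unsuccessful episodes
        selected.append(eps)
    return selected


# Toy metadata; a real file may carry more fields per episode.
episodes = [
    {"episode_id": 0, "success": True},
    {"episode_id": 1, "success": False},
    {"episode_id": 2, "success": True},
]

print([e["episode_id"] for e in select_episodes(episodes, success_only=True)])
# [0, 2]
print([e["episode_id"] for e in select_episodes(episodes, load_count=2, success_only=True)])
# [0]
```

Under that assumption, the number of loaded episodes should be checked through len(dataset) or the loaded arrays rather than taken from load_count.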

Instance attributes after initialization:

Attribute | Type | Description
obs | dict of arrays or tensors | Observations indexed by timestep across all loaded episodes. Nested dict for multi-modal obs.
actions | ndarray or tensor | Shape (N, action_dim) -- all actions concatenated across episodes.
terminated | ndarray or tensor | Shape (N,) -- per-timestep termination flags.
truncated | ndarray or tensor | Shape (N,) -- per-timestep truncation flags.
rewards | ndarray, tensor, or None | Shape (N,) -- per-timestep rewards (if present in trajectory data).
success | ndarray, tensor, or None | Shape (N,) -- per-timestep success flags (if present).
fail | ndarray, tensor, or None | Shape (N,) -- per-timestep failure flags (if present).
env_id | str | The environment ID from the trajectory metadata.
env_kwargs | dict | The environment kwargs from the trajectory metadata.

Import

from mani_skill.trajectory.dataset import ManiSkillTrajectoryDataset

I/O Contract

Inputs:

Input | Type | Description
dataset_file | str (file path) | Path to a ManiSkill .h5 trajectory file produced by trajectory replay/conversion. A companion metadata file must exist in the same directory, with the same name but a .json extension in place of .h5.
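A pre-flight check for the two required files might look like the following sketch. The helpers `companion_json_path` and `missing_files` are hypothetical and not part of ManiSkill; they assume the metadata file name is obtained by swapping the final .h5 suffix for .json.

```python
from pathlib import Path
from typing import List


def companion_json_path(dataset_file: str) -> Path:
    """Return the assumed metadata path: same location, .h5 swapped for .json."""
    path = Path(dataset_file).expanduser()
    if path.suffix != ".h5":
        raise ValueError(f"expected a .h5 file, got {path.name!r}")
    # with_suffix replaces only the last suffix, so dotted names stay intact.
    return path.with_suffix(".json")


def missing_files(dataset_file: str) -> List[str]:
    """List whichever of the .h5 / .json pair does not exist on disk."""
    h5_path = Path(dataset_file).expanduser()
    candidates = [h5_path, companion_json_path(dataset_file)]
    return [str(p) for p in candidates if not p.is_file()]


print(companion_json_path("demos/trajectory.state.pd_joint_delta_pos.physx_cpu.h5").name)
# trajectory.state.pd_joint_delta_pos.physx_cpu.json
```

Calling missing_files before constructing the dataset gives a clearer error than a failure partway through loading.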

Outputs (__getitem__ return):

Each call to dataset[idx] returns a dictionary:

{
    "obs": obs,          # dict of tensors or single tensor, observation at timestep idx
    "action": action,    # tensor of shape (action_dim,), the expert action
    "terminated": bool,  # whether the episode terminated at this step
    "truncated": bool,   # whether the episode was truncated at this step
    "reward": float,     # (optional) reward at this step
    "success": bool,     # (optional) success flag at this step
    "fail": bool,        # (optional) failure flag at this step
}
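The way one item is assembled from the flat arrays can be sketched with NumPy alone. The plain dict `store` below stands in for the dataset's attributes, and `index_nested` and `make_item` are hypothetical names; the sketch only mirrors the documented behavior that the optional keys appear when the corresponding data is present.

```python
import numpy as np


def index_nested(data, idx):
    """Index every array in a (possibly nested) dict at position idx."""
    if isinstance(data, dict):
        return {key: index_nested(value, idx) for key, value in data.items()}
    return data[idx]


def make_item(store, idx):
    """Build one item; optional keys are added only when data is present."""
    item = {
        "obs": index_nested(store["obs"], idx),
        "action": store["actions"][idx],
        "terminated": store["terminated"][idx],
        "truncated": store["truncated"][idx],
    }
    # Attribute name on the store -> key in the returned item.
    for src, dst in (("rewards", "reward"), ("success", "success"), ("fail", "fail")):
        if store.get(src) is not None:
            item[dst] = store[src][idx]
    return item


# Toy store with N = 3 timesteps; rewards present, success and fail absent.
store = {
    "obs": {"state": np.zeros((3, 4)), "sensor": {"depth": np.zeros((3, 2, 2, 1))}},
    "actions": np.zeros((3, 7)),
    "terminated": np.array([False, False, True]),
    "truncated": np.array([False, False, False]),
    "rewards": np.arange(3, dtype=np.float32),
    "success": None,
    "fail": None,
}

item = make_item(store, 2)
print(sorted(item.keys()))         # ['action', 'obs', 'reward', 'terminated', 'truncated']
print(item["obs"]["state"].shape)  # (4,)
```

Training code that reads "reward", "success", or "fail" should therefore check that the key exists before using it.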

Dataset length:

len(dataset) returns the total number of observation-action pairs across all loaded episodes (i.e., the sum of all episode lengths).
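The relationship between episode lengths and dataset length can be illustrated with toy arrays. This sketch does not use the class itself; the episode lengths, the action dimension, and the helper `locate` are made up for illustration.

```python
import numpy as np

episode_lengths = [3, 5, 2]  # toy values
action_dim = 4               # toy value

# One dict of arrays per episode, as if read from separate trajectories.
episodes = [
    {"actions": np.zeros((n, action_dim)), "terminated": np.zeros(n, dtype=bool)}
    for n in episode_lengths
]

# Flatten: stack 2-D actions row-wise, concatenate 1-D flags.
actions = np.vstack([e["actions"] for e in episodes])
terminated = np.concatenate([e["terminated"] for e in episodes])

print(actions.shape)  # (10, 4)
print(len(actions) == sum(episode_lengths))  # True

# Map a flat index back to (episode number, step within episode).
offsets = np.cumsum([0] + episode_lengths)  # [0, 3, 8, 10]


def locate(idx):
    episode = int(np.searchsorted(offsets, idx, side="right") - 1)
    return episode, idx - int(offsets[episode])


print(locate(9))  # (2, 1)
```

Because the flat layout keeps no episode boundaries, code that needs them has to track offsets like these separately.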

Usage Examples

Example 1: Basic loading for behavioral cloning

import os

from mani_skill.trajectory.dataset import ManiSkillTrajectoryDataset
from torch.utils.data import DataLoader

# Expand "~" explicitly so the path resolves to the user's home directory.
dataset = ManiSkillTrajectoryDataset(
    dataset_file=os.path.expanduser(
        "~/.maniskill/demos/PickCube-v1/trajectory.state.pd_joint_delta_pos.physx_cpu.h5"
    )
)

dataloader = DataLoader(dataset, batch_size=256, shuffle=True)
for batch in dataloader:
    obs = batch["obs"]
    actions = batch["action"]
    # train policy...

Example 2: Load only successful episodes onto GPU

import torch
from mani_skill.trajectory.dataset import ManiSkillTrajectoryDataset

dataset = ManiSkillTrajectoryDataset(
    dataset_file="trajectory.state.pd_joint_delta_pos.physx_cpu.h5",
    load_count=50,
    success_only=True,
    device=torch.device("cuda"),
)
print(f"Loaded {len(dataset)} timesteps from successful episodes")
print(f"Environment: {dataset.env_id}")
print(f"Action shape: {dataset.actions.shape}")

Example 3: Inspect dataset structure

from mani_skill.trajectory.dataset import ManiSkillTrajectoryDataset

dataset = ManiSkillTrajectoryDataset(
    dataset_file="trajectory.rgbd.pd_joint_delta_pos.physx_cpu.h5",
    load_count=5,
)

sample = dataset[0]
print("Keys:", sample.keys())
print("Obs type:", type(sample["obs"]))
if isinstance(sample["obs"], dict):
    for k, v in sample["obs"].items():
        print(f"  obs[{k}]: shape={v.shape}, dtype={v.dtype}")
print("Action shape:", sample["action"].shape)

Example 4: Use with RandomSampler and BatchSampler for fixed-size batches

import torch  # needed for torch.device below
from mani_skill.trajectory.dataset import ManiSkillTrajectoryDataset
from torch.utils.data import DataLoader
from torch.utils.data.sampler import RandomSampler, BatchSampler

dataset = ManiSkillTrajectoryDataset(
    dataset_file="trajectory.state.pd_joint_delta_pos.physx_cpu.h5",
    device=torch.device("cuda"),
)

sampler = RandomSampler(dataset)
batch_sampler = BatchSampler(sampler, batch_size=1024, drop_last=True)
dataloader = DataLoader(dataset, batch_sampler=batch_sampler)
