Implementation:Haosulab ManiSkill ManiSkillTrajectoryDataset
| Field | Value |
|---|---|
| Source Repository | haosulab/ManiSkill |
| Type | API Doc |
| Domains | Imitation_Learning, Robotics, Data_Processing, Machine_Learning |
| Last Updated | 2026-02-15 |
Overview
Description
ManiSkillTrajectoryDataset is a general-purpose PyTorch Dataset class for loading ManiSkill HDF5 trajectory data into memory as observation-action pairs for supervised learning. It reads a .h5 trajectory file and its companion .json metadata file, iterates over episodes, flattens them into a single indexable dataset, and optionally filters to only successful episodes. The class supports placing data on a specified device (CPU or GPU) at load time and handles uint16-to-int32 type conversion for observation data that uses compressed integer formats.
Each item returned by __getitem__ is a dictionary containing the observation, action, termination and truncation flags, and optionally reward, success, and failure indicators for that timestep.
Usage
This class is used after trajectory replay/conversion to prepare data for policy training. It provides a drop-in PyTorch Dataset compatible with DataLoader, BatchSampler, and other standard PyTorch data utilities.
Code Reference
Source Location
| Field | Value |
|---|---|
| Repository | haosulab/ManiSkill |
| File | mani_skill/trajectory/dataset.py |
| Lines | L23-155 |
| Class | ManiSkillTrajectoryDataset |
Signature
class ManiSkillTrajectoryDataset(Dataset):
    """
    A general torch Dataset you can drop in and use immediately with just about
    any trajectory .h5 data generated from ManiSkill.

    Args:
        dataset_file (str): path to the .h5 file containing the data you want to load
        load_count (int): the number of trajectories from the dataset to load into memory.
            If -1, will load all into memory
        success_only (bool): whether to skip trajectories that are not successful in the end.
            Default is False
        device: The location to save data to. If None will store as numpy (the default),
            otherwise will move data to that device
    """

    def __init__(
        self,
        dataset_file: str,
        load_count: int = -1,
        success_only: bool = False,
        device=None,
    ) -> None:
Constructor parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
| dataset_file | str | required | Path to the .h5 trajectory file. A companion .json file must exist at the same location. |
| load_count | int | -1 | Number of trajectories to load. -1 loads all trajectories. |
| success_only | bool | False | If True, skips episodes that do not have success=True in the episode metadata. |
| device | torch.device or None | None | Device to place tensors on. If None, data is stored as NumPy arrays. |
Instance attributes after initialization:
| Attribute | Type | Description |
|---|---|---|
| obs | dict of arrays or tensors | Observations indexed by timestep across all loaded episodes. Nested dict for multi-modal obs. |
| actions | ndarray or tensor | Shape (N, action_dim) -- all actions concatenated across episodes. |
| terminated | ndarray or tensor | Shape (N,) -- per-timestep termination flags. |
| truncated | ndarray or tensor | Shape (N,) -- per-timestep truncation flags. |
| rewards | ndarray, tensor, or None | Shape (N,) -- per-timestep rewards (if present in trajectory data). |
| success | ndarray, tensor, or None | Shape (N,) -- per-timestep success flags (if present). |
| fail | ndarray, tensor, or None | Shape (N,) -- per-timestep failure flags (if present). |
| env_id | str | The environment ID from the trajectory metadata. |
| env_kwargs | dict | The environment kwargs from the trajectory metadata. |
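Because actions is stored as a single (N, action_dim) array, dataset-level statistics for action normalization (a common preprocessing step before behavioral cloning) can be computed directly from it. A minimal sketch with a synthetic array standing in for dataset.actions:

```python
import numpy as np

# Synthetic stand-in for dataset.actions with N=100 timesteps, 7-dim actions
actions = np.random.default_rng(0).normal(size=(100, 7))

mean = actions.mean(axis=0)
std = actions.std(axis=0) + 1e-6  # small epsilon avoids division by zero
normalized = (actions - mean) / std
print(normalized.shape)  # (100, 7)
```

The same per-dimension mean and std would then be applied to policy outputs at deployment time to undo the normalization.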
Import
from mani_skill.trajectory.dataset import ManiSkillTrajectoryDataset
I/O Contract
Inputs:
| Input | Type | Description |
|---|---|---|
| dataset_file | str (file path) | Path to a ManiSkill .h5 trajectory file produced by trajectory replay/conversion. A companion .json metadata file must exist at the same path with a .json extension. |
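The companion metadata path follows directly from the .h5 path by swapping the extension. A small sketch of that convention (the helper name here is illustrative, not part of the ManiSkill API):

```python
def metadata_path(dataset_file: str) -> str:
    """Illustrative helper: derive the companion .json metadata path
    from a ManiSkill .h5 trajectory path by swapping the extension."""
    if not dataset_file.endswith(".h5"):
        raise ValueError("expected a .h5 trajectory file")
    return dataset_file[: -len(".h5")] + ".json"

print(metadata_path("demos/PickCube-v1/trajectory.state.pd_joint_delta_pos.physx_cpu.h5"))
# demos/PickCube-v1/trajectory.state.pd_joint_delta_pos.physx_cpu.json
```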
Outputs (__getitem__ return):
Each call to dataset[idx] returns a dictionary:
{
    "obs": obs,            # dict of tensors or single tensor, observation at timestep idx
    "action": action,      # tensor of shape (action_dim,), the expert action
    "terminated": bool,    # whether the episode terminated at this step
    "truncated": bool,     # whether the episode was truncated at this step
    "reward": float,       # (optional) reward at this step
    "success": bool,       # (optional) success flag at this step
    "fail": bool,          # (optional) failure flag at this step
}
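Because each sample is a plain dictionary, PyTorch's default collate function can batch such samples directly, stacking each key (including nested obs dicts) along a new leading batch dimension. An illustrative check with synthetic samples shaped like the dict above (values are made up):

```python
import torch
from torch.utils.data import default_collate

# Two synthetic samples mimicking the __getitem__ dict structure
samples = [
    {"obs": {"state": torch.zeros(4)}, "action": torch.zeros(2), "terminated": False},
    {"obs": {"state": torch.ones(4)}, "action": torch.ones(2), "terminated": True},
]
batch = default_collate(samples)
print(batch["obs"]["state"].shape)  # torch.Size([2, 4])
print(batch["action"].shape)        # torch.Size([2, 2])
```

This is exactly what DataLoader does internally when no custom collate_fn is supplied.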
Dataset length:
len(dataset) returns the total number of observation-action pairs across all loaded episodes (i.e., the sum of all episode lengths).
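The flattening scheme implied by this length can be sketched without ManiSkill at all: episode arrays are concatenated along the time axis, and a flat index maps back to an (episode, timestep) pair via cumulative episode lengths. The helper below is a standalone illustration, not ManiSkill code:

```python
import bisect
import itertools

episode_lengths = [3, 5, 2]  # made-up episode lengths
cumulative = list(itertools.accumulate(episode_lengths))  # [3, 8, 10]

def locate(idx: int) -> tuple:
    """Map a flat dataset index to (episode, timestep within episode)."""
    ep = bisect.bisect_right(cumulative, idx)
    start = cumulative[ep - 1] if ep > 0 else 0
    return ep, idx - start

total = cumulative[-1]  # len(dataset) == sum of episode lengths
print(total, locate(0), locate(4), locate(9))  # 10 (0, 0) (1, 1) (2, 1)
```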
Usage Examples
Example 1: Basic loading for behavioral cloning
from mani_skill.trajectory.dataset import ManiSkillTrajectoryDataset
from torch.utils.data import DataLoader
dataset = ManiSkillTrajectoryDataset(
    dataset_file="~/.maniskill/demos/PickCube-v1/trajectory.state.pd_joint_delta_pos.physx_cpu.h5"
)
dataloader = DataLoader(dataset, batch_size=256, shuffle=True)
for batch in dataloader:
    obs = batch["obs"]
    actions = batch["action"]
    # train policy...
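The "train policy" step can be filled in with a standard behavioral-cloning update: regress predicted actions onto the stored expert actions. A minimal sketch with a made-up state-based policy on a synthetic batch (the network and dimensions are illustrative, not from ManiSkill):

```python
import torch
import torch.nn as nn

obs_dim, act_dim = 32, 7  # illustrative dimensions
policy = nn.Sequential(nn.Linear(obs_dim, 256), nn.ReLU(), nn.Linear(256, act_dim))
optimizer = torch.optim.Adam(policy.parameters(), lr=3e-4)

# One BC update on a synthetic batch standing in for a DataLoader batch
obs = torch.randn(256, obs_dim)
actions = torch.randn(256, act_dim)
loss = nn.functional.mse_loss(policy(obs), actions)
optimizer.zero_grad()
loss.backward()
optimizer.step()
print(float(loss))
```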
Example 2: Load only successful episodes onto GPU
import torch
from mani_skill.trajectory.dataset import ManiSkillTrajectoryDataset
dataset = ManiSkillTrajectoryDataset(
    dataset_file="trajectory.state.pd_joint_delta_pos.physx_cpu.h5",
    load_count=50,
    success_only=True,
    device=torch.device("cuda"),
)
print(f"Loaded {len(dataset)} timesteps from successful episodes")
print(f"Environment: {dataset.env_id}")
print(f"Action shape: {dataset.actions.shape}")
Example 3: Inspect dataset structure
from mani_skill.trajectory.dataset import ManiSkillTrajectoryDataset
dataset = ManiSkillTrajectoryDataset(
    dataset_file="trajectory.rgbd.pd_joint_delta_pos.physx_cpu.h5",
    load_count=5,
)
sample = dataset[0]
print("Keys:", sample.keys())
print("Obs type:", type(sample["obs"]))
if isinstance(sample["obs"], dict):
    for k, v in sample["obs"].items():
        print(f"  obs[{k}]: shape={v.shape}, dtype={v.dtype}")
print("Action shape:", sample["action"].shape)
Example 4: Use with a batch sampler for fixed-iteration training
import torch
from mani_skill.trajectory.dataset import ManiSkillTrajectoryDataset
from torch.utils.data import DataLoader
from torch.utils.data.sampler import RandomSampler, BatchSampler
dataset = ManiSkillTrajectoryDataset(
    dataset_file="trajectory.state.pd_joint_delta_pos.physx_cpu.h5",
    device=torch.device("cuda"),
)
sampler = RandomSampler(dataset)
batch_sampler = BatchSampler(sampler, batch_size=1024, drop_last=True)
dataloader = DataLoader(dataset, batch_sampler=batch_sampler)
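ManiSkill's behavioral-cloning baseline scripts wrap a BatchSampler like the one above in an IterationBasedBatchSampler so training runs for a fixed number of gradient steps rather than a fixed number of epochs; that class lives in the baseline code rather than the core package, so the version below is a minimal sketch of the idea, not the exact ManiSkill implementation:

```python
from torch.utils.data.sampler import BatchSampler, RandomSampler, Sampler

class IterationBasedBatchSampler(Sampler):
    """Illustrative sketch: repeat an inner batch sampler until a fixed
    number of iterations have been yielded, regardless of epoch boundaries."""

    def __init__(self, batch_sampler: BatchSampler, num_iterations: int):
        self.batch_sampler = batch_sampler
        self.num_iterations = num_iterations

    def __iter__(self):
        iteration = 0
        while iteration < self.num_iterations:
            for batch in self.batch_sampler:
                yield batch
                iteration += 1
                if iteration >= self.num_iterations:
                    return

    def __len__(self):
        return self.num_iterations

# Usage on a toy index set: 10 items, batch size 4, run exactly 7 iterations
inner = BatchSampler(RandomSampler(range(10)), batch_size=4, drop_last=True)
batches = list(IterationBasedBatchSampler(inner, num_iterations=7))
print(len(batches))  # 7
```

Passing such a sampler as batch_sampler to DataLoader yields exactly num_iterations batches, which pairs naturally with learning-rate schedules defined in iterations.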
Related Pages
- Principle:Haosulab_ManiSkill_Trajectory_Dataset_Loading -- The principle describing trajectory dataset loading theory and design considerations.
- Implementation:Haosulab_ManiSkill_Replay_Trajectory_CLI -- The preceding step: converting trajectories to the desired format.
- Implementation:Haosulab_ManiSkill_BC_Diffusion_Training -- Training scripts that consume this dataset for policy learning.