
Implementation:ARISE Initiative Robosuite Gather Demonstrations As HDF5

From Leeroopedia


Overview

A concrete function, provided by the robosuite collection scripts, that aggregates raw demonstration directories into a single HDF5 file.

Description

The gather_demonstrations_as_hdf5() function reads raw episode directories (each containing state_*.npz files and a model.xml) and writes a single demo.hdf5 file. Each demonstration is stored under data/demo_N/ with states and actions datasets plus a model_file attribute.

The function performs the following operations:

  • Scans the input directory for demonstration subdirectories
  • Reads per-timestep state and action data from .npz files
  • Aggregates timesteps into demonstration-level arrays
  • Creates an HDF5 file with hierarchical group structure
  • Stores states and actions as datasets within each demo group
  • Embeds model.xml content as an attribute for environment reproducibility
  • Adds metadata attributes including collection date, time, and environment configuration
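The steps above can be condensed into a minimal, hypothetical re-implementation. This is a sketch only: the per-file key names "states" and "actions", the demo numbering, and the metadata handling are assumptions for illustration, not the exact robosuite source (which lives in collect_human_demonstrations.py).

```python
import os
import datetime
import h5py
import numpy as np

def gather_demos_sketch(directory, out_dir, env_info):
    """Sketch of the aggregation loop; see collect_human_demonstrations.py
    for the real implementation."""
    os.makedirs(out_dir, exist_ok=True)
    with h5py.File(os.path.join(out_dir, "demo.hdf5"), "w") as f:
        grp = f.create_group("data")
        num_demos = 0
        for ep_name in sorted(os.listdir(directory)):
            ep_path = os.path.join(directory, ep_name)
            if not os.path.isdir(ep_path):
                continue
            states, actions = [], []
            # Assumption: each state_*.npz stores "states" and "actions" arrays
            for npz in sorted(n for n in os.listdir(ep_path)
                              if n.startswith("state_") and n.endswith(".npz")):
                data = np.load(os.path.join(ep_path, npz))
                states.extend(data["states"])
                actions.extend(data["actions"])
            if not states:
                continue  # skip episodes with no recorded timesteps
            demo = grp.create_group("demo_{}".format(num_demos))
            demo.create_dataset("states", data=np.array(states))
            demo.create_dataset("actions", data=np.array(actions))
            # Embed the MJCF model for later environment reconstruction
            with open(os.path.join(ep_path, "model.xml")) as xml_file:
                demo.attrs["model_file"] = xml_file.read()
            num_demos += 1
        # Metadata attributes (repository_version from git is omitted here)
        now = datetime.datetime.now()
        f.attrs["date"] = now.strftime("%Y-%m-%d")
        f.attrs["time"] = now.strftime("%H:%M:%S")
        f.attrs["env"] = env_info
```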

Usage

Called after all demonstration episodes have been collected, to create the final dataset file. It is typically the last step in the demonstration collection pipeline, once human teleoperation or scripted data collection has completed.

Code Reference

Source: robosuite

File: robosuite/scripts/collect_human_demonstrations.py

Lines: L120-207

Signature:

def gather_demonstrations_as_hdf5(directory, out_dir, env_info):
    """
    Gathers demonstrations saved in @directory into a single hdf5 file.

    Args:
        directory (str): Path to raw demonstration directories
        out_dir (str): Path to store the hdf5 file
        env_info (str): JSON-encoded environment information string
    """

Import:

from robosuite.scripts.collect_human_demonstrations import gather_demonstrations_as_hdf5

I/O Contract

Inputs

  • directory (str, required): Path to the raw demonstration directories containing state_*.npz files and model.xml
  • out_dir (str, required): Path to the directory where the output HDF5 file will be stored
  • env_info (str, required): JSON-encoded string containing the environment configuration (robot, controller, task parameters)

Outputs

File: demo.hdf5 with the following structure:

demo.hdf5
├── data/
│   ├── demo_0/
│   │   ├── states (dataset: float array, shape [N, D])
│   │   ├── actions (dataset: float array, shape [N, A])
│   │   └── model_file (attribute: string, XML content)
│   ├── demo_1/
│   │   └── ...
│   └── demo_K/
│       └── ...
└── (root attributes)
    ├── date (string: collection date)
    ├── time (string: collection time)
    ├── repository_version (string: git commit hash)
    └── env (string: JSON environment configuration)

Dataset Shapes:

  • states: [num_timesteps, state_dim] - Flattened MuJoCo simulator states
  • actions: [num_timesteps, action_dim] - Robot control actions
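Since states and actions within one demo must agree on the timestep count N, the shape contract can be verified with h5py alone. The check_demo_shapes helper name below is hypothetical, but it only relies on the file layout documented above.

```python
import h5py

def check_demo_shapes(hdf5_path):
    """Return {demo_name: (num_timesteps, state_dim, action_dim)} and
    assert that states and actions agree on the timestep count."""
    report = {}
    with h5py.File(hdf5_path, "r") as f:
        for name, demo in f["data"].items():
            states = demo["states"]
            actions = demo["actions"]
            assert states.shape[0] == actions.shape[0], \
                "{}: states and actions disagree on N".format(name)
            report[name] = (states.shape[0], states.shape[1], actions.shape[1])
    return report
```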

Usage Examples

Example 1: Basic Aggregation

import json
from robosuite.scripts.collect_human_demonstrations import gather_demonstrations_as_hdf5

# Define paths
raw_demo_dir = "/tmp/raw_demonstrations"
output_dir = "/tmp/datasets"

# Environment configuration
env_config = {
    "env_name": "Lift",
    "robots": "Panda",
    "controller": "OSC_POSE",
    "horizon": 500
}

# Convert to JSON string
env_info_json = json.dumps(env_config)

# Aggregate demonstrations
gather_demonstrations_as_hdf5(
    directory=raw_demo_dir,
    out_dir=output_dir,
    env_info=env_info_json
)

print(f"Dataset created at {output_dir}/demo.hdf5")

Example 2: Reading Back the HDF5 File

import h5py
import numpy as np

# Open the aggregated dataset
with h5py.File("/tmp/datasets/demo.hdf5", "r") as f:
    # Read metadata
    print("Collection date:", f.attrs["date"])
    print("Repository version:", f.attrs["repository_version"])
    print("Environment config:", f.attrs["env"])

    # Access first demonstration
    demo_0 = f["data/demo_0"]

    # Load states and actions
    states = demo_0["states"][:]  # Shape: [N, state_dim]
    actions = demo_0["actions"][:]  # Shape: [N, action_dim]

    # Read environment model XML
    model_xml = demo_0.attrs["model_file"]

    print(f"Demo 0: {len(states)} timesteps")
    print(f"State dimension: {states.shape[1]}")
    print(f"Action dimension: {actions.shape[1]}")

    # Iterate through all demonstrations
    num_demos = len([k for k in f["data"].keys() if k.startswith("demo")])
    print(f"Total demonstrations: {num_demos}")

    for i in range(num_demos):
        demo = f[f"data/demo_{i}"]
        print(f"Demo {i}: {len(demo['states'])} timesteps")

Example 3: Integration with Data Collection Pipeline

import os
import json
import robosuite as suite
from robosuite.scripts.collect_human_demonstrations import gather_demonstrations_as_hdf5

# Step 1: Collect demonstrations (simplified example)
env = suite.make(
    "Lift",
    robots="Panda",
    has_renderer=True,
    has_offscreen_renderer=False,
    use_camera_obs=False,
)

# ... collect demonstrations and save to raw directory ...
# (demonstration collection code omitted for brevity)

# Step 2: Aggregate after collection completes
raw_dir = "/tmp/demos/raw"
output_dir = "/tmp/demos/processed"

# Gather environment metadata
env_info = {
    "env_name": "Lift",
    "type": 1,  # Environment type
    "env_kwargs": {
        "robots": "Panda",
        "controller_configs": {"type": "OSC_POSE"}
    }
}

# Aggregate into HDF5
gather_demonstrations_as_hdf5(
    directory=raw_dir,
    out_dir=output_dir,
    env_info=json.dumps(env_info)
)

print(f"Dataset ready for training at {output_dir}/demo.hdf5")
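Example 4: Replaying a Demonstration (sketch)

Because each demo group embeds its model_file XML, a collected demonstration can later be replayed by restoring the flattened states into the simulator. This sketch assumes a mujoco-py backed robosuite build, where environments expose reset_from_xml_string() and the simulator exposes sim.set_state_from_flattened(); newer native-MuJoCo versions use a different state-setting interface, so treat both calls as assumptions. The replay_demo helper name is hypothetical.

```python
import h5py

def replay_demo(env, hdf5_path, demo_name="demo_0", render=False):
    """Restore every saved simulator state of one demo into `env`.

    Assumes env.reset_from_xml_string() and env.sim.set_state_from_flattened(),
    as exposed by mujoco-py based robosuite environments.
    """
    with h5py.File(hdf5_path, "r") as f:
        demo = f["data/{}".format(demo_name)]
        # Rebuild the exact scene the demo was recorded in
        env.reset_from_xml_string(demo.attrs["model_file"])
        for state in demo["states"][:]:
            env.sim.set_state_from_flattened(state)
            env.sim.forward()  # propagate the restored state
            if render:
                env.render()
```

With a live environment this would be invoked along the lines of replay_demo(suite.make("Lift", robots="Panda", has_renderer=True, has_offscreen_renderer=False, use_camera_obs=False), "/tmp/demos/processed/demo.hdf5", render=True).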
