
Principle:ARISE Initiative Robomimic Dataset Loading

From Leeroopedia
Knowledge Sources
Domains Robotics, Data_Pipeline, Offline_Learning
Last Updated 2026-02-15 08:00 GMT

Overview

A dataset construction principle that loads offline robot demonstration data from HDF5 files into memory-efficient PyTorch Dataset objects with support for temporal sequences, filter keys, and train/validation splitting.

Description

Dataset Loading is a critical principle in offline robot learning (learning from demonstrations). Unlike online reinforcement learning where data is collected during training, offline methods require loading pre-collected demonstration trajectories. The challenge is handling large HDF5 datasets containing multi-modal observations (images, proprioception, actions) across potentially hundreds of demonstrations, each with variable length.

This principle addresses several key challenges:

  • Memory management: Datasets can be too large to fit in RAM, requiring on-demand loading with optional caching
  • Temporal structure: Robot learning algorithms often need sequences of consecutive frames (frame stacking, history windows), not individual transitions
  • Flexible subsetting: Filter keys allow selecting subsets of demonstrations (e.g., only training demos, or only the first N demos) without creating separate files
  • Multi-dataset training: Multiple HDF5 files can be combined for joint training across tasks
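The memory-management challenge above can be sketched as an on-demand loader with an optional in-RAM cache. This is an illustrative sketch, not robomimic's actual API: the `DemoCache` class, the `reader` callable, and the fake file layout are all assumptions standing in for reads from an open HDF5 file.

```python
import numpy as np

class DemoCache:
    """Load demonstrations on demand, optionally caching them in RAM.

    `reader` is any callable mapping a demo key (e.g. "demo_0") to a dict
    of arrays -- in practice it would read from an open h5py.File.
    """

    def __init__(self, reader, cache_in_memory=True):
        self.reader = reader
        self.cache_in_memory = cache_in_memory
        self._cache = {}

    def get(self, demo_key):
        if demo_key in self._cache:
            return self._cache[demo_key]
        demo = self.reader(demo_key)        # on-demand load (disk I/O in practice)
        if self.cache_in_memory:
            self._cache[demo_key] = demo    # keep for future lookups
        return demo

# Stand-in for an HDF5 file: two demos of different lengths.
fake_file = {
    "demo_0": {"actions": np.zeros((30, 7))},
    "demo_1": {"actions": np.zeros((45, 7))},
}
loader = DemoCache(fake_file.__getitem__, cache_in_memory=True)
demo = loader.get("demo_0")
```

With `cache_in_memory=False`, every lookup re-reads from disk, trading speed for a bounded memory footprint; the cached variant is the usual default when the dataset fits in RAM.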

Usage

Use this principle after configuration setup and observation initialization. It is required before algorithm instantiation because the dataset provides shape metadata (observation dimensions, action space size) needed to construct neural networks.
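The shape-metadata dependency can be sketched as follows. The function name and the flat dict layout are illustrative assumptions; the point is that per-modality observation shapes and the action dimension are read off a loaded demonstration before any network is constructed.

```python
import numpy as np

def infer_shapes(demo):
    """Read observation and action dimensionality from one demonstration.

    `demo` maps modality names to (T, ...) arrays; in practice these would
    be HDF5 datasets under data/demo_0/. Returns per-modality observation
    shapes plus the action dimension, which downstream code uses to size
    network input and output layers.
    """
    obs_shapes = {name: arr.shape[1:] for name, arr in demo["obs"].items()}
    action_dim = demo["actions"].shape[1]
    return obs_shapes, action_dim

# Hypothetical demo with proprioception, an image modality, and 7-DoF actions.
demo = {
    "obs": {
        "robot0_eef_pos": np.zeros((50, 3)),
        "agentview_image": np.zeros((50, 84, 84, 3)),
    },
    "actions": np.zeros((50, 7)),
}
obs_shapes, action_dim = infer_shapes(demo)
# obs_shapes["robot0_eef_pos"] == (3,); action_dim == 7
```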

Theoretical Basis

The dataset loading principle follows an HDF5 filter key pattern for flexible data subsetting:

# Abstract pattern (not real implementation)
# HDF5 file structure:
#   data/
#     demo_0/ (obs, actions, rewards, dones)
#     demo_1/
#     ...
#   mask/
#     train: [demo_0, demo_2, demo_5, ...]     # filter key
#     valid: [demo_1, demo_3, ...]              # filter key
#     20_demos: [demo_0, ..., demo_19]          # filter key

# Loading uses filter keys to select demos
demo_keys = [k.decode() for k in hdf5["mask/train"][:]]  # demo names (stored as bytes in HDF5)
dataset = SequenceDataset(hdf5_path, demo_keys, seq_length=10)

The SequenceDataset internally indexes all valid subsequences across demonstrations, creating a flat index that maps dataset index to (demo_id, start_frame) pairs. This enables standard PyTorch DataLoader batching.
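The flat-index construction described above can be sketched in a few lines. The function name is hypothetical; the arithmetic is the standard sliding-window count, where a demo of `length` frames yields `length - seq_length + 1` valid start frames.

```python
def build_index(demo_lengths, seq_length):
    """Flat index over all valid length-`seq_length` windows in every demo.

    `demo_lengths` maps demo id -> number of frames. Returns a list of
    (demo_id, start_frame) pairs so that dataset[i] resolves to a concrete
    subsequence in O(1), as a PyTorch DataLoader expects.
    """
    index = []
    for demo_id, length in demo_lengths.items():
        # A demo of `length` frames yields length - seq_length + 1 windows;
        # demos shorter than seq_length contribute nothing.
        for start in range(length - seq_length + 1):
            index.append((demo_id, start))
    return index

# demo_0 (12 frames) yields 3 windows; demo_1 (5 frames) is too short.
index = build_index({"demo_0": 12, "demo_1": 5}, seq_length=10)
# index == [("demo_0", 0), ("demo_0", 1), ("demo_0", 2)]
```

Real implementations typically also handle padding at demo boundaries so short demos are not silently dropped, but the core index structure is the same.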

Related Pages

Implemented By

Uses Heuristic
