
Implementation:ARISE Initiative Robomimic TrainUtils load data for training

From Leeroopedia
Knowledge Sources
Domains Robotics, Data_Pipeline, Offline_Learning
Last Updated 2026-02-15 08:00 GMT

Overview

A concrete tool, provided by the robomimic training utilities module, for loading HDF5 demonstration datasets into SequenceDataset objects for training and validation.

Description

The load_data_for_training function reads the config to determine the dataset paths and the filter keys that define the train/validation splits; the required observation modalities are passed in separately as obs_keys. It then delegates to dataset_factory to construct SequenceDataset instances for training and, optionally, validation.

The function enforces that train and validation demo sets are disjoint, and validates that the appropriate filter keys exist in the HDF5 file. It supports multi-dataset configurations where config.train.data is a list of dataset paths.
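The disjointness requirement described above can be illustrated with a minimal sketch. This is not the robomimic implementation: assert_disjoint_splits is a hypothetical helper, and the demo keys are made-up stand-ins for the entries stored under each filter key in the HDF5 file.

```python
def assert_disjoint_splits(train_demo_keys, valid_demo_keys):
    """Raise ValueError if any demo key appears in both splits (hypothetical helper)."""
    overlap = set(train_demo_keys) & set(valid_demo_keys)
    if overlap:
        raise ValueError(
            "training and validation demos overlap: {}".format(sorted(overlap))
        )


# Hypothetical demo keys, standing in for what each filter key would select.
train_keys = ["demo_0", "demo_1", "demo_2"]
valid_keys = ["demo_3", "demo_4"]

# No demo is shared between the splits, so this call returns without raising.
assert_disjoint_splits(train_keys, valid_keys)
```

Comparing the keys as sets keeps the check independent of ordering and reports exactly which demonstrations would leak between the splits.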

Usage

Call this function after the observation utilities have been initialized (for example with ObsUtils.initialize_obs_utils_with_config) to create the datasets for the training loop. The returned datasets are then passed to PyTorch DataLoaders for batched training.

Code Reference

Source Location

  • Repository: robomimic
  • File: robomimic/utils/train_utils.py
  • Lines: L94-138

Signature

def load_data_for_training(config, obs_keys):
    """
    Data loading at the start of an algorithm.

    Args:
        config (BaseConfig instance): config object
        obs_keys (list): list of observation modalities that are required for
            training (this will inform the dataloader on what modalities to load)

    Returns:
        train_dataset (SequenceDataset instance): train dataset object
        valid_dataset (SequenceDataset instance): valid dataset object (only if using validation)
    """

Import

import robomimic.utils.train_utils as TrainUtils

# Call as:
train_dataset, valid_dataset = TrainUtils.load_data_for_training(config, obs_keys)

I/O Contract

Inputs

Name | Type | Required | Description
config | BaseConfig | Yes | Full config object; reads config.train.data (list of dataset paths), config.train.hdf5_filter_key, config.train.hdf5_validation_filter_key, config.experiment.validate
obs_keys | list | Yes | List of observation modality keys required for training (e.g., ["robot0_eef_pos", "agentview_image"])

Outputs

Name | Type | Description
train_dataset | SequenceDataset | Training dataset with selected demonstrations
valid_dataset | SequenceDataset or None | Validation dataset (None if config.experiment.validate is False)
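Because valid_dataset may be None, downstream code needs to guard before building a validation loader. The sketch below shows one way to do that. The make_loaders helper and the loader_factory argument are hypothetical names introduced for illustration, and plain lists stand in for SequenceDataset objects so the example runs without robomimic or PyTorch.

```python
def make_loaders(train_dataset, valid_dataset, loader_factory):
    """Build a train loader, and a validation loader only when one is available.

    loader_factory is a hypothetical callable that turns a dataset into a loader.
    """
    train_loader = loader_factory(train_dataset)
    # valid_dataset is None when validation is disabled, so skip the loader.
    valid_loader = loader_factory(valid_dataset) if valid_dataset is not None else None
    return train_loader, valid_loader


def list_loader(dataset):
    """Stand-in factory: materializes the dataset as a list instead of batching."""
    return list(dataset)


# Validation disabled: only the training loader is created.
train_loader, valid_loader = make_loaders([1, 2, 3], None, list_loader)
```

With PyTorch, loader_factory could be something like functools.partial(DataLoader, batch_size=..., shuffle=True), so the same settings apply to both loaders.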

Usage Examples

Basic Dataset Loading

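import robomimic.utils.obs_utils as ObsUtils
import robomimic.utils.train_utils as TrainUtils
from robomimic.config import config_factory
from torch.utils.data import DataLoader

# 1. Set up the config and initialize the observation utilities.
#    Note: config.train.data must point to the HDF5 dataset(s) to load
#    before load_data_for_training is called; it is not set here.
config = config_factory(algo_name="bc")
ObsUtils.initialize_obs_utils_with_config(config)

# 2. Choose the observation keys to load (hard-coded here for illustration;
#    in practice they are typically derived from shape metadata)
obs_keys = ["robot0_eef_pos", "robot0_gripper_qpos"]

# 3. Load datasets (valid_dataset is None if config.experiment.validate is False)
train_dataset, valid_dataset = TrainUtils.load_data_for_training(config, obs_keys)

# 4. Wrap the training dataset in a DataLoader for batched training
train_loader = DataLoader(
    dataset=train_dataset,
    batch_size=config.train.batch_size,
    shuffle=True,
    num_workers=config.train.num_data_workers,
)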

Related Pages

Implements Principle

Requires Environment

Uses Heuristic
