Implementation:ARISE Initiative Robomimic TrainUtils load data for training
| Knowledge Sources | |
|---|---|
| Domains | Robotics, Data_Pipeline, Offline_Learning |
| Last Updated | 2026-02-15 08:00 GMT |
Overview
Concrete tool for loading HDF5 demonstration datasets into SequenceDataset objects for training and validation, provided by the robomimic training utilities module (robomimic.utils.train_utils).
Description
The load_data_for_training function reads the dataset path(s) and the train/validation filter keys from the config. It then delegates to dataset_factory, which builds a SequenceDataset for the training split and, when validation is enabled, a second SequenceDataset for the validation split. The obs_keys argument tells each dataset which observation modalities to load.
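The branching described above can be sketched in plain Python. This is not robomimic's code: `SimpleNamespace` stands in for the config object, and `fake_dataset_factory` and `load_data_sketch` are hypothetical names. The stub factory returns a dict describing the dataset it would build instead of a SequenceDataset.

```python
from types import SimpleNamespace


def fake_dataset_factory(config, obs_keys, filter_by_attribute=None):
    # Hypothetical stand-in for dataset_factory: describes the dataset
    # it would build rather than constructing a SequenceDataset.
    return {"obs_keys": list(obs_keys), "filter_key": filter_by_attribute}


def load_data_sketch(config, obs_keys, factory=fake_dataset_factory):
    # Read the split filter keys from the config.
    train_key = config.train.hdf5_filter_key
    valid_key = config.train.hdf5_validation_filter_key

    if config.experiment.validate:
        # Validation requires both splits to be named explicitly.
        assert train_key is not None and valid_key is not None, \
            "set hdf5_filter_key and hdf5_validation_filter_key"
        train_dataset = factory(config, obs_keys, filter_by_attribute=train_key)
        valid_dataset = factory(config, obs_keys, filter_by_attribute=valid_key)
    else:
        # Without validation, only the training dataset is built.
        train_dataset = factory(config, obs_keys, filter_by_attribute=train_key)
        valid_dataset = None
    return train_dataset, valid_dataset


# Hypothetical config with validation enabled.
config = SimpleNamespace(
    train=SimpleNamespace(hdf5_filter_key="train", hdf5_validation_filter_key="valid"),
    experiment=SimpleNamespace(validate=True),
)
train_ds, valid_ds = load_data_sketch(config, ["robot0_eef_pos"])
```

Passing the factory as a parameter keeps the sketch testable without an HDF5 file on disk.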
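When validation is enabled, the function requires both config.train.hdf5_filter_key and config.train.hdf5_validation_filter_key to be set, checks that these filter keys exist in the HDF5 file, and enforces that the train and validation demo sets they select are disjoint. It also supports multi-dataset configurations, in which config.train.data is a list of dataset paths.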
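robomimic datasets store each filter key as a list of demo names under the `mask/<filter_key>` group of the HDF5 file. The disjointness check can be sketched with a plain dict standing in for that group; `masks` and `check_split` are hypothetical names, and the real function reads the masks from disk rather than from a dict.

```python
def check_split(masks, train_key, valid_key):
    # A filter key that is absent from the mask group cannot be resolved.
    for key in (train_key, valid_key):
        if key not in masks:
            raise KeyError(f"filter key {key!r} not found under mask/")
    train_demos = set(masks[train_key])
    valid_demos = set(masks[valid_key])
    # No demonstration may appear in both splits.
    overlap = train_demos & valid_demos
    assert not overlap, f"training demos overlap with validation demos: {sorted(overlap)}"
    return sorted(train_demos), sorted(valid_demos)


# Hypothetical mask contents for a 5-demo dataset.
masks = {
    "train": ["demo_0", "demo_1", "demo_2", "demo_3"],
    "valid": ["demo_4"],
}
train_demos, valid_demos = check_split(masks, "train", "valid")
```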
Usage
Call this function after the observation utilities have been initialized (ObsUtils.initialize_obs_utils_with_config(config)) and before the training loop starts. Wrap the returned datasets in PyTorch DataLoaders to produce batches for training and validation.
Code Reference
Source Location
- Repository: robomimic
- File: robomimic/utils/train_utils.py
- Lines: L94-138
Signature
```python
def load_data_for_training(config, obs_keys):
    """
    Data loading at the start of an algorithm.

    Args:
        config (BaseConfig instance): config object
        obs_keys (list): list of observation modalities that are required for
            training (this will inform the dataloader on what modalities to load)

    Returns:
        train_dataset (SequenceDataset instance): train dataset object
        valid_dataset (SequenceDataset instance): valid dataset object (only if using validation)
    """
```
Import
```python
import robomimic.utils.train_utils as TrainUtils

# Call as:
train_dataset, valid_dataset = TrainUtils.load_data_for_training(config, obs_keys)
```
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| config | BaseConfig | Yes | Full config object; reads config.train.data (list of dataset paths), config.train.hdf5_filter_key, config.train.hdf5_validation_filter_key, config.experiment.validate |
| obs_keys | list | Yes | List of observation modality keys required for training (e.g., ["robot0_eef_pos", "agentview_image"]) |
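In robomimic, observation keys are grouped by modality under `config.observation.modalities` (for example `obs.low_dim` and `obs.rgb`). A hypothetical helper that flattens such a grouping into a de-duplicated `obs_keys` list might look like the following; the nested dict stands in for the config, and `flatten_obs_keys` is not a robomimic function.

```python
def flatten_obs_keys(modalities):
    # modalities maps group ("obs", "goal") -> modality ("low_dim", "rgb", ...) -> keys.
    keys = []
    for group in modalities.values():
        for modality_keys in group.values():
            for key in modality_keys:
                if key not in keys:  # keep the first occurrence, drop duplicates
                    keys.append(key)
    return keys


# Hypothetical modality layout; the goal reuses an observation key.
modalities = {
    "obs": {
        "low_dim": ["robot0_eef_pos", "robot0_gripper_qpos"],
        "rgb": ["agentview_image"],
    },
    "goal": {"low_dim": ["robot0_eef_pos"]},
}
obs_keys = flatten_obs_keys(modalities)
```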
Outputs
| Name | Type | Description |
|---|---|---|
| train_dataset | SequenceDataset | Training dataset with selected demonstrations |
| valid_dataset | SequenceDataset or None | Validation dataset (None if config.experiment.validate is False) |
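Because valid_dataset may be None, code that consumes the returned tuple should guard the validation pass. A minimal sketch with plain lists standing in for the datasets; `run_epoch` and `run_one_epoch` are hypothetical placeholders for a training or validation step, not robomimic functions.

```python
def run_epoch(dataset, train):
    # Placeholder step: report how many samples were visited and in which mode.
    return {"mode": "train" if train else "valid", "num_samples": len(dataset)}


def run_one_epoch(train_dataset, valid_dataset):
    logs = {"train": run_epoch(train_dataset, train=True)}
    # Only run validation when a validation dataset was built.
    if valid_dataset is not None:
        logs["valid"] = run_epoch(valid_dataset, train=False)
    return logs


logs_with_valid = run_one_epoch(list(range(8)), list(range(2)))
logs_without_valid = run_one_epoch(list(range(8)), None)
```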
Usage Examples
Basic Dataset Loading
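```python
import robomimic.utils.obs_utils as ObsUtils
import robomimic.utils.train_utils as TrainUtils
from robomimic.config import config_factory
from torch.utils.data import DataLoader

# 1. Set up the config and initialize the observation utilities.
#    config.train.data must point at your HDF5 dataset(s) before loading.
config = config_factory(algo_name="bc")
ObsUtils.initialize_obs_utils_with_config(config)

# 2. Choose the observation keys to load. They are hard-coded here;
#    robomimic's training script derives them from the dataset's shape metadata.
obs_keys = ["robot0_eef_pos", "robot0_gripper_qpos"]

# 3. Load the datasets (valid_dataset is None unless config.experiment.validate is True).
train_dataset, valid_dataset = TrainUtils.load_data_for_training(config, obs_keys)

# 4. Wrap the training dataset in a DataLoader for batched training.
train_loader = DataLoader(
    dataset=train_dataset,
    batch_size=config.train.batch_size,
    shuffle=True,
    num_workers=config.train.num_data_workers,
)
```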