Principle:ARISE Initiative Robomimic Train Validation Split

From Leeroopedia
Knowledge Sources
Domains Robotics, Data_Pipeline, Data_Splitting
Last Updated 2026-02-15 08:00 GMT

Overview

A demonstration-level data splitting pattern that partitions robot demonstration datasets into training and validation subsets using in-place HDF5 filter keys without duplicating data.

Description

Train Validation Split creates disjoint training and validation subsets from a demonstration dataset. Rather than splitting individual transitions, it operates at the demonstration level: each trajectory is assigned in its entirety to either train or validation. This matters for robot learning because consecutive transitions within a trajectory are strongly correlated, so a transition-level split would place near-duplicate samples from the same trajectory in both sets and make validation metrics overly optimistic.
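
The leakage argument can be made concrete with a small sketch in plain Python. The toy dataset, the helper name split_by_demo, and the fixed seed are illustrative assumptions and are not taken from robomimic.

```python
import random

def split_by_demo(demo_names, val_ratio=0.1, seed=0):
    """Assign whole demonstrations to train or validation (hypothetical helper)."""
    names = list(demo_names)
    random.Random(seed).shuffle(names)
    num_val = int(val_ratio * len(names))
    return names[num_val:], names[:num_val]

# Toy dataset: 12 demonstrations with 5 transitions each, as (demo_name, step) pairs.
demo_names = [f"demo_{i}" for i in range(12)]
transitions = [(name, t) for name in demo_names for t in range(5)]

# Demonstration-level split: no trajectory appears on both sides.
train_demos, val_demos = split_by_demo(demo_names, val_ratio=0.1)
demo_level_overlap = set(train_demos) & set(val_demos)

# Transition-level split (the approach to avoid): shuffle individual transitions.
shuffled = list(transitions)
random.Random(0).shuffle(shuffled)
num_val = int(0.1 * len(shuffled))  # 6 transitions, which cannot be whole 5-step demos
val_part, train_part = shuffled[:num_val], shuffled[num_val:]
leaked_demos = {d for d, _ in val_part} & {d for d, _ in train_part}

print("demos shared after demo-level split:", len(demo_level_overlap))
print("demos shared after transition-level split:", len(leaked_demos))
```

In this toy setup the demonstration-level split shares no trajectories, while the transition-level split necessarily leaves at least one demonstration with samples on both sides.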

The splitting mechanism uses HDF5 filter keys stored in the dataset's mask/ group. A filter key is simply a list of demonstration names (e.g., ["demo_0", "demo_3", "demo_5"]) that defines a subset. This avoids creating duplicate HDF5 files and allows multiple overlapping subsets to coexist in the same file.
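
As a rough illustration of how several filter keys can coexist in one file, the sketch below uses h5py with an in-memory file. The helper names write_filter_key and read_filter_key, the key "first_two", and the choice to store names as byte strings are assumptions made for this example, not a description of the actual implementation.

```python
import h5py
import numpy as np

def write_filter_key(f, name, demo_names):
    """Store a list of demo names under mask/<name> (hypothetical helper)."""
    path = f"mask/{name}"
    if path in f:
        del f[path]  # overwrite an existing key of the same name
    # Fixed-length byte strings are a dtype h5py can store directly.
    f.create_dataset(path, data=np.array(demo_names, dtype="S"))

def read_filter_key(f, name):
    """Return the demo names stored under mask/<name> as Python strings."""
    return [raw.decode("utf-8") for raw in f[f"mask/{name}"][:]]

# In-memory file so the sketch leaves nothing on disk.
with h5py.File("example.hdf5", "w", driver="core", backing_store=False) as f:
    write_filter_key(f, "train", ["demo_0", "demo_3", "demo_5"])
    write_filter_key(f, "valid", ["demo_1", "demo_2"])
    write_filter_key(f, "first_two", ["demo_0", "demo_1"])  # overlaps both splits

    available_keys = sorted(f["mask"].keys())
    train_names = read_filter_key(f, "train")
    first_two = read_filter_key(f, "first_two")

print(available_keys)
print(train_names)
```

Each key holds only names, so adding another subset costs a few bytes per demonstration rather than a second copy of the trajectories.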

Usage

Use this principle after observation extraction and before training. It is a prerequisite for validated training (when config.experiment.validate is True). The resulting filter keys ("train" and "valid") are referenced by config.train.hdf5_filter_key and config.train.hdf5_validation_filter_key.
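
The relationship between these settings can be sketched as a small consistency check. The nested dict below only mirrors the dotted names used above; the real config object may be structured differently, and check_validation_config is a hypothetical helper rather than a robomimic function.

```python
def check_validation_config(config):
    """Return a list of problems with the split-related fields (hypothetical helper)."""
    problems = []
    train_key = config["train"].get("hdf5_filter_key")
    valid_key = config["train"].get("hdf5_validation_filter_key")
    if config["experiment"].get("validate"):
        if not train_key:
            problems.append("validate is True but hdf5_filter_key is not set")
        if not valid_key:
            problems.append("validate is True but hdf5_validation_filter_key is not set")
        if train_key and train_key == valid_key:
            problems.append("train and validation filter keys must differ")
    return problems

# Assumed layout: a plain dict standing in for the training config.
config = {
    "experiment": {"validate": True},
    "train": {"hdf5_filter_key": "train", "hdf5_validation_filter_key": "valid"},
}
problems = check_validation_config(config)
print(problems)
```

An empty list means validated training has both a training key and a distinct validation key to draw from.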

Theoretical Basis

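# Abstract splitting pattern (illustrative sketch, not the real implementation)
import random
import h5py
import numpy as np

demos = [f"demo_{i}" for i in range(100)]  # "demo_0" ... "demo_99"
num_val = int(0.1 * len(demos))  # 10% for validation

# Random assignment: shuffle the demonstration names, then slice
random.shuffle(demos)
val_demos = demos[:num_val]
train_demos = demos[num_val:]

# Store as filter keys in the HDF5 mask/ group ("dataset.hdf5" is a placeholder path)
with h5py.File("dataset.hdf5", "a") as hdf5:
    hdf5["mask/train"] = np.array(train_demos, dtype="S")  # names as byte strings
    hdf5["mask/valid"] = np.array(val_demos, dtype="S")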

The filter key approach supports nested splitting: a subset (e.g., "20_demos") can itself be split into train/valid, producing "20_demos_train" and "20_demos_valid".
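
The naming convention for nested splits can be sketched as follows. Here a plain dict stands in for the mask/ group, and the "all" entry, the helper name split_into_filter_keys, and the fixed seed are assumptions of this example only.

```python
import random

def split_into_filter_keys(masks, val_ratio=0.1, filter_key=None, seed=0):
    """Split all demos, or the subset named by filter_key, into train/valid keys.

    `masks` is a plain dict standing in for the mask/ group (hypothetical helper).
    """
    source = masks["all"] if filter_key is None else masks[filter_key]
    names = list(source)
    random.Random(seed).shuffle(names)
    num_val = int(val_ratio * len(names))
    # Top-level splits are named "train"/"valid"; nested ones are prefixed.
    prefix = "" if filter_key is None else f"{filter_key}_"
    masks[f"{prefix}train"] = sorted(names[num_val:])
    masks[f"{prefix}valid"] = sorted(names[:num_val])
    return f"{prefix}train", f"{prefix}valid"

masks = {"all": [f"demo_{i}" for i in range(100)]}
masks["20_demos"] = masks["all"][:20]  # an existing subset to split further

top_level = split_into_filter_keys(masks)
nested = split_into_filter_keys(masks, filter_key="20_demos")
print(top_level, nested)
```

The nested keys partition only the demonstrations listed in "20_demos", so they sit alongside the top-level "train" and "valid" keys without affecting them.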

Related Pages

Implemented By
