
Workflow:ARISE Initiative Robomimic Training Policy From Demonstrations

From Leeroopedia
Domains Robot_Learning, Imitation_Learning, Offline_RL
Last Updated 2026-02-15 07:30 GMT

Overview

End-to-end process for training a robot manipulation policy from offline demonstration datasets using robomimic's configurable algorithm framework.

Description

This workflow covers the complete training pipeline for learning robot control policies from pre-collected demonstration data stored in HDF5 format. It supports multiple algorithm families including Behavioral Cloning (BC, BC-RNN, BC-Transformer), offline reinforcement learning (BCQ, CQL, IQL, TD3+BC), hierarchical methods (HBC, IRIS), and Diffusion Policy. The pipeline handles configuration management via JSON configs, multi-modal observation processing (low-dimensional state, RGB images, depth, scan), dataset loading with sequence batching and caching, algorithm instantiation through a factory pattern, epoch-based training with periodic validation, environment rollout evaluation, and model checkpointing with best-metric tracking.

Usage

Execute this workflow when you have a prepared HDF5 demonstration dataset (with train/valid filter keys already created) and want to train a policy that can be deployed to control a robot in simulation or the real world. The workflow accepts a JSON configuration file specifying algorithm choice, observation modalities, network architecture, and training hyperparameters. It produces model checkpoint files (.pth) that can be used for evaluation or deployment.

Execution Steps

Step 1: Configuration Setup

Construct the experiment configuration that defines all aspects of the training run. This involves either loading an external JSON config file and merging it with the algorithm's default configuration, or programmatically creating a config using the config factory. The configuration system uses a nested dictionary structure with key-locking to prevent typos. The config specifies the algorithm name, dataset path, observation modalities, network architecture, training hyperparameters, rollout evaluation settings, and checkpoint saving strategy.

Key considerations:

  • Each algorithm has its own config class (e.g., BCConfig, BCQConfig) registered via metaclass auto-registration
  • Configs can be generated from experiment templates in the exps/templates directory
  • The config is locked after setup to prevent accidental modification during training
  • Debug mode shrinks training to 2 epochs with 3 gradient steps each for quick validation
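As an illustration, a minimal BC config might look like the fragment below. The key layout follows robomimic's nested config structure, but exact field names and defaults vary by version, and the dataset path is a hypothetical example, so treat this as a sketch rather than a verbatim template.

```json
{
    "algo_name": "bc",
    "experiment": {
        "name": "bc_lift_lowdim",
        "validate": true
    },
    "train": {
        "data": "datasets/lift/low_dim.hdf5",
        "batch_size": 100,
        "num_epochs": 2000,
        "hdf5_filter_key": "train",
        "hdf5_validation_filter_key": "valid"
    },
    "observation": {
        "modalities": {
            "obs": {
                "low_dim": ["robot0_eef_pos", "robot0_eef_quat", "object"],
                "rgb": []
            }
        }
    }
}
```

Any key omitted from an external config falls back to the algorithm's default value, which is what makes short experiment configs like this workable.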

Step 2: Observation Initialization

Initialize the observation processing framework by registering which observation keys belong to which modalities (low_dim, rgb, depth, scan). This step configures the global observation utilities that govern how raw observations are preprocessed, normalized, and fed into the neural networks. It parses the config to determine image dimensions, normalization settings, and encoder configurations.

Key considerations:

  • Observation modalities are registered globally and affect all downstream model construction
  • RGB observations undergo HWC-to-CHW conversion and [0,255]-to-[0,1] normalization
  • Depth observations can be processed alongside RGB observations
  • The observation encoder architecture (backbone, pooling, randomizer) is specified per modality
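The HWC-to-CHW and [0,255]-to-[0,1] conversion described above can be sketched in a few lines of NumPy. This is an illustrative stand-in for robomimic's observation utilities, not its actual implementation.

```python
import numpy as np

def process_rgb(obs):
    """Convert a raw uint8 RGB observation (H, W, C) to float32 CHW in [0, 1]."""
    obs = obs.astype(np.float32) / 255.0   # [0, 255] -> [0, 1]
    return np.transpose(obs, (2, 0, 1))   # HWC -> CHW

# a dummy 84x84 RGB frame, as a camera observation might arrive from HDF5
img = np.random.randint(0, 256, size=(84, 84, 3), dtype=np.uint8)
chw = process_rgb(img)
# chw.shape == (3, 84, 84), with all values in [0, 1]
```

Channel-first layout is what PyTorch convolutional encoders expect, which is why this conversion happens once, globally, before any network sees the data.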

Step 3: Dataset Loading

Load the HDF5 demonstration dataset into a PyTorch-compatible SequenceDataset (or MetaDataset for multi-dataset training). The dataset extracts observation sequences of configurable length for RNN/Transformer training, supports frame stacking, and implements multiple caching strategies. Filter keys in the HDF5 mask group control which demonstrations belong to the training versus validation split. A DataLoader wraps the dataset for batched, optionally shuffled iteration.

Key considerations:

  • Cache mode "all" loads entire HDF5 into memory (fastest); "low_dim" caches only non-image data; None uses file I/O per sample
  • Sequence length must match the RNN/Transformer horizon setting
  • Multi-dataset training uses MetaDataset with weighted sampling across multiple HDF5 files
  • Normalization statistics can be computed over the training set for input standardization
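The interaction between sequence length and frame stacking can be sketched as below; this is a simplified stand-in for SequenceDataset's index logic (which also handles caching and filter keys), using boundary padding by repetition, one common choice.

```python
import numpy as np

def sample_sequence(demo, start, seq_length, frame_stack=1):
    """Extract a window of `seq_length` steps beginning at `start`, plus
    `frame_stack - 1` preceding frames, padding by clipping at episode edges."""
    T = demo.shape[0]
    idx = np.arange(start - (frame_stack - 1), start + seq_length)
    idx = np.clip(idx, 0, T - 1)          # repeat boundary frames when off the edge
    return demo[idx]

# a toy 20-step demonstration with 7-dim low-dim observations
demo = np.arange(20 * 7, dtype=np.float32).reshape(20, 7)
seq = sample_sequence(demo, start=0, seq_length=10, frame_stack=2)
# shape (11, 7): a 10-step sequence plus 1 stacked frame, padded at the start
```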

Step 4: Algorithm Instantiation

Create the learning algorithm instance via the factory pattern. The algo_factory function maps the algorithm name to the registered algorithm class, initializes the neural network architecture based on observation shapes and action dimensions, sets up optimizers and learning rate schedulers, and moves the model to the target device (CPU/GPU). The algorithm encapsulates all network components, loss computation, and training logic.

Key considerations:

  • Nine algorithm implementations are available: BC, BCQ, CQL, IQL, TD3+BC, GL, HBC, IRIS, Diffusion Policy
  • Hierarchical algorithms (HBC, IRIS) compose multiple sub-algorithms (planner + actor)
  • The MIMO (Multiple-Input Multiple-Output) pattern processes multi-modal observations through modality-specific encoders before feeding into backbone networks
  • Pretrained checkpoints can be loaded to warm-start training
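The registry-plus-factory pattern described here can be sketched as follows. Class and function names are simplified stand-ins for robomimic's metaclass-based auto-registration, not its actual API.

```python
# Global registry mapping algorithm names to their classes.
ALGO_REGISTRY = {}

def register_algo(name):
    """Decorator that records a class under `name` (robomimic does this
    automatically via a metaclass; a decorator shows the same idea)."""
    def decorator(cls):
        ALGO_REGISTRY[name] = cls
        return cls
    return decorator

@register_algo("bc")
class BC:
    def __init__(self, obs_shapes, ac_dim, device="cpu"):
        self.obs_shapes, self.ac_dim, self.device = obs_shapes, ac_dim, device

def algo_factory(algo_name, obs_shapes, ac_dim, device="cpu"):
    """Look up the registered class for `algo_name` and instantiate it."""
    if algo_name not in ALGO_REGISTRY:
        raise ValueError(f"unknown algorithm: {algo_name}")
    return ALGO_REGISTRY[algo_name](obs_shapes, ac_dim, device)

algo = algo_factory("bc", obs_shapes={"robot0_eef_pos": (3,)}, ac_dim=7)
```

The payoff of this indirection is that the training script never hard-codes an algorithm class: changing `algo_name` in the config is enough to swap the entire learning method.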

Step 5: Training Loop Execution

Run the main epoch-based training loop. Each epoch consists of iterating through batches from the DataLoader, processing observations, computing forward passes and losses, performing gradient updates, and logging metrics. After the training epoch, an optional validation epoch evaluates the model on held-out data without gradient updates. Metrics are logged to TensorBoard and optionally Weights & Biases.

Key considerations:

  • Epoch length can be fixed to N gradient steps or span the full dataset
  • The model's process_batch_for_training and train_on_batch methods handle algorithm-specific logic
  • Observation normalization statistics are applied during batch postprocessing
  • Memory usage is tracked and logged per epoch
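A minimal skeleton of the fixed-step training epoch followed by a no-update validation epoch might look like this. The model here is a hypothetical stand-in exposing the two batch methods named above; the real robomimic loop additionally handles device transfer, logging, and timing.

```python
class DummyModel:
    """Stand-in exposing the two per-batch hooks the training loop calls."""
    def __init__(self):
        self.grad_steps = 0

    def process_batch_for_training(self, batch):
        return batch                      # e.g. move to device, normalize obs

    def train_on_batch(self, batch, validate=False):
        if not validate:
            self.grad_steps += 1          # gradient update only in training mode
        return {"loss": 0.0}

def run_epoch(model, loader, num_steps, validate=False):
    """One epoch of a fixed number of steps; no weight updates when validating."""
    losses = []
    it = iter(loader)
    for _ in range(num_steps):
        batch = model.process_batch_for_training(next(it))
        losses.append(model.train_on_batch(batch, validate=validate)["loss"])
    return losses

loader = iter(lambda: {"obs": [0.0]}, None)   # infinite dummy "DataLoader"
model = DummyModel()
train_losses = run_epoch(model, loader, num_steps=100)
val_losses = run_epoch(model, loader, num_steps=10, validate=True)
```

Fixing the epoch to N gradient steps (rather than one pass over the dataset) keeps epoch timing and checkpoint cadence stable across datasets of very different sizes.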

Step 6: Rollout Evaluation

Periodically evaluate the trained policy by running rollouts in the simulation environment. The model is wrapped in a RolloutPolicy that handles observation normalization and action denormalization at inference time. Multiple episodes are rolled out per environment, computing success rate, return, and horizon statistics. Evaluation videos can be rendered and saved.

Key considerations:

  • Rollout frequency is controlled by config (e.g., every N epochs) with an optional warmstart delay
  • Multiple environments can be evaluated simultaneously (e.g., from multi-dataset training)
  • The RolloutPolicy wrapper manages RNN hidden state across timesteps during inference
  • Rollouts support goal-conditioned policies via env.get_goal()
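The inference-time wrapper can be sketched as below: normalize observations on the way in and carry recurrent state across timesteps, resetting it at episode start. The names mirror robomimic's RolloutPolicy in spirit only; the actual class and its normalization interface differ.

```python
class RolloutPolicy:
    """Wraps a policy function for rollouts: input normalization + RNN state."""
    def __init__(self, policy_fn, obs_mean, obs_std):
        self.policy_fn = policy_fn
        self.obs_mean, self.obs_std = obs_mean, obs_std
        self.rnn_state = None

    def start_episode(self):
        self.rnn_state = None             # clear hidden state between episodes

    def __call__(self, obs):
        normed = [(o - m) / s for o, m, s in zip(obs, self.obs_mean, self.obs_std)]
        action, self.rnn_state = self.policy_fn(normed, self.rnn_state)
        return action

def dummy_policy(obs, state):
    """Toy policy: action is the sum of normalized obs; state counts steps."""
    state = 0 if state is None else state + 1
    return [sum(obs)], state

policy = RolloutPolicy(dummy_policy, obs_mean=[0.0, 0.0], obs_std=[1.0, 2.0])
policy.start_episode()
action = policy([1.0, 4.0])   # normalized obs: [1.0, 2.0] -> action [3.0]
```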

Step 7: Checkpointing and Model Saving

Save model checkpoints based on configurable triggers: periodic time/epoch intervals, best validation loss, best rollout return, or best rollout success rate. Each checkpoint bundles the serialized model weights, optimizer state, config, environment metadata, shape metadata, observation/action normalization statistics, and training variable state for seamless resume functionality. A latest checkpoint with backup is always maintained for crash recovery.

Key considerations:

  • Resume functionality loads the latest checkpoint and continues training from the saved epoch
  • Checkpoint names encode the saving reason (e.g., best_success_rate, model_epoch_100)
  • The output directory structure is: output_dir/experiment_name/timestamp/{logs,models,videos}
  • Both model weights and optimizer states are serialized for exact training continuation
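The checkpoint bundle and best-metric tracking can be sketched as follows. This uses pickle as a stand-in for torch.save and toy state dicts; filenames and bundle keys are illustrative, not robomimic's exact conventions.

```python
import os
import pickle
import tempfile

def save_checkpoint(path, model_state, optimizer_state, epoch, config):
    """Bundle everything needed to resume training into one file, keeping a
    backup of the previous checkpoint at `path` for crash recovery."""
    if os.path.exists(path):
        os.replace(path, path + ".bak")
    with open(path, "wb") as f:
        pickle.dump({"model": model_state, "optimizer": optimizer_state,
                     "epoch": epoch, "config": config}, f)

def load_checkpoint(path):
    with open(path, "rb") as f:
        return pickle.load(f)

ckpt_dir = tempfile.mkdtemp()
latest = os.path.join(ckpt_dir, "model_latest.pth")
best_path = os.path.join(ckpt_dir, "model_best_success.pth")

best_success = -1.0
for epoch, success_rate in enumerate([0.2, 0.5, 0.4]):
    save_checkpoint(latest, {"w": epoch}, {"lr": 1e-4}, epoch, {"algo": "bc"})
    if success_rate > best_success:       # best-metric tracking
        best_success = success_rate
        save_checkpoint(best_path, {"w": epoch}, {"lr": 1e-4}, epoch, {"algo": "bc"})

resumed = load_checkpoint(latest)         # resume from the last saved epoch
```

Saving optimizer state alongside the weights is what makes resumed training bitwise-continuous rather than a warm restart with a fresh optimizer.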

Execution Diagram

GitHub URL

Workflow Repository