Principle: ARISE Initiative Robomimic Training Loop Execution
| Knowledge Sources | |
|---|---|
| Domains | Robotics, Training, Optimization |
| Last Updated | 2026-02-15 08:00 GMT |
Overview
An epoch-based training loop pattern that iterates over batched demonstration data, performing forward and backward passes through an algorithm's networks while collecting training metrics.
Description
Training Loop Execution is the core optimization step in offline robot learning. It follows the standard supervised learning paradigm: iterate over mini-batches from a DataLoader, compute loss, and update model parameters. However, it abstracts the specific loss computation and parameter update logic into the algorithm's train_on_batch method, allowing the same loop to work with diverse algorithm families.
The training loop handles:
- Batch processing: Loads batches from the DataLoader, handles epoch boundaries with iterator reset
- Observation normalization: Optionally applies observation normalization statistics before training
- Algorithm-agnostic interface: Calls model.process_batch_for_training, model.train_on_batch, and model.log_info without knowing algorithm specifics
- Validation mode: The same loop supports validation by disabling gradient computation
- Fixed-step epochs: Optionally limits the number of gradient steps per epoch (useful when datasets are very large)
- Timing statistics: Tracks time spent in data loading, batch processing, training, and logging
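The epoch-boundary handling from the first bullet can be sketched in plain Python. This is a hedged illustration, not robomimic's actual code: `get_next_batch` is a hypothetical helper name, and any iterable stands in for the DataLoader.

```python
def get_next_batch(loader, loader_iter):
    """Fetch the next batch, resetting the iterator at epoch boundaries.

    `loader` can be any re-iterable (e.g. a torch DataLoader). When the
    current iterator is exhausted, a fresh one is created so that
    fixed-step epochs can keep drawing batches past a full dataset pass.
    """
    try:
        batch = next(loader_iter)
    except StopIteration:
        loader_iter = iter(loader)  # dataset exhausted: restart and continue
        batch = next(loader_iter)
    return batch, loader_iter
```

With a 3-batch loader and 5 fixed gradient steps, the loop wraps around and replays batches from the start of the dataset, which is exactly why fixed-step epochs are useful for very large datasets.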
Usage
Apply this principle in the inner loop of training: the loop runs once per epoch, for both the training pass and the validation pass. It requires a fully instantiated algorithm and a DataLoader wrapping a SequenceDataset.
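The once-per-epoch call pattern can be sketched as follows. `DummyAlgo` and `run_epoch` are illustrative stand-ins for the algorithm object and the epoch loop, not robomimic's actual interface; a real algorithm would run forward/backward passes and skip the gradient update when `validate=True`.

```python
class DummyAlgo:
    """Stand-in for a fully instantiated algorithm."""

    def train_on_batch(self, batch, epoch, validate=False):
        # A real algorithm computes losses here and only steps the
        # optimizer when validate is False.
        return {"loss": float(batch), "validate": validate}


def run_epoch(model, loader, epoch, validate=False):
    """Minimal inner loop: one pass over the DataLoader."""
    return [model.train_on_batch(b, epoch, validate=validate) for b in loader]


# Driver: the same loop serves both training and validation each epoch.
model = DummyAlgo()
for epoch in range(1, 3):
    train_infos = run_epoch(model, [1, 2, 3], epoch, validate=False)
    valid_infos = run_epoch(model, [4], epoch, validate=True)
```

The design point this illustrates: because loss computation lives inside `train_on_batch`, the driver never branches on the algorithm family, only on the train/validate flag.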
Theoretical Basis
The training loop implements the standard mini-batch stochastic gradient descent pattern adapted for offline learning:
```python
# Abstract training loop (not the real implementation)
for step in range(num_steps):
    batch = next(data_loader_iter)

    # Algorithm-specific batch preprocessing
    input_batch = model.process_batch_for_training(batch)
    input_batch = model.postprocess_batch_for_training(input_batch, obs_normalization_stats)

    # Forward + backward + optimizer step (all encapsulated)
    info = model.train_on_batch(input_batch, epoch, validate=False)

    # Learning rate scheduling
    model.on_gradient_step()

    # Collect metrics
    step_log = model.log_info(info)
    all_logs.append(step_log)

# Return averaged metrics
return {k: mean(v) for k, v in aggregate(all_logs)}
```
The key design decision is encapsulating the loss function and optimizer step inside train_on_batch, making the outer loop algorithm-agnostic.
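The final aggregation step in the abstract loop can be made concrete. This sketch assumes every per-step log is a dict of scalar metrics sharing the same keys; `average_step_logs` is a hypothetical helper name.

```python
from statistics import mean


def average_step_logs(all_logs):
    """Collapse a list of per-step metric dicts into one dict of means.

    Each entry in `all_logs` maps metric names (e.g. "Loss") to scalars;
    the result maps each name to its average across all steps in the epoch.
    """
    aggregated = {}
    for step_log in all_logs:
        for key, value in step_log.items():
            aggregated.setdefault(key, []).append(value)
    return {key: mean(values) for key, values in aggregated.items()}
```

Averaging over the epoch smooths per-batch noise, so the logged curves reflect epoch-level trends rather than individual mini-batch variance.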