# Heuristic: ARISE Initiative Robomimic Rollout Horizon Selection
| Knowledge Sources | |
|---|---|
| Domains | Robot_Learning, Optimization |
| Last Updated | 2026-02-15 07:30 GMT |
## Overview
Select the rollout evaluation horizon based on task complexity: 400 steps for standard tasks (lift, can, square), 700 for complex tasks (transport, tool_hang), and 1000 for real-world tasks and offline RL algorithms such as TD3+BC.
## Description
The rollout horizon determines the maximum number of environment steps allowed during evaluation. Setting it too low causes premature termination before the policy can complete the task; setting it too high wastes compute on failed episodes. Robomimic's dataset registry encodes task-specific horizons based on the developers' empirical experience with each benchmark task. Additionally, some algorithms (like TD3+BC) require longer horizons because their policies tend to be slower and more exploratory. The framework supports early termination on success (`terminate_on_success=True`) to avoid wasting steps on already-completed episodes.
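The step-accounting effect of early termination can be sketched in plain Python (illustrative only, not the robomimic API; `success_step` is a hypothetical name for the step at which the policy solves the task):

```python
# Minimal sketch of rollout-step accounting with early termination on
# success. success_step is the step the task is solved, or None if the
# episode never succeeds.

def rollout_steps(horizon, success_step, terminate_on_success=True):
    """Return the number of environment steps one evaluation rollout consumes."""
    if terminate_on_success and success_step is not None and success_step < horizon:
        return success_step + 1  # stop as soon as success is detected
    return horizon               # failure (or no early stop): pay the full cap

# A successful episode stops early; a failed one always costs the full horizon.
print(rollout_steps(400, success_step=120))   # 121
print(rollout_steps(400, success_step=None))  # 400
```

This is why a generous horizon is cheap for competent policies: only the failed episodes run to the cap.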
## Usage
Apply this heuristic when setting `config.experiment.rollout.horizon` for training or evaluation. If using a task from the DATASET_REGISTRY, the correct horizon is automatically populated. For custom tasks, start with 400 and increase if policies consistently fail near the horizon boundary.
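The "increase if policies consistently fail near the horizon boundary" rule can be automated with a small helper (hypothetical, not part of robomimic; the names, margin, and growth factor are all illustrative choices):

```python
# Hypothetical helper: if a large share of *failed* rollouts end within
# `margin` steps of the horizon, the cap is likely truncating near-complete
# episodes, so widen the horizon; otherwise keep it.

def suggest_horizon(horizon, failed_lengths, margin=50, threshold=0.5, factor=1.5):
    if not failed_lengths:
        return horizon
    near_cap = sum(1 for n in failed_lengths if n >= horizon - margin)
    if near_cap / len(failed_lengths) >= threshold:
        return int(horizon * factor)  # e.g. 400 -> 600
    return horizon

print(suggest_horizon(400, [398, 400, 400, 250]))  # 600: most failures hit the cap
print(suggest_horizon(400, [120, 250, 300]))       # 400: failures look genuine
</test>```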
## The Insight (Rule of Thumb)
- Action: Set `config.experiment.rollout.horizon` based on task and algorithm.
- Value:
 - 400 steps — Standard tasks: lift, can, square (proficient-human)
- 500 steps — Multi-human datasets for lift, can, square, transport
- 700 steps — Complex tasks: transport, tool_hang (proficient human)
- 1000 steps — Real-world tasks (lift_real, can_real, tool_hang_real) and TD3+BC algorithm
- 1100 steps — Multi-human transport dataset
- Trade-off: Longer horizons increase evaluation time linearly but prevent false-negative failures. Early termination on success mitigates this cost.
- Evaluation frequency: Default is 50 rollouts every 50 epochs — a good balance of compute vs. measurement stability.
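The values above can be collected into a small lookup with a 400-step fallback for custom tasks (an illustrative helper, not robomimic's actual `DATASET_REGISTRY`; the `ph`/`mh` keys just mirror the proficient-human / multi-human naming):

```python
# Illustrative horizon lookup mirroring the rule-of-thumb values above.
HORIZONS = {
    ("lift", "ph"): 400, ("can", "ph"): 400, ("square", "ph"): 400,
    ("transport", "ph"): 700, ("tool_hang", "ph"): 700,
    ("lift", "mh"): 500, ("can", "mh"): 500, ("square", "mh"): 500,
    ("transport", "mh"): 1100,
    ("lift_real", "ph"): 1000, ("can_real", "ph"): 1000,
    ("tool_hang_real", "ph"): 1000,
}

def horizon_for(task, dataset_type="ph", default=400):
    """Fall back to 400 for custom tasks, per the rule of thumb."""
    return HORIZONS.get((task, dataset_type), default)

print(horizon_for("transport", "mh"))  # 1100
print(horizon_for("my_custom_task"))   # 400
```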
## Reasoning
From the dataset registry in `robomimic/__init__.py:62-64`:

```python
ph_tasks = ["lift", "can", "square", "transport", "tool_hang"]
ph_horizons = [400, 400, 400, 700, 700]
```
From `robomimic/__init__.py:86-88`:

```python
mh_tasks = ["lift", "can", "square", "transport"]
mh_horizons = [500, 500, 500, 1100]
```
Multi-human datasets require longer horizons because they contain demonstrations from operators of varying skill levels, resulting in more diverse and sometimes slower trajectories. The policy must accommodate this range.
TD3+BC uses an even longer default, from `robomimic/config/td3_bc_config.py:34`:

```python
self.experiment.rollout.horizon = 1000
```
This is because offline RL algorithms produce more cautious, exploratory policies that take longer to reach goals compared to pure imitation learning (BC) approaches.
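The linear cost trade-off is easy to quantify: with the default of 50 rollouts per evaluation round, the worst case (every rollout fails and runs to the cap, so early termination never fires) scales directly with the horizon. A quick back-of-the-envelope:

```python
# Worst-case environment steps per evaluation round: every rollout fails
# and runs to the cap, so cost = n_rollouts * horizon.
def eval_steps_worst_case(n_rollouts, horizon):
    return n_rollouts * horizon

bc_cost = eval_steps_worst_case(50, 400)      # BC on a standard task
td3_cost = eval_steps_worst_case(50, 1000)    # TD3+BC default
print(bc_cost, td3_cost, td3_cost / bc_cost)  # 20000 50000 2.5
```

So TD3+BC's 1000-step default makes a failing evaluation 2.5x as expensive as a 400-step one, which is acceptable only because success-heavy evaluations terminate early.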