Heuristic:ARISE Initiative Robomimic Checkpoint Selection Strategy
| Knowledge Sources | |
|---|---|
| Domains | Robot_Learning, Optimization |
| Last Updated | 2026-02-15 07:30 GMT |
Overview
Save model checkpoints based on best rollout success rate rather than best return or best validation loss. Success rate measures task completion directly and is the most reliable indicator of policy quality for robot manipulation tasks.
Description
Robomimic supports several checkpoint-saving criteria:

- periodic saving (every N epochs, every N seconds, or at an explicit list of epochs)
- best validation loss
- best rollout return
- best rollout success rate

Among the best-metric criteria, the default configuration enables only success rate (`experiment.save.on_best_rollout_success_rate=True`). Return (`on_best_rollout_return=False`) and validation loss (`on_best_validation=False`) are disabled. This reflects empirical experience that, in manipulation tasks, success rate tracks actual task completion better than cumulative reward. Cumulative reward can be inflated by partial completions or by reward-shaping artifacts.
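The save criteria above can be pictured as a running-best check per metric. The sketch below is a hypothetical helper, not robomimic's own code; `should_save`, `SaveFlags`, and `BestSoFar` are invented names. It assumes two things:

- A save is triggered only by a strict improvement on a metric whose flag is enabled.
- The running best is tracked for every metric regardless of its flag.

```python
from dataclasses import dataclass


@dataclass
class SaveFlags:
    # Mirrors the experiment.save.* flags quoted on this page (defaults shown).
    on_best_validation: bool = False
    on_best_rollout_return: bool = False
    on_best_rollout_success_rate: bool = True


@dataclass
class BestSoFar:
    # Running bests: lower is better for loss, higher is better for the others.
    validation_loss: float = float("inf")
    rollout_return: float = float("-inf")
    success_rate: float = float("-inf")


def should_save(flags, best, validation_loss=None, rollout_return=None, success_rate=None):
    """Hypothetical helper: return the reasons (possibly none) to save a checkpoint.

    Running bests are always updated, but a reason is only reported for a
    metric whose flag is enabled. A strict improvement is required.
    """
    reasons = []
    if validation_loss is not None and validation_loss < best.validation_loss:
        best.validation_loss = validation_loss
        if flags.on_best_validation:
            reasons.append("validation")
    if rollout_return is not None and rollout_return > best.rollout_return:
        best.rollout_return = rollout_return
        if flags.on_best_rollout_return:
            reasons.append("return")
    if success_rate is not None and success_rate > best.success_rate:
        best.success_rate = success_rate
        if flags.on_best_rollout_success_rate:
            reasons.append("success")
    return reasons


flags, best = SaveFlags(), BestSoFar()
print(should_save(flags, best, rollout_return=120.0, success_rate=0.40))  # ['success']
print(should_save(flags, best, rollout_return=180.0, success_rate=0.38))  # []
print(should_save(flags, best, rollout_return=150.0, success_rate=0.62))  # ['success']
```

In the second call, return rises while success rate falls. Under the default flags, no checkpoint is written.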
Usage
Apply this heuristic when deciding which checkpoint to deploy from a training run. The `model_best_training_<metric>.pth` files contain the best model per metric. Default settings already prioritize success rate. Override only when using specific reward-shaping schemes that make return a more informative metric.
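To illustrate why the choice of metric matters at deployment time, the sketch below picks a checkpoint from a list of per-evaluation records. `EvalRecord` and `pick_checkpoint` are hypothetical names. The numbers are made-up placeholders, not real training results. The tie-breaking order (mean return, then the earlier epoch) is an arbitrary assumption.

```python
from typing import NamedTuple


class EvalRecord(NamedTuple):
    # One rollout evaluation; the values used below are illustrative only.
    epoch: int
    success_rate: float
    mean_return: float


def pick_checkpoint(records, metric="success_rate"):
    """Return the record that maximizes `metric`.

    Success-rate ties are broken by mean return, then by the earlier epoch.
    """
    keys = {
        "success_rate": lambda r: (r.success_rate, r.mean_return, -r.epoch),
        "mean_return": lambda r: (r.mean_return, -r.epoch),
    }
    if metric not in keys:
        raise ValueError(f"unknown metric: {metric}")
    return max(records, key=keys[metric])


records = [
    EvalRecord(epoch=50, success_rate=0.30, mean_return=210.0),
    EvalRecord(epoch=100, success_rate=0.56, mean_return=180.0),
    EvalRecord(epoch=150, success_rate=0.56, mean_return=195.0),
    EvalRecord(epoch=200, success_rate=0.48, mean_return=260.0),
]
print(pick_checkpoint(records).epoch)                 # 150
print(pick_checkpoint(records, "mean_return").epoch)  # 200
```

Here the highest-return evaluation (epoch 200) is not the one with the highest success rate (epoch 150). This is exactly the situation the heuristic guards against.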
The Insight (Rule of Thumb)
- Action: Keep the default checkpoint saving settings (under `experiment.save`):
  - `on_best_rollout_success_rate = True`
  - `on_best_rollout_return = False`
  - `on_best_validation = False`
  - `every_n_epochs = 50`
- Value: The `model_best_training_success_rate.pth` checkpoint is the recommended deployment model.
- Trade-off: Success in each episode is binary (0 or 1), so success rate is a noisier per-episode signal than return. Use `experiment.rollout.n = 50` rollouts (the default) to get stable estimates.
- Early termination: Keep `experiment.rollout.terminate_on_success = True` (the default). This speeds up evaluation by ending each episode as soon as the task is completed.
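The noise trade-off can be quantified by treating each rollout as an independent Bernoulli trial. This assumption ignores any correlation between rollouts, for example from shared seeds or initial states. Under it, the standard error of the estimated success rate is `sqrt(p * (1 - p) / n)`. The helper name `success_rate_std_error` is hypothetical.

```python
import math


def success_rate_std_error(p, n):
    """Standard error of a success-rate estimate from n independent rollouts.

    Each rollout is modeled as a Bernoulli trial with success probability p,
    so the estimate's standard deviation is sqrt(p * (1 - p) / n).
    """
    if not 0.0 <= p <= 1.0 or n <= 0:
        raise ValueError("need 0 <= p <= 1 and n > 0")
    return math.sqrt(p * (1.0 - p) / n)


# p = 0.5 maximizes the Bernoulli variance, so this is the worst case.
for n in (10, 50, 200):
    print(n, round(success_rate_std_error(0.5, n), 3))
# 10 0.158
# 50 0.071
# 200 0.035
```

At n = 50 the worst-case standard error is about 0.071, so a normal-approximation 95% interval is roughly ±0.14. Two checkpoints whose success rates differ by only a few points are therefore usually not distinguishable from a single evaluation.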
Reasoning
From `robomimic/config/base_config.py:92-98`:
```python
self.experiment.save.enabled = True
self.experiment.save.every_n_seconds = None
self.experiment.save.every_n_epochs = 50
self.experiment.save.epochs = []
self.experiment.save.on_best_validation = False
self.experiment.save.on_best_rollout_return = False
self.experiment.save.on_best_rollout_success_rate = True
```
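The periodic-save settings can be read as a simple schedule. This sketch makes three assumptions:

- Epochs are counted from 1.
- A periodic checkpoint is written when the epoch is a positive multiple of `every_n_epochs`.
- A periodic checkpoint is also written when the epoch appears in `experiment.save.epochs`.

The time-based trigger (`every_n_seconds`, `None` by default) and the best-metric saves are left out. `periodic_save_epochs` is a hypothetical name.

```python
def periodic_save_epochs(num_epochs, every_n_epochs=50, extra_epochs=()):
    """Epochs at which a periodic checkpoint would be written (assumed rule).

    Saves when the epoch is a multiple of every_n_epochs (if not None), or when
    the epoch is listed in extra_epochs. Best-metric saves are not included.
    """
    extra = set(extra_epochs)
    saved = []
    for epoch in range(1, num_epochs + 1):
        on_interval = every_n_epochs is not None and epoch % every_n_epochs == 0
        if on_interval or epoch in extra:
            saved.append(epoch)
    return saved


print(periodic_save_epochs(200))                     # [50, 100, 150, 200]
print(periodic_save_epochs(200, extra_epochs=[10]))  # [10, 50, 100, 150, 200]
```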
From `robomimic/config/base_config.py:117-122`:
```python
self.experiment.rollout.enabled = True
self.experiment.rollout.n = 50 # number of rollouts per evaluation
self.experiment.rollout.horizon = 400 # maximum number of env steps per rollout
self.experiment.rollout.rate = 50 # do rollouts every @rate epochs
self.experiment.rollout.warmstart = 0
self.experiment.rollout.terminate_on_success = True
```
Running 50 rollouts every 50 epochs balances compute cost against the need for stable success-rate estimates. A larger rollout sample reduces the variance of the measured success rate. However, each evaluation can cost up to `n × horizon` environment steps.
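The evaluation settings also determine how often rollouts run and what they cost. The sketch below assumes a rollout evaluation happens when the epoch is past `warmstart` and is a multiple of `rate`. It computes a worst-case cost in which no episode terminates early. The helper names and the 500-epoch run length are hypothetical.

```python
def rollout_epochs(num_epochs, rate=50, warmstart=0):
    # Assumed rule: evaluate when epoch > warmstart and epoch is a multiple of rate.
    return [e for e in range(1, num_epochs + 1) if e > warmstart and e % rate == 0]


def worst_case_env_steps(n_rollouts=50, horizon=400):
    # Upper bound: every rollout runs to the horizon (no early success).
    return n_rollouts * horizon


epochs = rollout_epochs(num_epochs=500)  # hypothetical run length
per_eval = worst_case_env_steps()
print(len(epochs), per_eval, len(epochs) * per_eval)  # 10 20000 200000
```

With `terminate_on_success = True`, successful episodes end before the horizon. The actual step count is therefore lower than this bound.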