Principle: ARISE Initiative Robomimic Checkpointing and Model Saving
| Knowledge Sources | |
|---|---|
| Domains | Robotics, Training, Serialization |
| Last Updated | 2026-02-15 08:00 GMT |
Overview
A comprehensive model serialization pattern that saves trained policy weights along with all metadata needed to fully reconstruct the training environment, configuration, and normalization statistics for reproducible evaluation.
Description
Checkpointing and Model Saving captures the complete state needed to deploy a trained policy. Unlike naive model saving that only stores network weights, robomimic checkpoints include the full configuration, environment metadata, observation/action shape metadata, normalization statistics, and training state. This enables any downstream consumer (evaluation script, deployment pipeline) to reconstruct the exact conditions under which the model was trained.
The checkpoint contains:
- model: Serialized algorithm state (network weights and optimizer state)
- config: Full experiment configuration dictionary
- algo_name: Algorithm identifier string
- env_metadata: Environment construction parameters (env name, type, kwargs)
- shape_metadata: Observation key shapes, action dimension, and modality flags
- obs_normalization_stats: Running mean/std for observation normalization
- action_normalization_stats: Running mean/std for action normalization
- variable_state: Training loop state for resuming (epoch, best metrics)
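The fields above can be pictured as one self-contained dictionary. A minimal sketch, using stdlib `pickle` in place of `torch.save` and hypothetical placeholder values (none of the literals below come from robomimic itself):

```python
import pickle

# Hypothetical stand-ins for each serialized component (illustration only).
checkpoint = {
    "model": {"weights": [0.1, 0.2], "optimizer": {"lr": 1e-4}},
    "config": {"algo_name": "bc", "train": {"num_epochs": 100}},
    "algo_name": "bc",
    "env_metadata": {"env_name": "Lift", "type": "robosuite", "env_kwargs": {}},
    "shape_metadata": {"all_obs_keys": ["robot0_eef_pos"], "ac_dim": 7},
    "obs_normalization_stats": {"robot0_eef_pos": {"mean": [0.0] * 3, "std": [1.0] * 3}},
    "action_normalization_stats": {"mean": [0.0] * 7, "std": [1.0] * 7},
    "variable_state": {"epoch": 50, "best_return": 1.0},
}

# Round-trip through a byte blob, as torch.save/torch.load would through a file.
blob = pickle.dumps(checkpoint)
restored = pickle.loads(blob)
```

Because every field travels in the same blob, the consumer never has to locate a matching config file or normalization file on disk.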
Usage
Use this principle at regular intervals during training (every N epochs, on best performance, and at training completion). The saved checkpoints are consumed by the evaluation pipeline (policy_from_checkpoint) and can be used to resume interrupted training.
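Resuming interrupted training then amounts to reading `variable_state` back out of the checkpoint. A hedged sketch with a hypothetical helper (not robomimic's actual resume API):

```python
def resume_state(checkpoint):
    """Extract the training-loop state needed to resume (hypothetical helper)."""
    state = checkpoint.get("variable_state", {})
    start_epoch = state.get("epoch", 0) + 1  # continue after the last saved epoch
    best_metrics = {k: v for k, v in state.items() if k != "epoch"}
    return start_epoch, best_metrics

# Example checkpoint fragment with illustrative values.
ckpt = {"variable_state": {"epoch": 50, "best_success_rate": 0.8}}
start_epoch, best = resume_state(ckpt)
```

Keeping the best-so-far metrics alongside the epoch counter lets the resumed run avoid overwriting a "best" checkpoint with a worse one.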
Theoretical Basis
The checkpointing principle implements self-contained model serialization:
```python
# Abstract pattern (not real implementation)
checkpoint = {
    "model": model.serialize(),                  # Network weights + optimizer state
    "config": config.dump(),                     # Full experiment config
    "algo_name": config.algo_name,               # For factory reconstruction
    "env_metadata": env_meta,                    # To recreate the environment
    "shape_metadata": shape_meta,                # To recreate the model architecture
    "obs_normalization_stats": obs_stats,        # To normalize observations at inference
    "action_normalization_stats": action_stats,  # To un-normalize predicted actions
    "variable_state": {"epoch": epoch},          # To resume training
}
torch.save(checkpoint, path)
```
The self-contained nature means any checkpoint file can be loaded and deployed without additional context files, reducing deployment friction.
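To illustrate the self-contained property at inference time, a sketch of a consumer that pulls normalization statistics straight out of the checkpoint, with no side files. The helper and key names are illustrative, not robomimic's actual `policy_from_checkpoint` API:

```python
def normalize_obs(obs, stats):
    """Apply per-key mean/std normalization stored in the checkpoint (hypothetical helper)."""
    return {
        key: [(v - m) / s for v, m, s in zip(vals, stats[key]["mean"], stats[key]["std"])]
        for key, vals in obs.items()
    }

# Illustrative checkpoint fragment: only the fields this consumer needs.
checkpoint = {
    "algo_name": "bc",
    "obs_normalization_stats": {"eef_pos": {"mean": [0.0, 0.0, 1.0], "std": [1.0, 1.0, 2.0]}},
}

obs = {"eef_pos": [0.5, -0.5, 2.0]}
normed = normalize_obs(obs, checkpoint["obs_normalization_stats"])
```

The evaluation script needs nothing beyond the checkpoint path: statistics, architecture shapes, and environment parameters all ride along in the one file.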