Principle:Isaac sim IsaacGymEnvs ADR State Checkpointing
| Knowledge Sources | |
|---|---|
| Domains | |
| Last Updated | 2026-02-15 00:00 GMT |
Overview
Mechanism for serializing and restoring the complete Automatic Domain Randomization (ADR) training state alongside model checkpoints, so that training can resume seamlessly. This principle ensures that the full domain randomization state (parameter ranges, boundary evaluation queues, worker assignments, and per-environment randomized values) can be saved and restored without loss of training progress.
Description
ADR training involves complex mutable state beyond model weights that must be preserved across checkpoint/resume cycles:
- ADR parameter ranges: The current [lo, hi] range for each ADR parameter, which may have been expanded or contracted from the initial values over potentially millions of training steps.
- Boundary evaluation queues: Per-parameter, per-direction deques of boundary worker performance results. These queues take time to fill (typically 256 entries) and losing them resets the ADR algorithm's progress toward evaluating the current boundary.
- Worker type assignments: Per-environment worker type (rollout, boundary, test) and ADR mode (which parameter and direction each boundary worker is evaluating).
- ADR tensor values: Per-environment values for tensor-based ADR parameters (affine noise coefficients, delay probabilities, etc.).
- Environment-specific state: Task-level state such as the action moving average scalar, cube random parameters, and hand random parameters.
The get_env_state() / set_env_state() interface captures this state as a serializable dictionary that is saved alongside model checkpoints. The rl_games training framework calls get_env_state() when saving and set_env_state() when loading.
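The interface can be illustrated with a small, self-contained sketch. `ToyADREnv`, its attribute names, the dictionary keys, and the integer worker-type encoding are all assumptions made for illustration; they are not the IsaacGymEnvs implementation, which stores much of this state in tensors.

```python
from collections import deque


class ToyADREnv:
    """Hypothetical stand-in for an ADR-enabled environment."""

    def __init__(self, num_envs=4, queue_len=256):
        self.queue_len = queue_len
        # Current [lo, hi] range per ADR parameter.
        self.adr_ranges = {"friction": [0.9, 1.1]}
        # One bounded queue per (parameter, direction) boundary.
        self.adr_queues = {
            ("friction", "lo"): deque(maxlen=queue_len),
            ("friction", "hi"): deque(maxlen=queue_len),
        }
        # Per-environment worker type (assumed encoding):
        # 0 = rollout, 1 = boundary, 2 = test.
        self.worker_types = [0] * num_envs

    def get_env_state(self):
        # Copy everything into plain containers so the dict is serializable
        # and independent of later in-place changes.
        return {
            "adr_ranges": {k: list(v) for k, v in self.adr_ranges.items()},
            "adr_queues": {k: list(q) for k, q in self.adr_queues.items()},
            "worker_types": list(self.worker_types),
        }

    def set_env_state(self, state):
        self.adr_ranges = {k: list(v) for k, v in state["adr_ranges"].items()}
        self.adr_queues = {
            k: deque(v, maxlen=self.queue_len)
            for k, v in state["adr_queues"].items()
        }
        self.worker_types = list(state["worker_types"])


# Simulate some training progress, snapshot it, and restore into a fresh env.
env = ToyADREnv()
env.adr_ranges["friction"] = [0.7, 1.4]
env.adr_queues[("friction", "hi")].extend([1.0, 0.0, 1.0])
env.worker_types[1] = 1

snapshot = env.get_env_state()
resumed = ToyADREnv()
resumed.set_env_state(snapshot)
```

After the round trip, `resumed` holds the expanded range, the partially filled boundary queue, and the worker assignment, so boundary evaluation can continue where it stopped instead of refilling the queue from empty.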
Usage
State checkpointing is transparent to the user -- it happens automatically when checkpoints are saved during training. To resume ADR training from a checkpoint:
python train.py task=AllegroHandDextremeADR checkpoint=runs/AllegroHandADR/.../nn/last.pth
The adr_load_from_checkpoint flag controls whether ADR parameters are restored from the checkpoint or re-initialized:
task:
  adr:
    adr_load_from_checkpoint: true  # Restore ADR ranges from checkpoint
    # adr_load_from_checkpoint: false  # Start with fresh init_range values
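The effect of the flag can be sketched as a simple branch. The helper `select_adr_ranges` and the `"adr_ranges"` key are hypothetical names chosen for this example; the actual restore logic in the repository may be organized differently.

```python
def select_adr_ranges(init_ranges, checkpoint_state, adr_load_from_checkpoint):
    """Hypothetical helper: choose which ADR ranges training resumes with."""
    if adr_load_from_checkpoint and checkpoint_state is not None:
        # Resume with the ranges ADR had reached when the checkpoint was saved.
        return {k: list(v) for k, v in checkpoint_state["adr_ranges"].items()}
    # Otherwise start again from the configured init_range values.
    return {k: list(v) for k, v in init_ranges.items()}


init_ranges = {"friction": [0.9, 1.1]}            # from the task config
saved = {"adr_ranges": {"friction": [0.5, 1.6]}}  # from a checkpoint

resumed = select_adr_ranges(init_ranges, saved, adr_load_from_checkpoint=True)
fresh = select_adr_ranges(init_ranges, saved, adr_load_from_checkpoint=False)
```

With the flag set to true, the expanded range from the checkpoint is kept; with it set to false, the policy weights would still load but ADR restarts from its initial range.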
Theoretical Basis
ADR state checkpointing follows the state serialization pattern: capture all mutable training state in a dictionary that can be saved and loaded via torch.save() / torch.load() alongside the model checkpoint.
CHECKPOINT_SAVE:
    model_state = policy.state_dict()
    env_state = env.get_env_state()  # ADR params, queues, worker types, tensors
    save({model_state, env_state}, path)
CHECKPOINT_LOAD:
    checkpoint = load(path)
    policy.load_state_dict(checkpoint.model_state)
    env.set_env_state(checkpoint.env_state)  # Restore ADR state
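A runnable version of the save/load pattern is sketched below. To keep the example free of dependencies, the standard-library `pickle` module stands in for `torch.save()` / `torch.load()`; the functions `save_checkpoint` and `load_checkpoint`, the `"model"` and `"env_state"` keys, and the sample values are assumptions for illustration only.

```python
import os
import pickle
import tempfile


def save_checkpoint(path, model_state, env_state):
    # Bundle model weights and environment state into a single file.
    with open(path, "wb") as f:
        pickle.dump({"model": model_state, "env_state": env_state}, f)


def load_checkpoint(path):
    with open(path, "rb") as f:
        return pickle.load(f)


# Placeholder values standing in for policy.state_dict() and env.get_env_state().
model_state = {"actor.weight": [0.1, -0.2]}
env_state = {"adr_ranges": {"friction": [0.5, 1.6]}, "worker_types": [0, 1, 2]}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "last.pth")
    save_checkpoint(path, model_state, env_state)
    checkpoint = load_checkpoint(path)
```

Writing both dictionaries to the same file means a checkpoint can never pair model weights from one training step with ADR state from another.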
Key design considerations:
- Completeness: All mutable state that affects future training behavior must be captured. Missing any component (e.g., queues) would cause the ADR algorithm to behave differently after resume.
- Selectivity: The adr_load_from_checkpoint flag allows intentional re-initialization of ADR ranges (e.g., when fine-tuning a pre-trained policy with fresh ADR).
- Hierarchical state: The state is composed across the class hierarchy: AllegroHandDextreme adds task-specific tensors (cube/hand random params) to the state returned by ADRVecTask and VecTaskDextreme.
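The hierarchical composition can be sketched with cooperative `super()` calls. The classes `BaseTask`, `ADRTask`, and `HandTask` are toy stand-ins for the three levels named above, and every field name (including the hypothetical `step_count`) is an assumption rather than the repository's actual state layout.

```python
class BaseTask:
    """Toy stand-in for the lowest level of the hierarchy."""

    def __init__(self):
        self.step_count = 0  # hypothetical base-level field

    def get_env_state(self):
        return {"step_count": self.step_count}

    def set_env_state(self, state):
        self.step_count = state["step_count"]


class ADRTask(BaseTask):
    """Toy ADR layer: adds parameter ranges to the base state."""

    def __init__(self):
        super().__init__()
        self.adr_ranges = {"friction": [0.9, 1.1]}

    def get_env_state(self):
        state = super().get_env_state()
        state["adr_ranges"] = {k: list(v) for k, v in self.adr_ranges.items()}
        return state

    def set_env_state(self, state):
        super().set_env_state(state)
        self.adr_ranges = {k: list(v) for k, v in state["adr_ranges"].items()}


class HandTask(ADRTask):
    """Toy task layer: adds task-specific random parameters."""

    def __init__(self):
        super().__init__()
        self.cube_random_params = [0.0, 0.0]

    def get_env_state(self):
        state = super().get_env_state()
        state["cube_random_params"] = list(self.cube_random_params)
        return state

    def set_env_state(self, state):
        super().set_env_state(state)
        self.cube_random_params = list(state["cube_random_params"])


src = HandTask()
src.step_count = 1200
src.adr_ranges["friction"] = [0.6, 1.5]
src.cube_random_params = [0.2, -0.1]

dst = HandTask()
dst.set_env_state(src.get_env_state())
```

Each level reads and writes only its own keys and delegates the rest, so adding a new piece of task state touches one class without changing how the lower levels serialize theirs.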