Principle:Isaac sim IsaacGymEnvs ADR State Checkpointing

From Leeroopedia
Metadata
Last Updated 2026-02-15 00:00 GMT

Overview

Mechanism for serializing and restoring the complete Automatic Domain Randomization (ADR) training state alongside model checkpoints so that training can resume seamlessly. This principle ensures that the full domain randomization state -- parameter ranges, boundary evaluation queues, worker assignments, and per-environment randomized values -- can be saved and restored without loss of training progress.

Description

ADR training involves complex mutable state beyond model weights that must be preserved across checkpoint/resume cycles:

  • ADR parameter ranges: The current [lo, hi] range for each ADR parameter, which may have been expanded or contracted from the initial values over potentially millions of training steps.
  • Boundary evaluation queues: Per-parameter, per-direction deques of boundary worker performance results. These queues take time to fill (typically 256 entries), so losing them discards the evidence the ADR algorithm has gathered so far about the current boundary.
  • Worker type assignments: Per-environment worker type (rollout, boundary, test) and ADR mode (which parameter and direction each boundary worker is evaluating).
  • ADR tensor values: Per-environment values for tensor-based ADR parameters (affine noise coefficients, delay probabilities, etc.).
  • Environment-specific state: Task-level state such as the action moving average scalar, cube random parameters, and hand random parameters.

The get_env_state() / set_env_state() interface captures this state as a serializable dictionary that is saved alongside model checkpoints. The rl_games training framework calls get_env_state() when saving and set_env_state() when loading.
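
The interface can be illustrated with a minimal sketch. ToyADREnv below is a hypothetical stand-in, not the IsaacGymEnvs class: the dictionary keys, the cube_mass parameter name, and the use of pickle in place of torch.save() / torch.load() are all assumptions made so the example runs with the standard library alone.

```python
import pickle
from collections import deque

QUEUE_LEN = 256  # typical boundary queue length mentioned above


class ToyADREnv:
    """Hypothetical stand-in for an ADR environment; not the IsaacGymEnvs class."""

    def __init__(self, num_envs=4):
        # Current [lo, hi] range per ADR parameter (parameter name is made up).
        self.adr_ranges = {"cube_mass": [0.9, 1.1]}
        # One deque of boundary results per (parameter, direction).
        self.adr_queues = {
            ("cube_mass", "lo"): deque(maxlen=QUEUE_LEN),
            ("cube_mass", "hi"): deque(maxlen=QUEUE_LEN),
        }
        # Per-environment worker type: "rollout", "boundary" or "test".
        self.worker_types = ["rollout"] * num_envs

    def get_env_state(self):
        """Return all mutable ADR state as a plain, picklable dictionary."""
        return {
            "adr_ranges": {k: list(v) for k, v in self.adr_ranges.items()},
            "adr_queues": {k: list(q) for k, q in self.adr_queues.items()},
            "worker_types": list(self.worker_types),
        }

    def set_env_state(self, state):
        """Restore the state produced by get_env_state()."""
        self.adr_ranges = {k: list(v) for k, v in state["adr_ranges"].items()}
        self.adr_queues = {
            k: deque(v, maxlen=QUEUE_LEN) for k, v in state["adr_queues"].items()
        }
        self.worker_types = list(state["worker_types"])


# Simulate some training progress, then a save/resume cycle.
env = ToyADREnv()
env.adr_ranges["cube_mass"] = [0.7, 1.4]
env.adr_queues[("cube_mass", "hi")].extend([1.0, 0.0, 1.0])
env.worker_types[0] = "boundary"

blob = pickle.dumps(env.get_env_state())  # stand-in for torch.save()

resumed = ToyADREnv()
resumed.set_env_state(pickle.loads(blob))  # stand-in for torch.load()
```

After the round trip, the resumed environment holds the widened range and the partially filled queue, so boundary evaluation would continue from where it stopped rather than from an empty queue.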

Usage

State checkpointing is transparent to the user -- it happens automatically when checkpoints are saved during training. To resume ADR training from a checkpoint:

python train.py task=AllegroHandDextremeADR checkpoint=runs/AllegroHandADR/.../nn/last.pth

The adr_load_from_checkpoint flag controls whether ADR parameters are restored from the checkpoint or re-initialized:

task:
  adr:
    adr_load_from_checkpoint: true   # Restore ADR ranges from checkpoint
    # adr_load_from_checkpoint: false  # Start with fresh init_range values
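
One way to picture the effect of this flag is the sketch below. It is an illustration under assumptions, not the IsaacGymEnvs code: the helper name resolve_adr_ranges, the "adr_ranges" key, and the cube_mass parameter are hypothetical.

```python
import copy


def resolve_adr_ranges(init_ranges, checkpoint_env_state, adr_load_from_checkpoint):
    """Pick the ADR ranges to start from after loading a checkpoint.

    Hypothetical helper: the function name and the "adr_ranges" key are
    illustrative, not taken from IsaacGymEnvs.
    """
    if adr_load_from_checkpoint and checkpoint_env_state is not None:
        # Continue from the ranges reached before the checkpoint was written.
        return copy.deepcopy(checkpoint_env_state["adr_ranges"])
    # Ignore the saved ranges and start again from the configured init_range.
    return copy.deepcopy(init_ranges)


init_ranges = {"cube_mass": [0.9, 1.1]}                  # from the task config
saved_state = {"adr_ranges": {"cube_mass": [0.7, 1.4]}}  # from a checkpoint

restored = resolve_adr_ranges(init_ranges, saved_state, adr_load_from_checkpoint=True)
fresh = resolve_adr_ranges(init_ranges, saved_state, adr_load_from_checkpoint=False)
```

In this sketch the model weights would be loaded in both cases; only the starting ADR ranges differ.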

Theoretical Basis

ADR state checkpointing follows the state serialization pattern: capture all mutable training state in a dictionary that can be saved and loaded via torch.save() / torch.load() alongside the model checkpoint.

import torch

# CHECKPOINT_SAVE (policy, env, and path come from the training setup)
checkpoint = {
    "model_state": policy.state_dict(),
    "env_state": env.get_env_state(),   # ADR params, queues, worker types, tensors
}
torch.save(checkpoint, path)

# CHECKPOINT_LOAD
checkpoint = torch.load(path)
policy.load_state_dict(checkpoint["model_state"])
env.set_env_state(checkpoint["env_state"])  # Restore ADR state

Key design considerations:

  • Completeness: All mutable state that affects future training behavior must be captured. Missing any component (e.g., queues) would cause the ADR algorithm to behave differently after resume.
  • Selectivity: The adr_load_from_checkpoint flag allows intentional re-initialization of ADR ranges (e.g., when fine-tuning a pre-trained policy with fresh ADR).
  • Hierarchical state: The state is composed across the class hierarchy -- AllegroHandDextreme adds task-specific tensors (cube/hand random params) to the state returned by ADRVecTask and VecTaskDextreme.
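
The hierarchical composition described in the last point can be sketched with cooperative super() calls. The three toy classes below only loosely mirror the roles of VecTaskDextreme, ADRVecTask, and AllegroHandDextreme; their names, attributes, and dictionary keys are assumptions for illustration (last_step in particular is made up).

```python
class ToyBaseTask:
    """Toy base layer; last_step is a made-up piece of state."""

    def __init__(self):
        self.last_step = 0

    def get_env_state(self):
        return {"last_step": self.last_step}

    def set_env_state(self, state):
        self.last_step = state["last_step"]


class ToyADRTask(ToyBaseTask):
    """Toy ADR layer: adds parameter ranges on top of the base state."""

    def __init__(self):
        super().__init__()
        self.adr_ranges = {"cube_mass": [0.9, 1.1]}

    def get_env_state(self):
        state = super().get_env_state()  # start from the parent's dictionary
        state["adr_ranges"] = {k: list(v) for k, v in self.adr_ranges.items()}
        return state

    def set_env_state(self, state):
        super().set_env_state(state)     # let the parent restore its part
        self.adr_ranges = {k: list(v) for k, v in state["adr_ranges"].items()}


class ToyHandTask(ToyADRTask):
    """Toy task layer: adds task-specific random parameters."""

    def __init__(self):
        super().__init__()
        self.cube_random_params = [0.0, 0.0]
        self.hand_random_params = [0.0, 0.0]

    def get_env_state(self):
        state = super().get_env_state()
        state["cube_random_params"] = list(self.cube_random_params)
        state["hand_random_params"] = list(self.hand_random_params)
        return state

    def set_env_state(self, state):
        super().set_env_state(state)
        self.cube_random_params = list(state["cube_random_params"])
        self.hand_random_params = list(state["hand_random_params"])


task = ToyHandTask()
task.last_step = 1000
task.adr_ranges["cube_mass"] = [0.7, 1.4]
task.cube_random_params = [0.2, -0.1]

state = task.get_env_state()

resumed_task = ToyHandTask()
resumed_task.set_env_state(state)
```

Each layer only touches the keys it owns, so adding state to one class in the hierarchy does not require changes to the others.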
