

Workflow:Isaac sim IsaacGymEnvs Domain Randomization Training

From Leeroopedia
Knowledge Sources
Domains Sim_to_Real, Domain_Randomization, Reinforcement_Learning
Last Updated 2026-02-15 09:00 GMT

Overview

End-to-end process for training sim-to-real-robust RL policies using manual domain randomization or Automatic Domain Randomization (ADR) via Isaac Gym's on-the-fly randomization framework.

Description

This workflow covers training RL policies that are robust to the sim-to-real gap by randomizing simulation dynamics during training. IsaacGymEnvs supports two approaches: Manual DR (user-specified parameter ranges) and Automatic Domain Randomization (ADR, which automatically adjusts ranges based on policy performance). The randomization framework operates on-the-fly without reloading assets, randomizing observations, actions, sim_params (gravity), and actor_params (mass, friction, DOF properties, tendon properties). The DeXtreme codebase demonstrates both approaches for Allegro Hand cube reorientation with asymmetric actor-critic training.

Key capabilities:

  • On-the-fly randomization without asset reloading
  • Four randomization groups: observations, actions, sim_params, actor_params
  • Multiple distributions: uniform, loguniform, gaussian
  • Scheduling: constant (randomization switches on after a fixed number of steps) and linear (ranges ramp up by interpolation over a step budget)
  • ADR with vectorized boundary evaluation and automatic range expansion

Usage

Execute this workflow when you need to train policies intended for deployment on real robots, where the simulation dynamics differ from reality. This is essential for any sim-to-real transfer scenario, particularly for dexterous manipulation where contact properties are difficult to model exactly. ADR is recommended for tasks where manual tuning of randomization ranges is impractical.

Execution Steps

Step 1: Define Randomization Parameters

Configure which simulation parameters to randomize in the task YAML file. Define the distribution type, range, operation (additive or scaling), and optional schedule for each parameter. For ADR, additionally configure initial ranges, hard limits, delta step sizes, and performance thresholds.

Key considerations:

  • Enable randomization with task.randomize=True in the YAML
  • Mass and scale randomizations must use setup_only=True (GPU pipeline limitation)
  • Actor names in YAML must match the names used in gym.create_actor()
  • For ADR, set use_adr=True and configure worker_adr_boundary_fraction (typically 0.4)
  • ADR parameters need init_range, limits, and delta fields
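The randomization_params block described above can be sketched as a Python dict mirroring the YAML layout. Key names follow the IsaacGymEnvs convention (distribution / range / operation / schedule / setup_only), but the exact actor and property names here are illustrative; verify them against your own task config and the names passed to gym.create_actor().

```python
# Hypothetical mirror of a task-YAML randomization_params section.
# Actor name "hand" and all numeric ranges are illustrative assumptions.
randomization_params = {
    "frequency": 720,  # env steps between physics re-randomizations
    "observations": {
        "range": [0.0, 0.002],       # mean, std for a gaussian distribution
        "operation": "additive",
        "distribution": "gaussian",
    },
    "actions": {
        "range": [0.0, 0.02],
        "operation": "additive",
        "distribution": "gaussian",
    },
    "sim_params": {
        "gravity": {
            "range": [0.0, 0.4],
            "operation": "additive",
            "distribution": "gaussian",
            "schedule": "linear",      # ramp noise in over schedule_steps
            "schedule_steps": 3000,
        },
    },
    "actor_params": {
        "hand": {                      # must match the gym.create_actor() name
            "rigid_body_properties": {
                "mass": {
                    "range": [0.5, 1.5],
                    "operation": "scaling",
                    "distribution": "uniform",
                    "setup_only": True,  # GPU pipeline: mass fixed after setup
                },
            },
        },
    },
}
```

For ADR, each randomized parameter would additionally carry init_range, limits, and delta fields in the same nested structure.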

Step 2: Initialize Randomization State

During environment creation, call apply_randomizations() once in create_sim() to perform the initial randomization pass. This is required for setup_only parameters like mass and scale. For ADR, the ADRVecTask base class manages the separation of environments into ADR_ROLLOUT and ADR_BOUNDARY worker groups.

What happens:

  • The randomization dictionary is parsed from the YAML config
  • Initial randomization values are sampled and applied to all environments
  • For ADR, environments are partitioned into rollout workers and boundary evaluation workers
  • Setup-only parameters (mass, scale) are randomized and locked for the simulation lifetime
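The worker partitioning above can be sketched as a small stand-alone function. This is a conceptual simplification, not the real ADRVecTask API: it only shows how a worker_adr_boundary_fraction of 0.4 would split environment indices into boundary-evaluation and rollout groups.

```python
# Simplified stand-in for the ADR worker split performed at env creation.
# Function and variable names are illustrative, not the ADRVecTask API.
def partition_envs(num_envs, boundary_fraction=0.4):
    """Split env indices conceptually as ADRVecTask does: one chunk
    evaluates parameter boundaries, the rest collects rollout data."""
    num_boundary = int(num_envs * boundary_fraction)
    boundary_ids = list(range(num_boundary))
    rollout_ids = list(range(num_boundary, num_envs))
    return rollout_ids, boundary_ids

# With 10 envs and fraction 0.4, 4 envs probe boundaries, 6 do rollouts.
rollout, boundary = partition_envs(10, boundary_fraction=0.4)
```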

Step 3: Training with Runtime Randomization

During the training loop, randomizations are applied when environments reset, governed by the configured frequency. The randomize_buf tensor tracks how many steps have elapsed since each environment was last randomized. Observation and action noise is applied every frame, while physics parameter randomization occurs at reset time.

What happens:

  • At each environment reset, apply_randomizations() is called with the randomization dictionary
  • Only environments that exceeded the frequency threshold since last randomization are re-randomized
  • Observation noise is added to the obs buffer before it reaches the policy
  • Action noise is added to the action buffer before it is applied to the simulation
  • The randomize_buf counter is incremented in post_physics_step()
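The frequency-gating logic above can be illustrated with a minimal NumPy sketch. The names randomize_buf and frequency mirror the source; the helper functions themselves are assumptions standing in for the real tensorized implementation.

```python
import numpy as np

# Illustrative sketch of reset-time gating: randomize_buf counts steps since
# each env was last randomized; only envs at or past `frequency` are
# re-randomized when they reset. Per-frame gaussian noise is applied to
# observations separately, every step.
def envs_to_rerandomize(randomize_buf, reset_ids, frequency):
    due = randomize_buf[reset_ids] >= frequency
    return reset_ids[due]

def add_obs_noise(obs, std, rng):
    # additive gaussian observation noise, applied every frame
    return obs + rng.normal(0.0, std, size=obs.shape)

rng = np.random.default_rng(0)
buf = np.array([100, 800, 750, 10])   # steps since last randomization
resets = np.array([0, 1, 2])          # envs resetting this step
due = envs_to_rerandomize(buf, resets, frequency=720)  # -> envs 1 and 2
noisy = add_obs_noise(np.zeros(3), std=0.01, rng=rng)
```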

Step 4: ADR Range Adaptation (ADR only)

For ADR training, boundary evaluation workers periodically test policy performance at the edges of parameter ranges. Based on performance thresholds (t_l and t_h), ranges are expanded or contracted by the configured delta. A queue accumulates performance statistics to avoid range changes based on transient fluctuations.

What happens:

  • A random parameter is selected and set to its boundary value (lower or upper limit)
  • Boundary workers evaluate policy performance with that extreme parameter setting
  • Performance is logged in a queue of length adr_queue_threshold_length
  • If performance exceeds t_h, the range expands by delta in the appropriate direction
  • If performance drops below t_l, the range contracts by delta
  • ADR provides a natural curriculum as ranges gradually widen during training
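The expand/contract rule above can be written as a toy update function. The field names (t_l, t_h, delta, limits, queue length) mirror the config, but the logic is a deliberate simplification of the real vectorized boundary evaluation.

```python
from collections import deque

# Toy ADR boundary update: a queue of boundary-worker performance scores;
# once full, the mean decides whether the sampled bound expands or
# contracts by `delta`, clamped to the configured hard limit.
def adr_update(bound, perf_queue, t_l, t_h, delta, limit, upper=True):
    if len(perf_queue) < perf_queue.maxlen:
        return bound                    # not enough evidence yet
    mean_perf = sum(perf_queue) / len(perf_queue)
    perf_queue.clear()                  # reset stats after a decision
    sign = 1.0 if upper else -1.0       # upper bound grows up, lower grows down
    if mean_perf > t_h:                 # policy copes: widen the range
        bound += sign * delta
    elif mean_perf < t_l:               # policy struggles: narrow it
        bound -= sign * delta
    return min(bound, limit) if upper else max(bound, limit)

q = deque(maxlen=4)                     # adr_queue_threshold_length = 4
upper = 1.2
for perf in (0.9, 0.85, 0.95, 0.9):     # consistently above t_h = 0.8
    q.append(perf)
    upper = adr_update(upper, q, t_l=0.4, t_h=0.8, delta=0.1,
                       limit=2.0, upper=True)
```

Because decisions wait for a full queue, transient performance spikes cannot move the range, which is the anti-fluctuation behavior described above.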

Step 5: Asymmetric Actor-Critic Training (DeXtreme)

For DeXtreme environments, asymmetric actor-critic training is used: the policy receives only observations available on the real robot, while the value function additionally receives privileged simulator state. Dictionary observations enable a clean separation of policy and value-function inputs in the training config.

Key considerations:

  • Policy inputs: joint positions, object pose (camera-randomized), goal, actions (all noised)
  • Value function inputs: ground-truth DOF state, forces, object velocities, unrandomized poses
  • At inference, only the policy network is used; the value function is discarded
  • Dictionary observations are enabled via use_dict_obs=True
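The dictionary-observation split can be sketched as follows. The key names here (joint_pos, dof_vel, and so on) are illustrative placeholders, not the exact DeXtreme observation keys; the point is that the critic's dict is a superset of the policy's.

```python
# Sketch of the asymmetric actor-critic observation split: the policy sees
# only noised, real-world-available signals; the critic additionally gets
# privileged simulator state. Key names are illustrative assumptions.
def split_obs(full_obs):
    policy_keys = ("joint_pos", "object_pose", "goal_pose", "last_actions")
    return {
        "policy": {k: full_obs[k] for k in policy_keys},
        "critic": dict(full_obs),   # everything, incl. privileged state
    }

full = {
    "joint_pos": [0.1], "object_pose": [0.0] * 7, "goal_pose": [0.0] * 7,
    "last_actions": [0.0],
    "dof_vel": [1.0], "contact_forces": [0.0],   # privileged, critic-only
}
obs = split_obs(full)
```

At deployment only obs["policy"] is needed, which is why the value network can be discarded at inference time.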

Step 6: Checkpoint with ADR State

When saving checkpoints, the ADR parameter ranges are serialized alongside the policy weights. This allows resuming training with the evolved ranges or loading the ranges for evaluation. The checkpoint captures the full training state including optimizer and ADR evolution history.

Key considerations:

  • ADR ranges are saved in the checkpoint under env_state
  • Loading ADR ranges from checkpoint requires task.task.adr.adr_load_from_checkpoint=True
  • For evaluation, load the checkpoint with the ADR ranges to replicate training conditions
  • Evaluation creates an eval_summaries directory with TensorBoard logs
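A minimal sketch of serializing ADR ranges under an env_state key, as described above. The surrounding checkpoint structure is an assumption (real rl_games-style checkpoints use torch serialization and differ in detail); only the idea of bundling evolved ranges with the weights is taken from the source.

```python
import json
import os
import tempfile

# Hypothetical checkpoint layout: policy weights plus evolved ADR ranges
# stored under "env_state", so training can resume (or evaluation can
# replicate training conditions) with the same ranges.
def save_checkpoint(path, policy_weights, adr_ranges):
    ckpt = {
        "model": policy_weights,                  # policy/value parameters
        "env_state": {"adr_params": adr_ranges},  # evolved ADR ranges
    }
    with open(path, "w") as f:
        json.dump(ckpt, f)

def load_adr_ranges(path):
    with open(path) as f:
        return json.load(f)["env_state"]["adr_params"]

path = os.path.join(tempfile.gettempdir(), "adr_ckpt_demo.json")
save_checkpoint(path, {"w": [0.0]}, {"hand_mass": [0.4, 1.6]})
ranges = load_adr_ranges(path)
```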

Execution Diagram

GitHub URL

Workflow Repository