

Workflow:Isaac sim IsaacGymEnvs Domain Randomization Training

From Leeroopedia
Knowledge Sources
Domains Sim_to_Real, Domain_Randomization, Reinforcement_Learning
Last Updated 2026-02-15 09:00 GMT

Overview

End-to-end process for training sim-to-real-robust RL policies using manual domain randomization or Automatic Domain Randomization (ADR) via Isaac Gym's on-the-fly randomization framework.

Description

This workflow covers training RL policies that are robust to the sim-to-real gap by randomizing simulation dynamics during training. IsaacGymEnvs supports two approaches: Manual DR (user-specified parameter ranges) and Automatic Domain Randomization (ADR, which automatically adjusts ranges based on policy performance). The randomization framework operates on-the-fly without reloading assets, randomizing observations, actions, sim_params (gravity), and actor_params (mass, friction, DOF properties, tendon properties). The DeXtreme codebase demonstrates both approaches for Allegro Hand cube reorientation with asymmetric actor-critic training.

Key capabilities:

  • On-the-fly randomization without asset reloading
  • Four randomization groups: observations, actions, sim_params, actor_params
  • Multiple distributions: uniform, loguniform, gaussian
  • Scheduling: constant (randomization switches on after a fixed number of steps) and linear (ranges ramp up by interpolation over a step budget)
  • ADR with vectorized boundary evaluation and automatic range expansion

Usage

Execute this workflow when you need to train policies intended for deployment on real robots, where the simulation dynamics differ from reality. This is essential for any sim-to-real transfer scenario, particularly for dexterous manipulation where contact properties are difficult to model exactly. ADR is recommended for tasks where manual tuning of randomization ranges is impractical.

Execution Steps

Step 1: Define Randomization Parameters

Configure which simulation parameters to randomize in the task YAML file. Define the distribution type, range, operation (additive or scaling), and optional schedule for each parameter. For ADR, additionally configure initial ranges, hard limits, delta step sizes, and performance thresholds.

Key considerations:

  • Enable randomization with task.randomize=True in the YAML
  • Mass and scale randomizations must use setup_only=True (GPU pipeline limitation)
  • Actor names in YAML must match the names used in gym.create_actor()
  • For ADR, set use_adr=True and configure worker_adr_boundary_fraction (typically 0.4)
  • ADR parameters need init_range, limits, and delta fields
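The randomization_params block described above can be sketched as a Python dict mirroring the YAML layout. Key names follow the IsaacGymEnvs convention (distribution / range / operation / schedule / setup_only), but the exact actor and property names here are illustrative; verify them against your own task config and the names passed to gym.create_actor().

```python
# Hypothetical mirror of a task-YAML randomization_params section.
# Actor name "hand" and all numeric ranges are illustrative assumptions.
randomization_params = {
    "frequency": 720,  # env steps between physics re-randomizations
    "observations": {
        "range": [0.0, 0.002],       # mean, std for a gaussian distribution
        "operation": "additive",
        "distribution": "gaussian",
    },
    "actions": {
        "range": [0.0, 0.02],
        "operation": "additive",
        "distribution": "gaussian",
    },
    "sim_params": {
        "gravity": {
            "range": [0.0, 0.4],
            "operation": "additive",
            "distribution": "gaussian",
            "schedule": "linear",      # ramp noise in over schedule_steps
            "schedule_steps": 3000,
        },
    },
    "actor_params": {
        "hand": {                      # must match the gym.create_actor() name
            "rigid_body_properties": {
                "mass": {
                    "range": [0.5, 1.5],
                    "operation": "scaling",
                    "distribution": "uniform",
                    "setup_only": True,  # GPU pipeline: mass fixed after setup
                },
            },
        },
    },
}
```

For ADR, each randomized parameter would additionally carry init_range, limits, and delta fields in the same nested structure.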

Step 2: Initialize Randomization State

During environment creation, call apply_randomizations() once in create_sim() to perform the initial randomization pass. This is required for setup_only parameters like mass and scale. For ADR, the ADRVecTask base class manages the separation of environments into ADR_ROLLOUT and ADR_BOUNDARY worker groups.

What happens:

  • The randomization dictionary is parsed from the YAML config
  • Initial randomization values are sampled and applied to all environments
  • For ADR, environments are partitioned into rollout workers and boundary evaluation workers
  • Setup-only parameters (mass, scale) are randomized and locked for the simulation lifetime
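The worker partitioning above can be sketched as a small stand-alone function. This is a conceptual simplification, not the real ADRVecTask API: it only shows how a worker_adr_boundary_fraction of 0.4 would split environment indices into boundary-evaluation and rollout groups.

```python
# Simplified stand-in for the ADR worker split performed at env creation.
# Function and variable names are illustrative, not the ADRVecTask API.
def partition_envs(num_envs, boundary_fraction=0.4):
    """Split env indices conceptually as ADRVecTask does: one chunk
    evaluates parameter boundaries, the rest collects rollout data."""
    num_boundary = int(num_envs * boundary_fraction)
    boundary_ids = list(range(num_boundary))
    rollout_ids = list(range(num_boundary, num_envs))
    return rollout_ids, boundary_ids

# With 10 envs and fraction 0.4, 4 envs probe boundaries, 6 do rollouts.
rollout, boundary = partition_envs(10, boundary_fraction=0.4)
```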

Step 3: Training with Runtime Randomization

During the training loop, randomizations are applied when environments reset, governed by the configured frequency. The randomize_buf tensor tracks how many steps have elapsed since each environment was last randomized. Observation and action noise is applied every frame, while physics parameter randomization occurs at reset time.

What happens:

  • At each environment reset, apply_randomizations() is called with the randomization dictionary
  • Only environments that exceeded the frequency threshold since last randomization are re-randomized
  • Observation noise is added to the obs buffer before it reaches the policy
  • Action noise is added to the action buffer before it is applied to the simulation
  • The randomize_buf counter is incremented in post_physics_step()
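The frequency-gating logic above can be illustrated with a minimal NumPy sketch. The names randomize_buf and frequency mirror the source; the helper functions themselves are assumptions standing in for the real tensorized implementation.

```python
import numpy as np

# Illustrative sketch of reset-time gating: randomize_buf counts steps since
# each env was last randomized; only envs at or past `frequency` are
# re-randomized when they reset. Per-frame gaussian noise is applied to
# observations separately, every step.
def envs_to_rerandomize(randomize_buf, reset_ids, frequency):
    due = randomize_buf[reset_ids] >= frequency
    return reset_ids[due]

def add_obs_noise(obs, std, rng):
    # additive gaussian observation noise, applied every frame
    return obs + rng.normal(0.0, std, size=obs.shape)

rng = np.random.default_rng(0)
buf = np.array([100, 800, 750, 10])   # steps since last randomization
resets = np.array([0, 1, 2])          # envs resetting this step
due = envs_to_rerandomize(buf, resets, frequency=720)  # -> envs 1 and 2
noisy = add_obs_noise(np.zeros(3), std=0.01, rng=rng)
```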

Step 4: ADR Range Adaptation (ADR only)

For ADR training, boundary evaluation workers periodically test policy performance at the edges of parameter ranges. Based on performance thresholds (t_l and t_h), ranges are expanded or contracted by the configured delta. A queue accumulates performance statistics to avoid range changes based on transient fluctuations.

What happens:

  • A random parameter is selected and set to its boundary value (lower or upper limit)
  • Boundary workers evaluate policy performance with that extreme parameter setting
  • Performance is logged in a queue of length adr_queue_threshold_length
  • If performance exceeds t_h, the range expands by delta in the appropriate direction
  • If performance drops below t_l, the range contracts by delta
  • ADR provides a natural curriculum as ranges gradually widen during training
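The expand/contract rule above can be written as a toy update function. The field names (t_l, t_h, delta, limits, queue length) mirror the config, but the logic is a deliberate simplification of the real vectorized boundary evaluation.

```python
from collections import deque

# Toy ADR boundary update: a queue of boundary-worker performance scores;
# once full, the mean decides whether the sampled bound expands or
# contracts by `delta`, clamped to the configured hard limit.
def adr_update(bound, perf_queue, t_l, t_h, delta, limit, upper=True):
    if len(perf_queue) < perf_queue.maxlen:
        return bound                    # not enough evidence yet
    mean_perf = sum(perf_queue) / len(perf_queue)
    perf_queue.clear()                  # reset stats after a decision
    sign = 1.0 if upper else -1.0       # upper bound grows up, lower grows down
    if mean_perf > t_h:                 # policy copes: widen the range
        bound += sign * delta
    elif mean_perf < t_l:               # policy struggles: narrow it
        bound -= sign * delta
    return min(bound, limit) if upper else max(bound, limit)

q = deque(maxlen=4)                     # adr_queue_threshold_length = 4
upper = 1.2
for perf in (0.9, 0.85, 0.95, 0.9):     # consistently above t_h = 0.8
    q.append(perf)
    upper = adr_update(upper, q, t_l=0.4, t_h=0.8, delta=0.1,
                       limit=2.0, upper=True)
```

Because decisions wait for a full queue, transient performance spikes cannot move the range, which is the anti-fluctuation behavior described above.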

Step 5: Asymmetric Actor-Critic Training (DeXtreme)

For DeXtreme environments, asymmetric actor-critic training is used: the policy receives only observations available on the real robot, while the value function additionally receives privileged simulator state. Dictionary observations enable a clean separation of policy and value-function inputs in the training config.

Key considerations:

  • Policy inputs: joint positions, object pose (camera-randomized), goal, actions (all noised)
  • Value function inputs: ground-truth DOF state, forces, object velocities, unrandomized poses
  • At inference, only the policy network is used; the value function is discarded
  • Dictionary observations are enabled via use_dict_obs=True
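The dictionary-observation split can be sketched as follows. The key names here (joint_pos, dof_vel, and so on) are illustrative placeholders, not the exact DeXtreme observation keys; the point is that the critic's dict is a superset of the policy's.

```python
# Sketch of the asymmetric actor-critic observation split: the policy sees
# only noised, real-world-available signals; the critic additionally gets
# privileged simulator state. Key names are illustrative assumptions.
def split_obs(full_obs):
    policy_keys = ("joint_pos", "object_pose", "goal_pose", "last_actions")
    return {
        "policy": {k: full_obs[k] for k in policy_keys},
        "critic": dict(full_obs),   # everything, incl. privileged state
    }

full = {
    "joint_pos": [0.1], "object_pose": [0.0] * 7, "goal_pose": [0.0] * 7,
    "last_actions": [0.0],
    "dof_vel": [1.0], "contact_forces": [0.0],   # privileged, critic-only
}
obs = split_obs(full)
```

At deployment only obs["policy"] is needed, which is why the value network can be discarded at inference time.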

Step 6: Checkpoint with ADR State

When saving checkpoints, the ADR parameter ranges are serialized alongside the policy weights. This allows resuming training with the evolved ranges or loading the ranges for evaluation. The checkpoint captures the full training state including optimizer and ADR evolution history.

Key considerations:

  • ADR ranges are saved in the checkpoint under env_state
  • Loading ADR ranges from checkpoint requires task.task.adr.adr_load_from_checkpoint=True
  • For evaluation, load the checkpoint with the ADR ranges to replicate training conditions
  • Evaluation creates an eval_summaries directory with TensorBoard logs
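A minimal sketch of serializing ADR ranges under an env_state key, as described above. The surrounding checkpoint structure is an assumption (real rl_games-style checkpoints use torch serialization and differ in detail); only the idea of bundling evolved ranges with the weights is taken from the source.

```python
import json
import os
import tempfile

# Hypothetical checkpoint layout: policy weights plus evolved ADR ranges
# stored under "env_state", so training can resume (or evaluation can
# replicate training conditions) with the same ranges.
def save_checkpoint(path, policy_weights, adr_ranges):
    ckpt = {
        "model": policy_weights,                  # policy/value parameters
        "env_state": {"adr_params": adr_ranges},  # evolved ADR ranges
    }
    with open(path, "w") as f:
        json.dump(ckpt, f)

def load_adr_ranges(path):
    with open(path) as f:
        return json.load(f)["env_state"]["adr_params"]

path = os.path.join(tempfile.gettempdir(), "adr_ckpt_demo.json")
save_checkpoint(path, {"w": [0.0]}, {"hand_mass": [0.4, 1.6]})
ranges = load_adr_ranges(path)
```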

Execution Diagram

GitHub URL

Workflow Repository