
Principle:Isaac Sim IsaacGymEnvs Automatic Domain Randomization

From Leeroopedia
Last Updated 2026-02-15 00:00 GMT

Overview

Automatic Domain Randomization (ADR) is an algorithm that automatically expands or contracts domain randomization parameter ranges based on the policy's ability to maintain performance at the distribution boundaries. ADR eliminates the need for manual tuning of DR ranges by letting the training process itself discover the maximum viable randomization volume.

Description

Automatic Domain Randomization (ADR) extends manual DR by automatically adjusting parameter ranges during training. The core mechanism works as follows:

Worker allocation: A fraction of environments (controlled by worker_adr_boundary_fraction, typically 40%) are designated as boundary workers. These environments are assigned to evaluate policy performance at the extreme ends of a specific parameter's range. The remaining environments are rollout workers that train with parameters sampled from the current ADR ranges.
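The split described above can be sketched in a few lines of Python. This is an illustrative partition of environment indices, not the IsaacGymEnvs implementation; the variable names mirror the config keys.

```python
import numpy as np

num_envs = 1024
worker_adr_boundary_fraction = 0.4  # typical value from the task config

# Partition environment indices into boundary and rollout workers.
num_boundary = int(num_envs * worker_adr_boundary_fraction)
env_ids = np.arange(num_envs)
boundary_workers = env_ids[:num_boundary]   # evaluate at range extremes
rollout_workers = env_ids[num_boundary:]    # train on sampled parameters
```

With 1024 environments and a fraction of 0.4, roughly 409 environments evaluate boundaries while the rest generate training rollouts.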

Boundary evaluation: Each boundary worker is assigned a specific parameter and direction (lower or upper bound). For that environment, the evaluated parameter is set to its current boundary value while all other parameters are sampled normally. When the episode ends, the worker's performance metric (consecutive successes) is recorded in a per-direction, per-parameter queue.
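The per-direction, per-parameter bookkeeping can be sketched as follows. The function names and queue layout are assumptions for illustration; the real code integrates this with the simulator's reset logic.

```python
from collections import defaultdict, deque
import random

# Current ADR ranges for two example parameters (illustrative values).
adr_ranges = {"friction": [0.8, 1.2], "mass_scale": [0.9, 1.1]}

# One queue per (parameter, direction) pair.
queues = defaultdict(deque)

def assign_boundary_worker(rng):
    """Pick a parameter and a bound for a boundary worker to evaluate."""
    param = rng.choice(sorted(adr_ranges))
    direction = rng.choice(["lower", "upper"])
    return param, direction

def on_episode_end(param, direction, consecutive_successes):
    # Record the performance metric observed at this boundary.
    queues[(param, direction)].append(consecutive_successes)

rng = random.Random(0)
p, d = assign_boundary_worker(rng)
on_episode_end(p, d, 12)
```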

Range adjustment: When a boundary evaluation queue reaches the threshold length (adr_queue_threshold_length, typically 256):

  • If mean performance exceeds adr_objective_threshold_high (e.g., 20 successes): the range is expanded by delta in the boundary direction (the policy handles this difficulty well).
  • If mean performance falls below adr_objective_threshold_low (e.g., 5 successes): the range is contracted by delta toward the initial range (the policy struggles at this boundary).
  • If mean performance is between the thresholds: no change (insufficient signal).

When a boundary changes, the corresponding queue is cleared and boundary workers are reassigned, since their data was collected under the old range.
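The decision rule for a single boundary can be sketched as below. The function signature, `delta`, and `init_range` are assumptions chosen to match the thresholds described above, not the library's exact API.

```python
def adjust_boundary(lo, hi, direction, mean_perf, delta,
                    low_thr=5, high_thr=20, init_range=(0.95, 1.05)):
    """Return updated (lo, hi) for one parameter after a boundary evaluation."""
    if mean_perf > high_thr:
        # Policy handles this difficulty well: expand outward.
        if direction == "lower":
            lo -= delta
        else:
            hi += delta
    elif mean_perf < low_thr:
        # Policy struggles: contract toward the initial range.
        if direction == "lower":
            lo = min(lo + delta, init_range[0])
        else:
            hi = max(hi - delta, init_range[1])
    # Between the thresholds: no change (insufficient signal).
    return lo, hi
```

For example, a mean of 25 successes at the upper bound of [0.8, 1.2] with delta 0.05 expands the range to [0.8, 1.25], while a mean of 10 leaves it untouched.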

NPD metric: The total randomization volume is measured as nats per dimension (NPD), the average log-range across all ADR parameters. Higher NPD indicates broader randomization and potentially more robust policies.
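The NPD computation itself is a one-liner; this minimal sketch assumes each parameter contributes the natural log of its range width.

```python
import math

def nats_per_dimension(ranges):
    """Average natural-log range width over enabled ADR parameters."""
    return sum(math.log(hi - lo) for lo, hi in ranges) / len(ranges)

wide = [(0.5, 1.5), (0.0, 2.0)]      # broader randomization
narrow = [(0.9, 1.1), (0.95, 1.05)]  # tighter randomization
```

A set of wide ranges yields a higher NPD than a set of narrow ones, which is why NPD serves as a scalar progress metric for ADR training.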

Tensor-based ADR parameters: Parameters without a range_path (like affine noise coefficients, action delay, and cube observation delay) are managed as per-environment tensors via sample_adr_tensor() rather than through the physics property API.
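A per-environment tensor sampler along these lines could look like the sketch below. This is an assumption-laden stand-in (using NumPy for self-containment): the real sample_adr_tensor() in IsaacGymEnvs has a different signature and operates on GPU tensors.

```python
import numpy as np

def sample_adr_tensor(lo, hi, num_envs, boundary_ids=None, rng=None):
    """Sample one value per environment from the current ADR range.

    Environments listed in boundary_ids are pinned at the upper bound,
    standing in for boundary-worker evaluation (illustrative only).
    """
    rng = rng or np.random.default_rng(0)
    vals = rng.uniform(lo, hi, size=num_envs)
    if boundary_ids is not None:
        vals[boundary_ids] = hi
    return vals

# e.g. per-environment action delay, with envs 0 and 1 as boundary workers
delays = sample_adr_tensor(0.0, 0.3, num_envs=8, boundary_ids=[0, 1])
```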

Usage

Enable ADR by setting task.adr.use_adr: True in the task YAML configuration. Key configuration parameters:

task:
  adr:
    use_adr: True
    update_adr_ranges: True
    worker_adr_boundary_fraction: 0.4
    adr_queue_threshold_length: 256
    adr_objective_threshold_low: 5
    adr_objective_threshold_high: 20
    adr_rollout_perf_alpha: 0.99

Theoretical Basis

ADR implements Algorithm 1 from the OpenAI paper (Akkaya et al., 2019). The algorithm maintains per-parameter, per-direction evaluation queues and adjusts ranges based on policy performance:

ADR_UPDATE(params, boundary_queues, objective):
    For each ADR parameter P with range [lo, hi]:
        For direction in {lower, upper}:
            Collect boundary evaluation results into queue_direction
            If len(queue_direction) >= threshold:
                mean_perf = mean(queue_direction)
                If mean_perf > high_threshold:
                    Expand range in this direction by delta
                    Clear queue (data is now stale)
                Elif mean_perf < low_threshold:
                    Contract range in this direction by delta (toward init_range)
                    Clear queue

    NPD = (1/N) * sum(log(hi_i - lo_i) for each enabled parameter i)
    Recycle completed boundary workers with new parameter assignments
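The pseudocode above can be turned into a minimal runnable sketch. The data structures (a dict of ranges, per-(parameter, direction) deques) and the delta value are assumptions for illustration, not the IsaacGymEnvs implementation.

```python
import math
from collections import defaultdict, deque

def adr_update(ranges, init_ranges, queues, delta=0.05,
               threshold=256, low_thr=5, high_thr=20):
    """One ADR_UPDATE pass: adjust ranges, clear stale queues, return NPD."""
    for param, (lo, hi) in ranges.items():
        for direction in ("lower", "upper"):
            q = queues[(param, direction)]
            if len(q) < threshold:
                continue  # not enough boundary evaluations yet
            mean_perf = sum(q) / len(q)
            if mean_perf > high_thr:
                # Expand in this direction; the policy copes at the boundary.
                if direction == "lower":
                    lo -= delta
                else:
                    hi += delta
                q.clear()  # data collected under the old range is stale
            elif mean_perf < low_thr:
                # Contract toward the initial range; the policy struggles.
                init_lo, init_hi = init_ranges[param]
                if direction == "lower":
                    lo = min(lo + delta, init_lo)
                else:
                    hi = max(hi - delta, init_hi)
                q.clear()
        ranges[param] = (lo, hi)
    # NPD: average log-range across parameters.
    npd = sum(math.log(hi - lo) for lo, hi in ranges.values()) / len(ranges)
    return ranges, npd
```

Filling a queue with 256 high scores at one boundary and calling adr_update expands that bound by one delta step, clears the queue, and reports the new NPD.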

Key properties of the ADR algorithm:

  • Monotonic frontier expansion: Under consistent policy improvement, ADR monotonically expands the randomization range. Contraction only occurs when the policy demonstrably fails at a boundary.
  • Decoupled evaluation: Each parameter's boundaries are evaluated independently, allowing the algorithm to expand easy parameters quickly while holding difficult ones constant.
  • Stale data rejection: When a boundary changes, all accumulated data for that boundary is discarded since it was collected under a different range.
  • Extended boundary sampling: When adr_extended_boundary_sample: True, boundary workers are evaluated at one delta step beyond the current range, testing whether the policy can handle the next expansion before it happens.
  • Queue clearing options: When clear_other_queues: True, changing any boundary clears all queues and recycles all boundary workers. This is more conservative but avoids interactions between simultaneously changing parameters.
