Principle: Isaac Sim IsaacGymEnvs Randomization Parameter Definition
| Knowledge Sources | |
|---|---|
| Domains | |
| Last Updated | 2026-02-15 00:00 GMT |
Overview
Specification framework for defining which physics simulation parameters to randomize, their distributions, ranges, and application schedules for sim-to-real policy transfer. This principle governs how IsaacGymEnvs encodes domain randomization (DR) configurations that ultimately determine the breadth and shape of the training distribution.
Description
Domain randomization improves sim-to-real transfer by training policies that are robust to variations in simulation parameters. The core insight is that if a policy can succeed across a sufficiently wide distribution of simulation parameters -- encompassing the true real-world values -- it will transfer successfully to real hardware without fine-tuning.
IsaacGymEnvs defines DR parameters via YAML configuration under the task.randomization_params key. The specification framework supports several orthogonal dimensions:
- Which parameters to vary: Actor properties (mass, friction, damping, stiffness, armature, joint limits, restitution), simulation parameters (gravity), and non-physical parameters (observation noise, action noise).
- Distribution type: `gaussian` (parameterized by mean and standard deviation), `uniform` (parameterized by low and high bounds), or `loguniform` (uniform in log-space, useful for parameters spanning orders of magnitude, like damping).
- Operation type: `additive` (new_value = original + sample) or `scaling` (new_value = original * sample).
- Schedule: `linear` (linearly ramp randomization strength from zero to full over `schedule_steps`), `constant` (no randomization until `schedule_steps`, then full), or no schedule (full randomization from the start).
- Frequency: How many simulation steps elapse between re-randomizations (the `frequency` field).
- Setup-only flag: When `setup_only: True`, the property is randomized only once before simulation starts, not on resets.
- Bucketing: For PhysX material properties (friction, restitution), `num_buckets` quantizes the continuous distribution into discrete buckets to stay within PhysX's 64K material limit.
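The distribution and operation semantics above can be sketched in Python. This is a minimal illustration of the sampling and combination rules, with hypothetical function names; it is not the IsaacGymEnvs implementation:

```python
import math
import random

def sample(distribution, params):
    """Draw one sample from a DR distribution spec."""
    if distribution == "gaussian":
        mean, std = params
        return random.gauss(mean, std)
    if distribution == "uniform":
        low, high = params
        return random.uniform(low, high)
    if distribution == "loguniform":
        low, high = params  # both bounds must be > 0
        return math.exp(random.uniform(math.log(low), math.log(high)))
    raise ValueError(distribution)

def apply_operation(original, sample_value, operation):
    """Combine the sampled value with the original property value."""
    if operation == "additive":
        return original + sample_value
    if operation == "scaling":
        return original * sample_value
    raise ValueError(operation)

# e.g. scale a nominal body mass of 1.0 by uniform(0.5, 1.5)
mass = apply_operation(1.0, sample("uniform", (0.5, 1.5)), "scaling")
```

Note how `loguniform` samples uniformly in log-space, so the decade 0.1–1.0 is as likely as 1.0–10.0.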
For Automatic Domain Randomization (ADR), the task.adr section extends the framework with:
- init_range: Starting range for the parameter (narrower than limits).
- limits: Hard bounds that the ADR algorithm cannot exceed.
- delta: Step size for expanding or contracting the range.
- delta_style: Whether delta is applied additively or multiplicatively.
- range_path: Links an ADR parameter to the corresponding entry in `randomization_params`.
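The interaction of delta, delta_style, and limits can be sketched as a single range-update step. This is a simplified illustration (real ADR typically adjusts one boundary at a time based on boundary-performance evaluations); the function name and signature are hypothetical:

```python
def adr_update(low, high, delta, delta_style, limits, expand):
    """Expand or contract an ADR range [low, high].

    expand=True widens the range (policy is succeeding at the boundary);
    expand=False narrows it (policy is failing there).
    """
    lo_limit, hi_limit = limits
    if delta_style == "additive":
        step = delta if expand else -delta
        low, high = low - step, high + step
    else:  # "multiplicative": scale the half-width around the midpoint
        mid, half = (low + high) / 2.0, (high - low) / 2.0
        half = half * delta if expand else half / delta
        low, high = mid - half, mid + half
    # hard limits are never exceeded, and the range stays non-degenerate
    low = max(low, lo_limit)
    high = min(high, hi_limit)
    return min(low, high), max(low, high)
```

Starting from the narrower init_range, repeated expand steps push the range toward the limits as long as the policy keeps succeeding.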
Affine transformation parameters (scaling, additive, white noise) for observations and actions are also ADR-managed, encoding correlated and uncorrelated noise as ax + b + c transforms.
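The ax + b + c noise model can be sketched for a single observation or action value. In this sketch (class and parameter names are hypothetical), the correlated terms a and b are resampled once per episode and the uncorrelated white-noise term c on every step:

```python
import random

class AffineNoise:
    """Applies y = a*x + b + c: (a, b) are correlated noise, fixed for
    an episode; c is uncorrelated white noise, redrawn every step."""

    def __init__(self, scale_std, add_std, white_std):
        self.scale_std = scale_std   # std of the scaling term a around 1.0
        self.add_std = add_std       # std of the additive offset b
        self.white_std = white_std   # std of the per-step white noise c
        self.reset()

    def reset(self):
        # correlated terms: sampled once per episode (on reset)
        self.a = 1.0 + random.gauss(0.0, self.scale_std)
        self.b = random.gauss(0.0, self.add_std)

    def __call__(self, x):
        # uncorrelated term: resampled on every step
        c = random.gauss(0.0, self.white_std)
        return self.a * x + self.b + c
```

Under ADR, the three standard deviations themselves become randomization ranges that expand or contract.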
Usage
Configure domain randomization when setting up simulation-to-real transfer training. The typical workflow is:
- Choose which simulation parameters matter for the target real-world task (e.g., friction for manipulation, mass for locomotion).
- Define initial ranges conservatively, then widen as training stabilizes (manual DR) or let ADR find the frontier automatically.
- Select appropriate distributions: `loguniform` for parameters spanning orders of magnitude (damping, stiffness), `uniform` for bounded ranges (mass scaling, friction), `gaussian` for noise parameters.
- Set the randomization frequency based on the physics time step and episode length.
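The schedule options described earlier (linear, constant, or none) amount to a strength multiplier on the sampled perturbation. A minimal sketch, with a hypothetical function name rather than the IsaacGymEnvs API:

```python
def randomization_strength(step, schedule, schedule_steps):
    """Fraction of full randomization strength to apply at a given step."""
    if schedule == "linear":
        # ramp from zero to full over schedule_steps, then hold at full
        return min(step / schedule_steps, 1.0)
    if schedule == "constant":
        # no randomization until schedule_steps, then full strength
        return 0.0 if step < schedule_steps else 1.0
    return 1.0  # no schedule: full randomization from the start
```

Ramping lets the policy first learn the task in a near-nominal simulation before the full training distribution is applied.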
Theoretical Basis
Domain Randomization rests on the principle that training on a distribution of environment parameters that encompasses real-world variation produces a policy robust to the sim-to-real gap. The formal objective is:
pi* = argmax_pi E_{xi ~ P(xi)} [ J(pi, xi) ]
where:
xi = vector of randomized simulation parameters
P(xi) = distribution over simulation parameters (defined by DR config)
J(pi, xi) = expected return of policy pi in environment parameterized by xi
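This expectation is exactly what DR training approximates empirically: each episode draws xi ~ P(xi) and the resulting return contributes one Monte Carlo sample. A toy illustration (the scalar xi, the closed-form "return", and all names here are hypothetical, chosen only to make the estimate checkable):

```python
import random

def estimate_objective(policy_return, sample_xi, n=10_000):
    """Monte Carlo estimate of E_{xi ~ P(xi)}[ J(pi, xi) ]."""
    return sum(policy_return(sample_xi()) for _ in range(n)) / n

# toy example: xi is a scalar friction coefficient, uniform on [0.5, 1.5];
# the "return" peaks when friction matches a nominal value of 1.0
sample_xi = lambda: random.uniform(0.5, 1.5)
policy_return = lambda xi: 1.0 - abs(xi - 1.0)
```

For this toy distribution the true objective is 1 - E|xi - 1| = 0.75, which the estimate approaches as n grows.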
Key trade-offs:
- Wider ranges produce more robust policies but increase training difficulty (the policy must solve a harder, more diverse set of environments).
- Narrower ranges are easier to train but risk failing on real-world parameters outside the distribution.
- ADR automatically finds the frontier: it expands ranges where the policy succeeds and contracts them where it fails, maximizing the randomization volume (measured by NPD -- nats per dimension) while maintaining policy performance.
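For independent uniform ranges, the NPD quantity ADR maximizes is the differential entropy of the parameter distribution averaged over dimensions (this matches the definition used in the ADR literature; the function name below is hypothetical):

```python
import math

def nats_per_dimension(ranges):
    """Average differential entropy (in nats) of independent uniform
    DR ranges: NPD = (1/d) * sum_i log(high_i - low_i)."""
    return sum(math.log(high - low) for low, high in ranges) / len(ranges)

# widening any single range raises NPD, so ADR's expand/contract loop
# is hill-climbing on this quantity subject to policy performance
```

Averaging over dimensions makes configurations with different numbers of randomized parameters comparable.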
The distribution choice matters: loguniform naturally handles parameters where the ratio matters more than the absolute difference (e.g., damping of 0.1 vs 1.0 vs 10.0), while uniform is appropriate for parameters where absolute differences matter (e.g., mass scaling of 0.5 to 1.5).