
Principle:Google DeepMind dm_control Domain Randomization

From Leeroopedia
Attribute      Value
Principle      Domain Randomization
Workflow       Composer_Environment_Building
Domain         Reinforcement_Learning, Sim_to_Real
Source         dm_control
Last Updated   2026-02-15 00:00 GMT

Overview

Domain randomization is a technique for systematically varying simulation parameters between episodes so that policies trained in simulation generalize to the real world or to unseen conditions.

Description

A policy trained in a single fixed simulation tends to overfit to the exact dynamics, geometry, and visual appearance of that simulation. When it is deployed on a physical robot or tested under different conditions, its performance can degrade sharply. Domain randomization addresses this by treating simulation parameters as random variables and sampling new values at the start of each episode (or even within an episode). The resulting policy must perform well across the entire distribution of environments, which makes it more robust to the gap between simulation and reality.

Parameters commonly randomized include:

  • Geometry: Object sizes, shapes, positions, and orientations.
  • Dynamics: Friction, damping, mass, joint stiffness, actuator gains.
  • Visuals: Colors, textures, lighting direction and intensity.
  • Sensor properties: Observation noise, delay, update rate.
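The parameter categories above can be illustrated with a small per-episode sampler. This is a minimal sketch, not dm_control code: the nominal values, the ranges, and the names `EpisodeParams`, `sample_episode_params`, and `noisy_observation` are hypothetical, and Python's standard `random.Random` stands in for the random state.

```python
import random
from dataclasses import dataclass

# Hypothetical nominal values for a simple model (illustrative only).
NOMINAL = {"friction": 1.0, "mass": 2.0, "damping": 0.1}


@dataclass
class EpisodeParams:
    friction: float
    mass: float
    damping: float
    rgba: tuple
    obs_noise_std: float


def sample_episode_params(rng: random.Random) -> EpisodeParams:
    """Draws one set of randomized parameters at the start of an episode."""
    return EpisodeParams(
        # Dynamics: scale each nominal value by a factor in [0.8, 1.2].
        friction=NOMINAL["friction"] * rng.uniform(0.8, 1.2),
        mass=NOMINAL["mass"] * rng.uniform(0.8, 1.2),
        damping=NOMINAL["damping"] * rng.uniform(0.8, 1.2),
        # Visuals: random RGB colour with a fixed, opaque alpha channel.
        rgba=(rng.random(), rng.random(), rng.random(), 1.0),
        # Sensors: per-episode standard deviation of Gaussian observation noise.
        obs_noise_std=rng.uniform(0.0, 0.05),
    )


def noisy_observation(true_obs, params: EpisodeParams, rng: random.Random):
    """Corrupts each observation entry with zero-mean Gaussian noise."""
    return [x + rng.gauss(0.0, params.obs_noise_std) for x in true_obs]


rng = random.Random(0)
for episode in range(3):
    params = sample_episode_params(rng)  # a new environment every episode
    obs = noisy_observation([0.0, 1.0], params, rng)
    print(episode, round(params.mass, 3), [round(o, 3) for o in obs])
```

Delay and update-rate randomization would follow the same pattern: sample a value per episode and use it when producing observations.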

The randomization system must:

  1. Remember initial values: Before any variation is applied, the original value of each attribute is recorded so that variations can be expressed relative to the original (e.g., "mass plus or minus 10%").
  2. Support composable distributions: Variations can be simple constants, statistical distributions (uniform, normal, log-normal, etc.), or algebraic combinations thereof (e.g., Uniform(0.8, 1.2) * initial_mass).
  3. Operate at two levels:
    • MJCF-level variations modify the XML model before physics compilation. These can change geometry, add or remove elements, or alter compiler options. They require recompilation.
    • Physics-level variations modify compiled model parameters (in MjModel or MjData) after compilation, avoiding the cost of recompilation for parameters like friction or damping that exist in the compiled model.
  4. Evaluate nested structures: A single call should be able to evaluate a complex nested structure of constants and callables, recursively replacing each callable with its sampled value.
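Requirement 1 matters because variations are reapplied every episode. The following minimal sketch (the `Body` class and both helper names are hypothetical) contrasts sampling relative to a recorded initial value with naively rescaling the current value, which compounds across episodes.

```python
import random


class Body:
    """Hypothetical stand-in for a model element with a mutable `mass`."""

    def __init__(self, mass):
        self.mass = mass


class RelativeMassRandomizer:
    """Records the initial mass once, then rescales relative to it.

    Because every sample is taken relative to the *initial* value, the mass
    stays inside [0.9, 1.1] * initial no matter how many episodes have run.
    """

    def __init__(self, body):
        self.body = body
        self.initial_mass = body.mass  # remembered before any variation

    def apply(self, rng):
        self.body.mass = self.initial_mass * rng.uniform(0.9, 1.1)


def naive_apply(body, rng):
    # Scales the *current* value, so the factors multiply together over
    # episodes and the mass performs a random walk.
    body.mass = body.mass * rng.uniform(0.9, 1.1)


rng = random.Random(0)
good, bad = Body(1.0), Body(1.0)
randomizer = RelativeMassRandomizer(good)
for _ in range(1000):
    randomizer.apply(rng)
    naive_apply(bad, rng)
print(good.mass)  # always within [0.9, 1.1]
print(bad.mass)   # can drift far from 1.0
```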

Usage

Use Domain Randomization when you need to:

  • Improve sim-to-real transfer: Randomize dynamics parameters (friction, mass, damping) so the trained policy is robust to the unknown real-world values.
  • Increase generalization: Randomize object positions, sizes, or colors to train vision-based policies that work across varied scenes.
  • Curriculum learning: Gradually widen the distribution of randomized parameters over the course of training.
  • Data augmentation: Use randomized visual properties (lighting, texture) to improve the diversity of rendered training data.
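For the curriculum-learning case, one simple option is a schedule that widens a uniform range linearly with training progress. This is a sketch under assumptions: the linear schedule, the nominal friction of 1.0, the maximum relative width of 0.5, and the names `curriculum_range` and `sample_friction` are all hypothetical choices.

```python
import random


def curriculum_range(step, total_steps, nominal, max_rel_width):
    """Linearly widens a uniform range around `nominal` over training.

    At step 0 the range collapses to the nominal value; at `total_steps` and
    beyond it is nominal * [1 - max_rel_width, 1 + max_rel_width].
    """
    progress = min(max(step / total_steps, 0.0), 1.0)
    half_width = nominal * max_rel_width * progress
    return nominal - half_width, nominal + half_width


def sample_friction(step, rng, total_steps=10_000):
    """Samples a friction coefficient from the current curriculum range."""
    low, high = curriculum_range(step, total_steps, nominal=1.0, max_rel_width=0.5)
    return rng.uniform(low, high)


rng = random.Random(0)
for step in (0, 5_000, 10_000):
    # The range is (1.0, 1.0), then (0.75, 1.25), then (0.5, 1.5).
    print(step, curriculum_range(step, 10_000, 1.0, 0.5), sample_friction(step, rng))
```

A linear ramp is only one choice; schedules driven by the agent's current success rate are a common alternative.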

Theoretical Basis

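Domain randomization can be formalized as training over a distribution of MDPs. Let $\xi$ denote the vector of randomized parameters. The agent optimizes:

$$\max_{\pi} \; \mathbb{E}_{\xi \sim P(\xi)} \left[ \mathbb{E}_{\tau \sim \pi(\cdot \mid \xi)} \left[ \sum_{t} \gamma^{t} R_t(\xi) \right] \right]$$

where $P(\xi)$ is the distribution over environment parameters, $\pi$ is the policy, $\tau \sim \pi(\cdot \mid \xi)$ is a trajectory obtained by running $\pi$ in the environment instantiated with $\xi$, $\gamma \in [0, 1)$ is the discount factor, and $R_t(\xi)$ is the reward at step $t$. The key insight is that if $P(\xi)$ is broad enough to cover the real-world parameter values, the real system behaves approximately like one more member of the training distribution, so a policy that performs well in expectation over $P(\xi)$ is likely to perform well in the real world too.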
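The objective above can be estimated by Monte Carlo sampling: draw $\xi$ from $P(\xi)$, roll out the policy once in that environment, and average the discounted returns. The sketch below uses a hypothetical toy MDP (a 1-D point with an unknown actuator gain $\xi$ and reward $-|x_t|$); the dynamics, policies, and function names are illustrative assumptions only.

```python
import random


def rollout_return(policy, xi, rng, horizon=50, gamma=0.99):
    """Discounted return of one trajectory in a toy MDP parameterized by xi.

    Hypothetical dynamics: x_{t+1} = x_t + xi * a_t + small Gaussian noise,
    where xi acts as an unknown actuator gain. The reward is -|x_{t+1}|.
    """
    x = 1.0
    ret = 0.0
    for t in range(horizon):
        a = policy(x)
        x = x + xi * a + rng.gauss(0.0, 0.01)
        ret += gamma ** t * (-abs(x))
    return ret


def domain_randomized_objective(policy, sample_xi, rng, n_envs=200):
    """Monte Carlo estimate of E_{xi~P}[E_{tau~pi}[sum_t gamma^t R_t(xi)]]."""
    total = 0.0
    for _ in range(n_envs):
        xi = sample_xi(rng)                       # outer expectation over environments
        total += rollout_return(policy, xi, rng)  # one-sample inner expectation
    return total / n_envs


def cautious(x):
    return -0.5 * x


def aggressive(x):
    # Cancels x exactly in one step only when xi == 1.
    return -1.0 * x


def randomized(rng):
    return rng.uniform(0.5, 2.0)


for name, policy in [("cautious", cautious), ("aggressive", aggressive)]:
    print(name, domain_randomized_objective(policy, randomized, random.Random(0)))
```

In this toy setting the aggressive gain is tuned to the nominal $\xi = 1$ and should win when $\xi$ is fixed there, but it tends to lose under randomization because for $\xi$ near 2 it overshoots and never settles.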

The variation system implements this through a Variation abstraction:

class Variation:
    def __call__(self, initial_value=None, current_value=None, random_state=None):
        """Returns a new value for the bound attribute."""

Variations support arithmetic composition through operator overloading (+, -, *, /, **), enabling expressions like:

mass_variation = initial_mass * Uniform(0.8, 1.2)
color_variation = Uniform(low=[0, 0, 0, 1], high=[1, 1, 1, 1])
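The operator-overloading idea can be sketched in plain Python. This is an illustrative reimplementation under assumptions, not dm_control's source: `BinaryOp`, `_resolve`, this `Uniform`, and the `InitialValue` helper (which defers to the recorded initial value at call time) are hypothetical names, and `random.Random` stands in for the random state.

```python
import operator
import random


class Variation:
    """Base class: a callable that produces a new value for an attribute."""

    def __call__(self, initial_value=None, current_value=None, random_state=None):
        raise NotImplementedError

    # Arithmetic composition: each operator returns a new, lazily sampled Variation.
    def __add__(self, other): return BinaryOp(operator.add, self, other)
    def __radd__(self, other): return BinaryOp(operator.add, other, self)
    def __sub__(self, other): return BinaryOp(operator.sub, self, other)
    def __rsub__(self, other): return BinaryOp(operator.sub, other, self)
    def __mul__(self, other): return BinaryOp(operator.mul, self, other)
    def __rmul__(self, other): return BinaryOp(operator.mul, other, self)
    def __truediv__(self, other): return BinaryOp(operator.truediv, self, other)
    def __rtruediv__(self, other): return BinaryOp(operator.truediv, other, self)
    def __pow__(self, other): return BinaryOp(operator.pow, self, other)
    def __rpow__(self, other): return BinaryOp(operator.pow, other, self)


def _resolve(value, initial_value, current_value, random_state):
    # Constants pass through unchanged; nested Variations are sampled.
    if isinstance(value, Variation):
        return value(initial_value, current_value, random_state)
    return value


class BinaryOp(Variation):
    """Applies `op` to the sampled values of its two operands."""

    def __init__(self, op, left, right):
        self.op, self.left, self.right = op, left, right

    def __call__(self, initial_value=None, current_value=None, random_state=None):
        args = (initial_value, current_value, random_state)
        return self.op(_resolve(self.left, *args), _resolve(self.right, *args))


class Uniform(Variation):
    """Samples uniformly in [low, high]; elementwise if low/high are sequences."""

    def __init__(self, low=0.0, high=1.0):
        self.low, self.high = low, high

    def __call__(self, initial_value=None, current_value=None, random_state=None):
        rng = random_state if random_state is not None else random
        if isinstance(self.low, (list, tuple)):
            return [rng.uniform(lo, hi) for lo, hi in zip(self.low, self.high)]
        return rng.uniform(self.low, self.high)


class InitialValue(Variation):
    """Hypothetical helper: evaluates to the attribute's recorded initial value."""

    def __call__(self, initial_value=None, current_value=None, random_state=None):
        return initial_value


rng = random.Random(0)
mass_variation = InitialValue() * Uniform(0.8, 1.2)
print(mass_variation(initial_value=2.0, random_state=rng))  # somewhere in [1.6, 2.4]
color_variation = Uniform(low=[0, 0, 0, 1], high=[1, 1, 1, 1])
print(color_variation(random_state=rng))  # random RGB, alpha fixed at 1
```

Composed expressions stay lazy: nothing is sampled until the variation is called, so the same expression yields a fresh value each episode. Note that this sketch's elementwise `Uniform` returns a Python list, so combining it arithmetically with a scalar would need an array type instead.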

The evaluate function recursively walks an arbitrary nested structure (dicts, lists, tuples, namedtuples) and replaces each callable (Variation instance) with its sampled value, leaving constants unchanged.
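A recursive `evaluate` in this spirit can be written with only the standard library. This is a sketch, not dm_control's implementation; the `uniform` closure is a hypothetical stand-in for a `Variation` instance, and `Pose` is an illustrative namedtuple.

```python
import random
from collections import namedtuple


def evaluate(structure, *args, **kwargs):
    """Recursively replaces every callable in `structure` with its sampled value."""
    if isinstance(structure, dict):
        return {k: evaluate(v, *args, **kwargs) for k, v in structure.items()}
    if isinstance(structure, tuple) and hasattr(structure, "_fields"):
        # namedtuple: rebuild with positional arguments to keep the type.
        return type(structure)(*(evaluate(v, *args, **kwargs) for v in structure))
    if isinstance(structure, (list, tuple)):
        return type(structure)(evaluate(v, *args, **kwargs) for v in structure)
    if callable(structure):
        return structure(*args, **kwargs)
    return structure  # constants are left unchanged


def uniform(low, high):
    """Hypothetical stand-in for a Variation: a callable sampling U(low, high)."""
    def sample(initial_value=None, current_value=None, random_state=None):
        return random_state.uniform(low, high)
    return sample


Pose = namedtuple("Pose", ["pos", "quat"])
spec = {
    "pose": Pose(pos=[uniform(-1, 1), uniform(-1, 1), 0.5], quat=(1, 0, 0, 0)),
    "friction": uniform(0.5, 1.5),
    "name": "box",
}
print(evaluate(spec, random_state=random.Random(0)))
```

Extra positional and keyword arguments are forwarded unchanged to every callable, so one call can supply the shared random state to the whole structure.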

The two variator classes manage the binding between attributes and their variations:

MJCFVariator:
    bind_attributes(element, attr1=variation1, attr2=variation2, ...)
    apply_variations(random_state)
        -> for each bound attribute:
             new_value = evaluate(variation, initial_value, current_value, random_state)
             element.set_attributes(attr=new_value)

PhysicsVariator:
    bind_attributes(element, attr1=variation1, ...)
    apply_variations(physics, random_state)
        -> for each bound attribute:
             binding = physics.bind(element)
             binding.attr = evaluate(variation, initial_value, current_value, random_state)
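The two variator patterns can be sketched as runnable Python against stand-in objects. This is a sketch under stated assumptions: `FakeElement` and `FakePhysics` are hypothetical substitutes for an MJCF element (with `set_attributes`) and a physics object (with `bind`), and the bookkeeping, including when physics-level initial values are recorded, is an illustrative choice rather than dm_control's exact behavior.

```python
import random
from types import SimpleNamespace


def _sample(variation, initial_value, current_value, random_state):
    # Constants are returned as-is; callables are sampled.
    if callable(variation):
        return variation(initial_value=initial_value, current_value=current_value,
                         random_state=random_state)
    return variation


class MJCFVariator:
    """Sketch: varies attributes of model elements before compilation."""

    def __init__(self):
        self._bindings = []  # (element, attr_name, variation, initial_value)

    def bind_attributes(self, element, **variations):
        for name, variation in variations.items():
            # Remember the value before any variation is applied.
            self._bindings.append((element, name, variation, getattr(element, name)))

    def apply_variations(self, random_state):
        for element, name, variation, initial in self._bindings:
            new_value = _sample(variation, initial, getattr(element, name), random_state)
            element.set_attributes(**{name: new_value})


class PhysicsVariator:
    """Sketch: writes varied values into the compiled model via physics.bind."""

    def __init__(self):
        self._bindings = []  # (element, attr_name, variation)
        self._initial = {}   # (id(element), attr_name) -> initial compiled value

    def bind_attributes(self, element, **variations):
        for name, variation in variations.items():
            self._bindings.append((element, name, variation))

    def apply_variations(self, physics, random_state):
        for element, name, variation in self._bindings:
            binding = physics.bind(element)
            current = getattr(binding, name)
            # Record the compiled value the first time it is seen
            # (real array-valued attributes would need a copy here).
            initial = self._initial.setdefault((id(element), name), current)
            setattr(binding, name, _sample(variation, initial, current, random_state))


class FakeElement:
    """Hypothetical stand-in for an MJCF element."""

    def __init__(self, **attrs):
        self.__dict__.update(attrs)

    def set_attributes(self, **attrs):
        self.__dict__.update(attrs)


class FakePhysics:
    """Hypothetical stand-in: bind() returns a mutable per-element record."""

    def __init__(self, compiled):
        self._compiled = compiled

    def bind(self, element):
        return self._compiled[element]


rng = random.Random(0)
geom = FakeElement(size=0.1)
mjcf_var = MJCFVariator()
mjcf_var.bind_attributes(geom, size=lambda initial_value, current_value, random_state:
                         initial_value * random_state.uniform(0.5, 2.0))
mjcf_var.apply_variations(rng)  # a real model would then be recompiled

joint = FakeElement()
physics = FakePhysics({joint: SimpleNamespace(damping=0.2)})
phys_var = PhysicsVariator()
phys_var.bind_attributes(joint, damping=lambda initial_value, current_value, random_state:
                         initial_value * random_state.uniform(0.9, 1.1))
phys_var.apply_variations(physics, rng)  # no recompilation needed
```

The split mirrors the cost trade-off described earlier: MJCF-level changes are flexible but require recompiling, while physics-level changes write directly into already compiled parameters.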
