

Workflow:Google deepmind Dm control Composer Environment Building

From Leeroopedia
Revision as of 11:01, 16 February 2026 by Admin (auto-imported from workflows/Google_deepmind_Dm_control_Composer_Environment_Building.md)
Knowledge Sources
Domains: Reinforcement Learning, Environment Design, Physics Simulation
Last Updated: 2026-02-15 12:00 GMT

Overview

End-to-end process for building custom reinforcement learning environments using the dm_control Composer framework by assembling reusable Entity, Arena, and Task components.

Description

This workflow covers the standard procedure for creating rich, customizable RL environments with the Composer framework. Composer provides a higher-level abstraction over raw MuJoCo simulation and builds environments modularly from three core building blocks: Entities (physical objects with observables and lifecycle hooks), Arenas (the physical world that contains the other entities), and Tasks (reward functions, termination conditions, and episode logic). The framework handles MJCF model composition, physics compilation, observation buffering, and the dm_env interface, and it provides a variation module for domain randomization. The output is a dm_env-compatible Environment suitable for RL training.

Usage

Execute this workflow when you need to create a custom RL environment beyond the pre-built Control Suite, combining specific walkers, arenas, props, and task logic. This is the standard approach for building environments with complex observation spaces, multi-rate observations, domain randomization, or custom reward functions.

Execution Steps

Step 1: Define or Select an Entity

Create or select an Entity, the fundamental building block representing any physical object in the environment. An Entity wraps an MJCF model and exposes observables (joint positions, velocities, camera images), lifecycle hooks (initialize_episode_mjcf, initialize_episode, before_step, after_step), and attachment points for composing with other entities. For robotic agents, use the Robot subclass, which adds actuator management through an actuators property and an apply_action method.

Key considerations:

  • Entity is the abstract base class; Robot extends it for actuated agents
  • Each Entity owns an MJCF model accessible via mjcf_model property
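  • Observables are built from classes such as MJCFFeature, MujocoFeature, MJCFCamera, or Generic (which wraps an arbitrary callable of the physics)
  • Entities are attached to one another with attach(), either at a named MJCF site or, by default, at the parent's worldbody
  • The @composer.define.cached_property decorator computes a property on first access and caches the result, which is commonly used to look up MJCF elements lazily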
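To make the hook and attachment model concrete, here is a pure-Python toy of an entity tree. ToyEntity, ToyRobot, and Arm are hypothetical stand-ins, not dm_control classes: no MJCF or MuJoCo is involved, and only the method names (attach, iter_entities, initialize_episode, before_step, after_step, actuators) mirror the Composer API described above.

```python
import abc


class ToyEntity(abc.ABC):
    """Hypothetical stand-in for composer.Entity (no MJCF, no physics)."""

    def __init__(self, name):
        self.name = name
        self._children = []  # entities attached to this one
        self.calls = []      # log of hooks that ran, for illustration

    def attach(self, child):
        """Attach a child entity and return it (the real attach() returns an MJCF frame)."""
        self._children.append(child)
        return child

    def iter_entities(self):
        """Yield this entity and all descendants, depth-first."""
        yield self
        for child in self._children:
            yield from child.iter_entities()

    # Default hooks only log that they ran; subclasses override what they need.
    def initialize_episode(self, physics, random_state):
        self.calls.append("initialize_episode")

    def before_step(self, physics, random_state):
        self.calls.append("before_step")

    def after_step(self, physics, random_state):
        self.calls.append("after_step")


class ToyRobot(ToyEntity):
    """Actuated entity: concrete subclasses must list their actuators."""

    @property
    @abc.abstractmethod
    def actuators(self):
        """Names of the actuators this robot controls."""


class Arm(ToyRobot):
    @property
    def actuators(self):
        return ["shoulder", "elbow"]


# Build a small tree (root -> arm -> gripper), then dispatch one hook to every entity.
root = ToyEntity("root")
arm = root.attach(Arm("arm"))
gripper = arm.attach(ToyEntity("gripper"))
for entity in root.iter_entities():
    entity.before_step(physics=None, random_state=None)

print([e.name for e in root.iter_entities()])  # ['root', 'arm', 'gripper']
print(arm.actuators)                           # ['shoulder', 'elbow']
```

In Composer the environment, not the entity, collects hooks from every entity in the tree; the loop at the bottom imitates that dispatch.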

Step 2: Define or Select an Arena

Create or select an Arena: a specialized Entity that serves as the root of the environment. An Arena typically provides the ground plane, global lighting, skybox, simulation options, and attachment points for other entities. Built-in arenas (in dm_control.locomotion.arenas) include flat floors, corridors (empty, with gaps, with walls), bowl-shaped terrain, and procedurally generated mazes.

Key considerations:

  • Arena extends Entity and serves as the root of the MJCF model hierarchy
  • Built-in arenas: Floor, EmptyCorridor, GapsCorridor, WallsCorridor, Bowl, RandomMazeWithTargets
  • Arenas define the physical boundaries and visual appearance of the world
  • Custom arenas can add cameras, lights, and terrain features
  • The add_free_entity method attaches entities with a freejoint for free-floating objects
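Conceptually, add_free_entity attaches the entity through an attachment frame and adds a freejoint to that frame. The sketch below shows the resulting model structure with plain xml.etree rather than dm_control's mjcf module; make_arena and add_free_entity here are hypothetical helpers, and the sphere geom stands in for an attached prop.

```python
import xml.etree.ElementTree as ET


def make_arena():
    """Minimal MJCF-like arena: a worldbody holding a ground-plane geom."""
    root = ET.Element("mujoco", model="arena")
    worldbody = ET.SubElement(root, "worldbody")
    ET.SubElement(worldbody, "geom", name="ground", type="plane", size="10 10 0.1")
    return root


def add_free_entity(arena, name, radius="0.05"):
    """Wrap an entity's geometry in a body that carries a <freejoint/>.

    The freejoint gives the body 6 degrees of freedom, so it can fall,
    roll, and tumble instead of being welded to the world.
    """
    frame = ET.SubElement(arena.find("worldbody"), "body", name=name)
    ET.SubElement(frame, "freejoint")
    ET.SubElement(frame, "geom", type="sphere", size=radius)
    return frame


arena = make_arena()
add_free_entity(arena, "ball")
print(ET.tostring(arena, encoding="unicode"))
```

Without the freejoint, the attached body would be rigidly fixed in place relative to its parent.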

Step 3: Define the Task

Create a Task subclass that specifies the reward function, termination conditions, episode initialization, observables configuration, and timestep management. The Task connects the walker (agent) to the arena, defines what the agent should optimize, and manages the episode lifecycle through hooks.
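A skeleton of the Task contract, written as a standalone toy. ToyTask and ReachTarget are hypothetical; in dm_control's composer.Task the members a subclass must supply are root_entity and get_reward, while should_terminate_episode and get_discount have defaults (False and 1.0). A plain dict stands in for the physics object.

```python
import abc
import math


class ToyTask(abc.ABC):
    """Hypothetical stand-in mirroring composer.Task's required and optional members."""

    @property
    @abc.abstractmethod
    def root_entity(self):
        """The arena at the root of the model tree (required)."""

    @abc.abstractmethod
    def get_reward(self, physics):
        """Scalar reward for the current step (required)."""

    def should_terminate_episode(self, physics):
        """Optional: by default only the time limit ends an episode."""
        return False

    def get_discount(self, physics):
        """Optional: default discount of 1.0."""
        return 1.0


class ReachTarget(ToyTask):
    """Sparse reward: 1.0 when the hand is within `radius` of the target."""

    def __init__(self, arena, target=(0.0, 0.0), radius=0.1):
        self._arena = arena
        self._target = target
        self._radius = radius

    @property
    def root_entity(self):
        return self._arena

    def get_reward(self, physics):
        # `physics` is a plain dict here, standing in for a MuJoCo Physics object.
        distance = math.dist(physics["hand_xy"], self._target)
        return 1.0 if distance <= self._radius else 0.0

    def should_terminate_episode(self, physics):
        # End the episode early once the target has been reached.
        return self.get_reward(physics) > 0.0


task = ReachTarget(arena="floor_arena")
print(task.get_reward({"hand_xy": (0.05, 0.0)}))  # 1.0
print(task.get_reward({"hand_xy": (1.0, 1.0)}))   # 0.0
```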

Key considerations:

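  • Task is an abstract base class: subclasses must implement the root_entity property and get_reward, while should_terminate_episode (default False) and get_discount (default 1.0) are optional overrides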
  • The root_entity property returns the arena (root of the MJCF tree)
  • initialize_episode_mjcf is called before physics compilation for domain randomization
  • initialize_episode is called after compilation for physics-state initialization
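  • Tasks expose control_timestep (the agent's step) and physics_timestep (the simulator's step) separately; set_timesteps sets both, and control_timestep must be an integer multiple of physics_timestep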
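Because the control timestep has to be an integer multiple of the physics timestep, the number of physics substeps per agent step is their ratio. The helper below is a hypothetical re-implementation of that check for illustration, not a dm_control function; it tolerates floating-point rounding.

```python
import math


def physics_steps_per_control_step(control_timestep, physics_timestep):
    """Number of physics substeps per agent step (hypothetical helper).

    Raises ValueError unless control_timestep is an integer multiple of
    physics_timestep, allowing for floating-point rounding.
    """
    ratio = control_timestep / physics_timestep
    n = round(ratio)
    if n < 1 or not math.isclose(ratio, n, rel_tol=1e-9, abs_tol=1e-9):
        raise ValueError(
            f"control_timestep={control_timestep} is not an integer multiple "
            f"of physics_timestep={physics_timestep}")
    return n


print(physics_steps_per_control_step(0.03, 0.005))  # 6
```

With a 30 ms control timestep and a 5 ms physics timestep, each agent step advances the simulation six times; a 25 ms / 10 ms pair is rejected.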

Step 4: Configure Observables

Enable and configure the observables that will be exposed to the RL agent. Observables support multi-rate updates (different observation frequencies), buffering with configurable delays, and aggregation functions. Each Entity defines its available observables; the Task selects which ones to enable and their update rates.

Key considerations:

  • Observables are enabled or disabled per entity, either through the observable_options constructor argument or by setting an individual observable's enabled attribute
  • Multi-rate observation supports different update frequencies for different sensors
  • Buffer sizes and delays can simulate realistic sensor latencies
  • Aggregation functions (e.g., mean, max) summarize buffered observations
  • Camera observables provide rendered pixel arrays at configurable resolutions
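The interaction of update rate, buffering, delay, and aggregation can be sketched with a small simulation. ToyObservable is hypothetical and simplifies the real Updater's scheduling; its parameter names mirror the observable options discussed above, and all intervals and delays are counted in physics steps here.

```python
from collections import deque
from statistics import mean


class ToyObservable:
    """Hypothetical buffered, delayed, multi-rate sensor (not dm_control's class).

    update_interval: sample the raw signal every N physics steps.
    buffer_size:     keep the N most recently delivered samples.
    delay:           a sample taken at step t becomes visible at step t + delay.
    aggregator:      optional function reducing the buffer to a single value.
    """

    def __init__(self, raw_fn, update_interval=1, buffer_size=1, delay=0,
                 aggregator=None):
        self._raw_fn = raw_fn
        self._update_interval = update_interval
        self._delay = delay
        self._aggregator = aggregator
        self._in_flight = deque()                 # (visible_at_step, value)
        self._buffer = deque(maxlen=buffer_size)  # oldest samples fall off

    def step(self, t):
        """Advance to physics step t: sample if due, deliver matured samples."""
        if t % self._update_interval == 0:
            self._in_flight.append((t + self._delay, self._raw_fn(t)))
        while self._in_flight and self._in_flight[0][0] <= t:
            self._buffer.append(self._in_flight.popleft()[1])

    def observe(self):
        values = list(self._buffer)
        return self._aggregator(values) if self._aggregator else values


# A slow sensor: sampled every 2 steps, 3-sample history, 1-step latency.
raw = ToyObservable(lambda t: float(t), update_interval=2, buffer_size=3, delay=1)
averaged = ToyObservable(lambda t: float(t), update_interval=2, buffer_size=3,
                         delay=1, aggregator=mean)
for t in range(10):
    raw.step(t)
    averaged.step(t)
print(raw.observe())       # [4.0, 6.0, 8.0]
print(averaged.observe())  # 6.0
```

After ten steps the sample taken at step 8 has just arrived, the buffer holds the three newest samples, and the aggregated sensor reports their mean.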

Step 5: Apply Domain Randomization

Optionally, configure variations that randomize environment parameters at each episode reset. The variation subsystem provides statistical distributions (Uniform, Normal, LogNormal), color randomizers (RGB, HSV, grayscale), rotation randomizers (quaternion sampling), and noise injectors (Additive, Multiplicative). Variations that modify the MJCF model are applied in initialize_episode_mjcf, before physics compilation; variations that modify the physics state are applied in initialize_episode, after compilation.
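As an example of what an HSV color randomizer does at each reset: sample a hue, convert it to RGB, and produce an RGBA tuple of the kind an MJCF rgba attribute takes. hsv_rgba_variation is a hypothetical helper built on the standard library's colorsys and random, not dm_control's HSV variation (which draws from a numpy random state); seeding the generator plays the same role of making the draws reproducible.

```python
import colorsys
import random


def hsv_rgba_variation(random_state, saturation=0.8, value=0.9, alpha=1.0):
    """Sample a random hue and return it as an (r, g, b, a) tuple in [0, 1].

    Hypothetical helper: the hue is uniform over the color wheel while
    saturation, value, and alpha stay fixed.
    """
    hue = random_state.uniform(0.0, 1.0)
    r, g, b = colorsys.hsv_to_rgb(hue, saturation, value)
    return (r, g, b, alpha)


# Two generators with the same seed yield the same color for each "episode reset".
rng_a, rng_b = random.Random(42), random.Random(42)
episode_colors_a = [hsv_rgba_variation(rng_a) for _ in range(3)]
episode_colors_b = [hsv_rgba_variation(rng_b) for _ in range(3)]
print(episode_colors_a == episode_colors_b)  # True
print(" ".join(f"{x:.3f}" for x in episode_colors_a[0]))  # formatted like an rgba attribute
```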

Key considerations:

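  • Variations subclass the Variation base class and are invoked through __call__(initial_value, current_value, random_state)
  • MJCFVariator applies variations to MJCF attributes; PhysicsVariator applies them to the compiled physics state
  • Distributions sample from the random state passed to them, so seeding that state makes randomization reproducible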
  • VariationBroadcaster shares a single sample across multiple consumers
  • Variations compose algebraically via operator overloading (+, *, -, /)
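The ideas in this list (callable variations, composition through arithmetic operators, and one sample shared by several consumers) can be sketched in plain Python. Every class here (ToyVariation, Uniform, Initial, Broadcaster) is hypothetical; the call signature mirrors the (initial_value, current_value, random_state) protocol, but this toy does not reproduce dm_control's actual VariationBroadcaster resampling rules.

```python
import operator
import random


class ToyVariation:
    """Hypothetical base: a callable sampler that composes with + - * /."""

    def __call__(self, initial_value=None, current_value=None, random_state=None):
        raise NotImplementedError

    # Each operator returns a new, lazily evaluated variation.
    def __add__(self, other):
        return _BinaryOp(operator.add, self, other)

    def __sub__(self, other):
        return _BinaryOp(operator.sub, self, other)

    def __mul__(self, other):
        return _BinaryOp(operator.mul, self, other)

    def __truediv__(self, other):
        return _BinaryOp(operator.truediv, self, other)


def _evaluate(term, initial_value, current_value, random_state):
    """Call a variation; pass plain constants through unchanged."""
    if isinstance(term, ToyVariation):
        return term(initial_value, current_value, random_state)
    return term


class _BinaryOp(ToyVariation):
    def __init__(self, op, left, right):
        self._op, self._left, self._right = op, left, right

    def __call__(self, initial_value=None, current_value=None, random_state=None):
        args = (initial_value, current_value, random_state)
        return self._op(_evaluate(self._left, *args), _evaluate(self._right, *args))


class Uniform(ToyVariation):
    """Draws from [low, high] using the caller-supplied RNG."""

    def __init__(self, low, high):
        self._low, self._high = low, high

    def __call__(self, initial_value=None, current_value=None, random_state=None):
        return random_state.uniform(self._low, self._high)


class Initial(ToyVariation):
    """Returns the attribute's original, un-randomized value."""

    def __call__(self, initial_value=None, current_value=None, random_state=None):
        return initial_value


class Broadcaster:
    """Samples a variation once; every proxy then returns that same value."""

    def __init__(self, variation):
        self._variation = variation
        self.value = None

    def resample(self, random_state):
        self.value = self._variation(random_state=random_state)

    def proxy(self):
        return _Proxy(self)


class _Proxy(ToyVariation):
    def __init__(self, broadcaster):
        self._broadcaster = broadcaster

    def __call__(self, initial_value=None, current_value=None, random_state=None):
        return self._broadcaster.value


rng = random.Random(0)
# Multiplicative noise: scale the original value by a factor in [0.9, 1.1].
noisy_mass = Initial() * Uniform(0.9, 1.1)
mass = noisy_mass(initial_value=2.0, random_state=rng)  # roughly 1.8 to 2.2

# One shared sample feeds two consumers, e.g. left and right wheel friction.
shared = Broadcaster(Uniform(0.5, 1.5))
shared.resample(rng)
left, right = shared.proxy(), shared.proxy()
print(left() == right())  # True
```

Building composed variations lazily means the expression is written once at setup time and re-sampled at every reset.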

Step 6: Assemble and Launch the Environment

Instantiate the composer.Environment with the configured Task, time limit, and random state. The Environment handles MJCF model compilation to MuJoCo physics, the reset/step loop, observation collection via the Updater, and the dm_env interface. Optionally launch the interactive viewer for visualization and debugging.
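The order in which the environment calls task hooks during reset and step can be shown with a toy loop. RecordingTask and ToyEnvironment are hypothetical; the real composer.Environment also dispatches hooks to every entity, updates observables through the Updater, and handles recompilation, all of which this sketch omits. A dict stands in for the compiled physics.

```python
import random


class RecordingTask:
    """Hypothetical task that logs each hook the toy environment calls."""

    def __init__(self):
        self.log = []

    def initialize_episode_mjcf(self, random_state):
        self.log.append("initialize_episode_mjcf")  # edit the MJCF model here

    def initialize_episode(self, physics, random_state):
        self.log.append("initialize_episode")  # set initial joint state here

    def before_step(self, physics, action, random_state):
        self.log.append("before_step")  # e.g. write the action to actuators

    def after_step(self, physics, random_state):
        self.log.append("after_step")

    def get_reward(self, physics):
        return 0.0

    def should_terminate_episode(self, physics):
        return False


class ToyEnvironment:
    """Hypothetical reset/step loop that only demonstrates hook order."""

    def __init__(self, task, n_sub_steps=2, seed=None):
        self._task = task
        self._n_sub_steps = n_sub_steps
        self._random_state = random.Random(seed)  # seeded for reproducible episodes
        self._physics = None

    def reset(self):
        self._task.initialize_episode_mjcf(self._random_state)  # before compiling
        self._physics = {"steps": 0}  # stand-in for compiling MJCF into MuJoCo
        self._task.log.append("compile")
        self._task.initialize_episode(self._physics, self._random_state)

    def step(self, action):
        self._task.before_step(self._physics, action, self._random_state)
        for _ in range(self._n_sub_steps):
            self._physics["steps"] += 1  # stand-in for one MuJoCo step
            self._task.log.append("physics_step")
        self._task.after_step(self._physics, self._random_state)
        reward = self._task.get_reward(self._physics)
        done = self._task.should_terminate_episode(self._physics)
        return reward, done


task = RecordingTask()
env = ToyEnvironment(task, n_sub_steps=2, seed=0)
env.reset()
env.step(action=[0.0])
print(task.log)
# ['initialize_episode_mjcf', 'compile', 'initialize_episode',
#  'before_step', 'physics_step', 'physics_step', 'after_step']
```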

Key considerations:

  • composer.Environment wraps the Task and manages the simulation lifecycle
  • strip_singleton_obs_buffer_dim=True removes the leading buffer axis from observations whose buffer_size is 1
  • The environment automatically recompiles physics when MJCF changes during reset
  • Random state can be seeded for reproducible episodes
  • dm_control.viewer.launch accepts the environment (or a callable that returns one) for interactive visualization
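The effect of stripping the singleton buffer dimension can be illustrated with nested lists in place of arrays. strip_singleton_buffer_dim and the observation names are hypothetical; the point is only that an observable with buffer_size 1 loses its leading axis (shape (1, 3) becomes (3,)), while longer buffers keep theirs.

```python
def strip_singleton_buffer_dim(observation, buffer_sizes):
    """Remove the buffer axis from observables whose buffer_size is 1.

    Hypothetical helper: nested lists stand in for arrays.
    """
    return {name: value[0] if buffer_sizes[name] == 1 else value
            for name, value in observation.items()}


buffered = {
    "joints_pos": [[0.1, -0.2, 0.3]],        # buffer_size 1: shape (1, 3)
    "touch_history": [[0.0], [1.0], [1.0]],  # buffer_size 3: shape (3, 1)
}
stripped = strip_singleton_buffer_dim(buffered, {"joints_pos": 1, "touch_history": 3})
print(stripped["joints_pos"])     # [0.1, -0.2, 0.3]
print(stripped["touch_history"])  # [[0.0], [1.0], [1.0]]
```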
