Implementation:Google deepmind Dm control Composer Environment For Locomotion

Metadata
Knowledge Sources	dm_control
Domains	Reinforcement Learning, Robotics, Environment Design
Last Updated	2026-02-15 00:00 GMT

Overview

Concrete tool for assembling dm_control locomotion components (walker, arena, task) into a complete reinforcement learning environment that implements the dm_env.Environment interface with episode lifecycle management, observation buffering, and error handling.

Description

The Template:Code class takes a fully configured task object (which already contains references to a walker and an arena) and wraps it in a dm_env-compatible environment. It manages the full episode lifecycle: MJCF model regeneration and recompilation, physics initialization, observation collection and buffering, action application through physics substeps, reward and discount computation, and episode termination. It handles procedural environments where the MJCF model changes between episodes (maze regeneration, corridor resizing) by recompiling the physics model on each reset.

Key features include:

Episode reset with retry: If episode initialization fails (e.g., rejection sampling cannot find valid prop positions), the environment retries up to Template:Code times.
Configurable recompilation: The Template:Code flag controls whether the MJCF model is regenerated and recompiled each episode. Disabling it speeds up resets for static environments, whose model does not change between episodes.
Fixed initial state: The Template:Code flag ensures deterministic episode starts by resetting the random state before each initialization.
Physics error handling: When the simulation diverges, the environment either raises an exception or ends the episode early with zero reward, depending on Template:Code.
Observation management: Observations are collected from all enabled observables of the task and of the entities attached to it (the walker, the arena, and any props), with support for delayed observations and configurable padding of the observation buffer.
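The reset-with-retry behavior can be pictured with the minimal sketch below. It is not the dm_control implementation: `EpisodeInitializationError`, `place_props_by_rejection`, and `reset_with_retry` are hypothetical stand-ins chosen to mirror the rejection-sampling failure mode described above. An initialization attempt that cannot find a valid layout raises, and the loop retries until the attempt budget is spent.

```python
import numpy as np


class EpisodeInitializationError(RuntimeError):
    """Hypothetical stand-in for the error raised when episode setup fails."""


def place_props_by_rejection(random_state, num_props, min_separation,
                             half_size, max_tries=50):
    """Samples 1-D prop positions in [-half_size, half_size] that are
    pairwise at least min_separation apart, using rejection sampling."""
    positions = []
    for _ in range(num_props):
        for _ in range(max_tries):
            candidate = random_state.uniform(-half_size, half_size)
            if all(abs(candidate - p) >= min_separation for p in positions):
                positions.append(candidate)
                break
        else:
            # Every candidate was rejected: this episode cannot be set up.
            raise EpisodeInitializationError('could not place all props')
    return positions


def reset_with_retry(initialize_episode, max_reset_attempts=1):
    """Calls initialize_episode until it succeeds or the budget runs out."""
    if max_reset_attempts < 1:
        raise ValueError('max_reset_attempts must be at least 1')
    for attempt in range(1, max_reset_attempts + 1):
        try:
            return initialize_episode()
        except EpisodeInitializationError:
            if attempt == max_reset_attempts:
                raise  # Out of attempts: surface the failure to the caller.


random_state = np.random.RandomState(42)
layout = reset_with_retry(
    lambda: place_props_by_rejection(random_state, num_props=3,
                                     min_separation=1.0, half_size=8.0),
    max_reset_attempts=5)
print(len(layout))  # 3
```

With a budget of 1 (the default in the signature below), the first initialization failure propagates immediately; larger budgets trade reset latency for robustness in crowded procedural layouts.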

Usage

Use Template:Code as the final assembly step after creating a walker, arena, and task. This is the object that RL training loops interact with through Template:Code and Template:Code.

Code Reference

Source Location

Class	File	Lines
Environment	Template:Code	L294-517

Signature

class Environment(_CommonEnvironment, dm_env.Environment):
    def __init__(
        self,
        task,
        time_limit=float('inf'),
        random_state=None,
        n_sub_steps=None,
        raise_exception_on_physics_error=True,
        strip_singleton_obs_buffer_dim=False,
        max_reset_attempts=1,
        recompile_mjcf_every_episode=True,
        fixed_initial_state=False,
        delayed_observation_padding=ObservationPadding.ZERO,
        legacy_step=True,
    ):
        ...

Import

from dm_control import composer

I/O Contract

Inputs

Parameter	Type	Description
task	composer.Task	A fully configured task instance containing references to walker and arena.
time_limit	float	Maximum episode duration in seconds. Default Template:Code.
random_state	int or np.random.RandomState or None	Seed or random state for reproducibility. Default None.
max_reset_attempts	int	Maximum number of times to retry episode initialization on failure. Default 1.
recompile_mjcf_every_episode	bool	Whether to regenerate and recompile the MJCF model each episode. Default True.
fixed_initial_state	bool	If True, reset random state before each episode for determinism. Default False.
raise_exception_on_physics_error	bool	If True, raise PhysicsError when the simulation diverges; if False, end the episode with zero reward instead. Default True.
strip_singleton_obs_buffer_dim	bool	If True, remove leading dimension from observations with buffer_size=1. Default False.
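To make the effect of `strip_singleton_obs_buffer_dim` concrete, the sketch below uses plain NumPy arrays in place of real observables. The buffered layout `(buffer_size, *obs_shape)` and the helper name `maybe_strip` are assumptions for illustration; the point is only that a leading axis of length 1 is dropped when the flag is set, while longer buffers keep it.

```python
import numpy as np


def maybe_strip(buffered_obs, strip_singleton_obs_buffer_dim):
    """Drops a leading buffer axis of length 1 when the flag is set."""
    if (strip_singleton_obs_buffer_dim and buffered_obs.ndim > 0
            and buffered_obs.shape[0] == 1):
        return buffered_obs[0]
    return buffered_obs


# Hypothetical 3-D observation buffered with buffer_size=1: shape (1, 3).
joint_velocity = np.zeros((1, 3))
print(maybe_strip(joint_velocity, False).shape)    # (1, 3)
print(maybe_strip(joint_velocity, True).shape)     # (3,)

# With buffer_size=4 the leading axis is kept even when the flag is set.
print(maybe_strip(np.zeros((4, 3)), True).shape)   # (4, 3)
```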

Outputs

Method	Return Type	Description
reset()	dm_env.TimeStep	First timestep of a new episode (step_type=FIRST, reward=None, discount=None).
step(action)	dm_env.TimeStep	Timestep after applying action (step_type=MID or LAST, reward, discount, observation).
observation_spec()	OrderedDict	Maps observation names to specs.Array with shape and dtype.
action_spec()	specs.BoundedArray	Action bounds and shape from the task.
reward_spec()	specs.Array	Reward specification.
discount_spec()	specs.Array	Discount specification.
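Because `reset()` returns `reward=None` and `discount=None`, code that accumulates returns has to skip the first timestep. The sketch below assumes the usual dm_env convention that each non-first timestep carries the reward and discount produced by the previous action. `TimeStep` here is a local `namedtuple` stand-in with the same field names, not `dm_env.TimeStep`, and `gamma` is a hypothetical agent-side discount factor that multiplies the environment's discount.

```python
import collections

# Local stand-in with the same field names as dm_env.TimeStep.
TimeStep = collections.namedtuple(
    'TimeStep', ['step_type', 'reward', 'discount', 'observation'])


def episode_return(timesteps, gamma=1.0):
    """Discounted return of one episode, skipping the FIRST timestep.

    G = r_1 + gamma*d_1*r_2 + gamma^2*d_1*d_2*r_3 + ...
    """
    total = 0.0
    scale = 1.0  # Running product of gamma * discount over earlier steps.
    for ts in timesteps:
        if ts.reward is None:  # The FIRST timestep from reset() has no reward.
            continue
        total += scale * ts.reward
        scale *= gamma * ts.discount
    return total


episode = [
    TimeStep('FIRST', None, None, {}),
    TimeStep('MID', 1.0, 1.0, {}),
    TimeStep('MID', 1.0, 1.0, {}),
    TimeStep('LAST', 1.0, 0.0, {}),
]
print(episode_return(episode))             # 3.0
print(episode_return(episode, gamma=0.5))  # 1.75
```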

Usage Examples

Basic locomotion environment with default settings:

import numpy as np

from dm_control import composer
from dm_control.locomotion.walkers import cmu_humanoid
from dm_control.locomotion.arenas import floors
from dm_control.locomotion.tasks import go_to_target

# Build the three components: walker, arena, and task.
walker = cmu_humanoid.CMUHumanoidPositionControlled()
arena = floors.Floor(size=(8, 8))
task = go_to_target.GoToTarget(
    walker=walker, arena=arena,
    physics_timestep=0.005, control_timestep=0.03)

# Wrap the task in a dm_env-compatible environment with 30 s episodes.
env = composer.Environment(
    task=task,
    time_limit=30,
    strip_singleton_obs_buffer_dim=True)

# Standard RL interaction loop with uniformly random actions.
# (action_spec().generate_value() returns a fixed placeholder, not a random sample.)
action_spec = env.action_spec()
timestep = env.reset()
while not timestep.last():
    action = np.random.uniform(
        action_spec.minimum, action_spec.maximum, size=action_spec.shape)
    timestep = env.step(action)
    print(f"Reward: {timestep.reward}")
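The timing values in the basic example determine how many physics substeps each `env.step()` performs and roughly how many control steps an episode lasts. The sketch below does that arithmetic. The helper name `substeps_per_control_step` and its tolerance are hypothetical; the check simply encodes the expectation that the control timestep is an integer multiple of the physics timestep. The episode length is a nominal figure, since the exact count depends on how the environment compares accumulated physics time with `time_limit`.

```python
import math


def substeps_per_control_step(control_timestep, physics_timestep, tol=1e-6):
    """Physics steps per control step; rejects non-integer ratios."""
    ratio = control_timestep / physics_timestep
    n_sub_steps = int(round(ratio))
    if abs(ratio - n_sub_steps) > tol:
        raise ValueError(
            f'{control_timestep} is not a multiple of {physics_timestep}')
    return n_sub_steps


physics_timestep = 0.005  # seconds, as passed to GoToTarget
control_timestep = 0.03   # seconds
time_limit = 30           # seconds, as passed to composer.Environment

n_sub_steps = substeps_per_control_step(control_timestep, physics_timestep)
# Round before ceil so floating-point noise cannot add a spurious step.
control_steps = math.ceil(round(time_limit / control_timestep, 6))
print(n_sub_steps)                  # 6
print(control_steps)                # 1000
print(control_steps * n_sub_steps)  # 6000
```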

Environment with procedural maze and retry logic:

from dm_control import composer

# `task` is assumed to be a ManyGoalsMaze task built on a RandomMazeWithTargets arena (construction not shown)
env = composer.Environment(
    task=task,
    time_limit=30,
    random_state=42,
    max_reset_attempts=5,
    recompile_mjcf_every_episode=True,
    strip_singleton_obs_buffer_dim=True)

# Each reset generates a new maze layout
timestep = env.reset()
print(env.observation_spec().keys())
print(env.action_spec().shape)

Deterministic environment for debugging:

from dm_control import composer

env = composer.Environment(
    task=task,
    time_limit=10,
    random_state=0,
    fixed_initial_state=True,
    recompile_mjcf_every_episode=True)

# Every reset produces the same initial state
ts1 = env.reset()
ts2 = env.reset()
# ts1.observation and ts2.observation will be identical
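The determinism of `fixed_initial_state=True` can be illustrated with NumPy alone. This sketch assumes a save-and-restore mechanism on a `np.random.RandomState`: snapshotting the state once and restoring it before each initialization makes every random draw, and therefore every sampled initial pose, repeat. `sample_initial_pose` is a hypothetical stand-in for a task's episode initialization, and `observations_equal` shows one way to compare two observation dicts, since `==` on NumPy arrays is element-wise.

```python
import numpy as np


def sample_initial_pose(random_state):
    """Hypothetical episode initializer: random xy position and heading."""
    xy = random_state.uniform(-1.0, 1.0, size=2)
    heading = random_state.uniform(-np.pi, np.pi)
    return {'xy': xy, 'heading': np.array([heading])}


def observations_equal(obs_a, obs_b):
    """True if two observation dicts have the same keys and equal arrays."""
    return obs_a.keys() == obs_b.keys() and all(
        np.array_equal(obs_a[k], obs_b[k]) for k in obs_a)


random_state = np.random.RandomState(0)
saved_state = random_state.get_state()  # Snapshot taken once, up front.

random_state.set_state(saved_state)     # "Reset" 1: restore, then initialize.
first = sample_initial_pose(random_state)
random_state.set_state(saved_state)     # "Reset" 2: same state, same draws.
second = sample_initial_pose(random_state)
third = sample_initial_pose(random_state)  # No restore: the stream moves on.

print(observations_equal(first, second))  # True
print(observations_equal(first, third))   # False
```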

Environment with lenient physics error handling:

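import numpy as np

from dm_control import composer

# `task` is any configured composer.Task, e.g. the GoToTarget task above.
env = composer.Environment(
    task=task,
    time_limit=60,
    raise_exception_on_physics_error=False,
    max_reset_attempts=3)

# A physics divergence ends the episode with reward=0
# instead of raising PhysicsError.
action_spec = env.action_spec()
timestep = env.reset()
while not timestep.last():
    action = np.random.uniform(
        action_spec.minimum, action_spec.maximum, size=action_spec.shape)
    timestep = env.step(action)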
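The control flow behind `raise_exception_on_physics_error=False` can be sketched in plain Python. This illustrates the behavior described above and is not dm_control code: `PhysicsError`, `step_with_error_handling`, and `diverging_substep` are hypothetical, the fallback discount of 0.0 is an assumption of this sketch, and the MID reward and discount are placeholders for a real task's reward computation.

```python
import collections

# Local stand-in with the same field names as dm_env.TimeStep.
TimeStep = collections.namedtuple(
    'TimeStep', ['step_type', 'reward', 'discount', 'observation'])


class PhysicsError(RuntimeError):
    """Hypothetical stand-in for a simulator divergence error."""


def step_with_error_handling(simulate_substep, n_sub_steps, observation,
                             raise_exception_on_physics_error=True):
    """Runs n_sub_steps substeps; on divergence, raise or end the episode."""
    try:
        for _ in range(n_sub_steps):
            simulate_substep()
    except PhysicsError:
        if raise_exception_on_physics_error:
            raise
        # Lenient mode: end the episode with zero reward (discount assumed 0).
        return TimeStep('LAST', 0.0, 0.0, observation)
    # Placeholder reward/discount; a real task would compute these.
    return TimeStep('MID', 1.0, 1.0, observation)


def diverging_substep():
    raise PhysicsError('simulation became unstable')


ts = step_with_error_handling(diverging_substep, 6, {},
                              raise_exception_on_physics_error=False)
print(ts.step_type, ts.reward)  # LAST 0.0
```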

Related Pages

Principle:Google_deepmind_Dm_control_Locomotion_Environment_Assembly
