Implementation:Google deepmind Dm control Composer Environment For Locomotion
| Metadata | |
|---|---|
| Knowledge Sources | dm_control |
| Domains | Reinforcement Learning, Robotics, Environment Design |
| Last Updated | 2026-02-15 00:00 GMT |
Overview
Concrete tool for assembling dm_control locomotion components (walker, arena, task) into a complete reinforcement learning environment that implements the dm_env.Environment interface with episode lifecycle management, observation buffering, and error handling.
Description
The `composer.Environment` class takes a fully configured task object (which already contains references to a walker and an arena) and wraps it in a dm_env-compatible environment. It manages the full episode lifecycle: MJCF model regeneration and recompilation, physics initialization, observation collection and buffering, action application through physics substeps, reward and discount computation, and episode termination. It handles procedural environments where the MJCF model changes between episodes (maze regeneration, corridor resizing) by recompiling the physics model on each reset.
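The "physics substeps" mentioned above can be illustrated with a short calculation. This is a minimal sketch, assuming that the number of substeps per control step is the ratio of the task's control timestep to its physics timestep and that this ratio has to be a whole number. The helper name `substeps_per_control_step` and its tolerance are illustrative and not part of dm_control.

```python
def substeps_per_control_step(control_timestep, physics_timestep, tol=1e-6):
    """Return how many physics steps fit in one control step.

    Hypothetical helper: it only mirrors the arithmetic, not dm_control's code.
    """
    if control_timestep <= 0 or physics_timestep <= 0:
        raise ValueError("Timesteps must be positive.")
    ratio = control_timestep / physics_timestep
    n_sub_steps = round(ratio)
    # Reject ratios that are not (numerically) whole numbers.
    if n_sub_steps < 1 or abs(ratio - n_sub_steps) > tol:
        raise ValueError(
            f"control_timestep={control_timestep} is not an integer multiple "
            f"of physics_timestep={physics_timestep}.")
    return n_sub_steps


# Timesteps used in the basic usage example on this page.
print(substeps_per_control_step(control_timestep=0.03, physics_timestep=0.005))
```

With a control timestep of 0.03 s and a physics timestep of 0.005 s, each call to `step(action)` would advance the physics six times under this assumption.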
Key features include:
- Episode reset with retry: If episode initialization fails (e.g., rejection sampling cannot find valid prop positions), the environment retries up to `max_reset_attempts` times.
- Configurable recompilation: The `recompile_mjcf_every_episode` flag controls whether the MJCF model is regenerated each episode. Disabling it speeds up resets for static environments.
- Fixed initial state: The `fixed_initial_state` flag ensures deterministic episode starts by resetting the random state before each initialization.
- Physics error handling: Physics divergence can either raise an exception or silently terminate the episode with zero reward, depending on `raise_exception_on_physics_error`.
- Observation management: Observations are collected from all enabled walker and task observables, with support for delayed observations and configurable buffer padding.
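The retry behaviour in the first bullet can be sketched as a plain loop. This is an illustrative approximation, not the library's implementation: `EpisodeInitializationError` is defined locally as a stand-in exception, and `reset_with_retries` and `make_flaky_reset` are hypothetical names introduced for the example.

```python
class EpisodeInitializationError(Exception):
    """Local stand-in for the error raised when an episode cannot be set up."""


def reset_with_retries(reset_attempt, max_reset_attempts=1):
    """Call `reset_attempt` until it succeeds or the attempts are exhausted."""
    if max_reset_attempts < 1:
        raise ValueError("max_reset_attempts must be at least 1.")
    for attempt in range(1, max_reset_attempts + 1):
        try:
            return reset_attempt()
        except EpisodeInitializationError:
            if attempt == max_reset_attempts:
                raise  # Out of attempts: surface the last failure.


def make_flaky_reset(failures_before_success):
    """Build a fake reset that fails a fixed number of times, then succeeds."""
    calls = {"n": 0}

    def reset_attempt():
        calls["n"] += 1
        if calls["n"] <= failures_before_success:
            raise EpisodeInitializationError("could not place props")
        return f"first_timestep (attempt {calls['n']})"

    return reset_attempt


# Two failed initializations are absorbed when five attempts are allowed.
print(reset_with_retries(make_flaky_reset(2), max_reset_attempts=5))
```

With the default of a single attempt, the first initialization failure propagates to the caller, which is why procedural tasks that rely on rejection sampling usually raise the limit.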
Usage
Use `composer.Environment` as the final assembly step after creating a walker, arena, and task. This is the object that RL training loops interact with through `reset()` and `step(action)`.
Code Reference
Source Location
| Class | File | Lines |
|---|---|---|
| Environment | Template:Code | L294-517 |
Signature
class Environment(_CommonEnvironment, dm_env.Environment):
    def __init__(
        self,
        task,
        time_limit=float('inf'),
        random_state=None,
        n_sub_steps=None,
        raise_exception_on_physics_error=True,
        strip_singleton_obs_buffer_dim=False,
        max_reset_attempts=1,
        recompile_mjcf_every_episode=True,
        fixed_initial_state=False,
        delayed_observation_padding=ObservationPadding.ZERO,
        legacy_step=True,
    ):
        ...
Import
from dm_control import composer
I/O Contract
Inputs
| Parameter | Type | Description |
|---|---|---|
| task | composer.Task | A fully configured task instance containing references to walker and arena. |
| time_limit | float | Maximum episode duration in seconds. Default float('inf'). |
| random_state | int or np.random.RandomState or None | Seed or random state for reproducibility. Default None. |
| max_reset_attempts | int | Maximum number of times to retry episode initialization on failure. Default 1. |
| recompile_mjcf_every_episode | bool | Whether to regenerate and recompile the MJCF model each episode. Default True. |
| fixed_initial_state | bool | If True, reset random state before each episode for determinism. Default False. |
| raise_exception_on_physics_error | bool | If True, raise PhysicsError; if False, terminate episode silently. Default True. |
| strip_singleton_obs_buffer_dim | bool | If True, remove leading dimension from observations with buffer_size=1. Default False. |
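The effect of `strip_singleton_obs_buffer_dim` on observation shapes can be sketched with shape tuples alone. The example assumes that a buffered observation carries a leading axis whose length equals the observable's buffer size; the function name `buffered_observation_shape` is hypothetical and serves only as an illustration.

```python
def buffered_observation_shape(obs_shape, buffer_size=1, strip_singleton=False):
    """Return the shape an observation would have after buffering.

    Assumption for illustration: buffering prepends an axis of length
    `buffer_size`, and stripping removes that axis only when it has length 1.
    """
    if buffer_size < 1:
        raise ValueError("buffer_size must be at least 1.")
    if strip_singleton and buffer_size == 1:
        return tuple(obs_shape)
    return (buffer_size,) + tuple(obs_shape)


# A 3-vector observable with the default buffer of one entry.
print(buffered_observation_shape((3,), buffer_size=1, strip_singleton=False))
print(buffered_observation_shape((3,), buffer_size=1, strip_singleton=True))
# Larger buffers keep their leading axis even when stripping is enabled.
print(buffered_observation_shape((3,), buffer_size=4, strip_singleton=True))
```

Agents that expect flat per-step observations typically want the stripped form, which is why the usage examples on this page pass `strip_singleton_obs_buffer_dim=True`.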
Outputs
| Method | Return Type | Description |
|---|---|---|
| reset() | dm_env.TimeStep | First timestep of a new episode (step_type=FIRST, reward=None, discount=None). |
| step(action) | dm_env.TimeStep | Timestep after applying action (step_type=MID or LAST, reward, discount, observation). |
| observation_spec() | OrderedDict | Maps observation names to specs.Array with shape and dtype. |
| action_spec() | specs.BoundedArray | Action bounds and shape from the task. |
| reward_spec() | specs.Array | Reward specification. |
| discount_spec() | specs.Array | Discount specification. |
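The `reset()` and `step(action)` contract in the table can be exercised without MuJoCo. The sketch below uses local stand-ins (`StepType`, `TimeStep`, and a toy `CountdownEnv`) that only mimic the fields listed above; they are assumptions made for the example, not the dm_env classes themselves. It shows that the first timestep carries no reward, so an episode return should only accumulate rewards returned by `step`.

```python
import collections
import enum


class StepType(enum.IntEnum):
    """Stand-in for the three step types named in the table."""
    FIRST = 0
    MID = 1
    LAST = 2


TimeStep = collections.namedtuple(
    "TimeStep", ["step_type", "reward", "discount", "observation"])


class CountdownEnv:
    """Toy environment: reward 1.0 per step, episode ends after n_steps."""

    def __init__(self, n_steps):
        self._n_steps = n_steps
        self._remaining = n_steps

    def reset(self):
        self._remaining = self._n_steps
        # The first timestep has no reward or discount.
        return TimeStep(StepType.FIRST, None, None, {"remaining": self._remaining})

    def step(self, action):
        self._remaining -= 1
        step_type = StepType.LAST if self._remaining == 0 else StepType.MID
        return TimeStep(step_type, 1.0, 1.0, {"remaining": self._remaining})


def run_episode(env, policy):
    """Run one episode and return (undiscounted return, number of steps)."""
    timestep = env.reset()
    episode_return, num_steps = 0.0, 0
    while timestep.step_type != StepType.LAST:
        timestep = env.step(policy(timestep.observation))
        episode_return += timestep.reward
        num_steps += 1
    return episode_return, num_steps


print(run_episode(CountdownEnv(n_steps=3), policy=lambda observation: 0))
```

With a real environment, the loop condition would normally be written as `while not timestep.last():`, as in the usage examples on this page.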
Usage Examples
Basic locomotion environment with default settings:
import numpy as np
from dm_control import composer
from dm_control.locomotion.walkers import cmu_humanoid
from dm_control.locomotion.arenas import floors
from dm_control.locomotion.tasks import go_to_target
walker = cmu_humanoid.CMUHumanoidPositionControlled()
arena = floors.Floor(size=(8, 8))
task = go_to_target.GoToTarget(
    walker=walker, arena=arena,
    physics_timestep=0.005, control_timestep=0.03)
env = composer.Environment(
    task=task,
    time_limit=30,
    strip_singleton_obs_buffer_dim=True)
# Standard RL interaction loop
action_spec = env.action_spec()
timestep = env.reset()
while not timestep.last():
    # Sample a uniformly random action within the spec's bounds
    action = np.random.uniform(
        action_spec.minimum, action_spec.maximum, size=action_spec.shape)
    timestep = env.step(action)
    print(f"Reward: {timestep.reward}")
Environment with procedural maze and retry logic:
from dm_control import composer
# task is a ManyGoalsMaze with RandomMazeWithTargets arena
env = composer.Environment(
    task=task,
    time_limit=30,
    random_state=42,
    max_reset_attempts=5,
    recompile_mjcf_every_episode=True,
    strip_singleton_obs_buffer_dim=True)
# Each reset generates a new maze layout
timestep = env.reset()
print(env.observation_spec().keys())
print(env.action_spec().shape)
Deterministic environment for debugging:
from dm_control import composer
env = composer.Environment(
    task=task,
    time_limit=10,
    random_state=0,
    fixed_initial_state=True,
    recompile_mjcf_every_episode=True)
# Every reset produces the same initial state
ts1 = env.reset()
ts2 = env.reset()
# ts1.observation and ts2.observation will be identical
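One way to picture what `fixed_initial_state` does is to save the random generator's state once and restore it before every initialization. The sketch below is an assumption-laden illustration: it uses the standard library's `random.Random` instead of a NumPy random state, and `FixedInitialStateSampler` is a hypothetical class, not part of dm_control.

```python
import random


class FixedInitialStateSampler:
    """Toy sampler that draws an 'initial state' on every reset."""

    def __init__(self, seed=None, fixed_initial_state=False):
        self._rng = random.Random(seed)
        # Remember the generator state only when determinism is requested.
        self._fixed_state = self._rng.getstate() if fixed_initial_state else None

    def reset(self):
        if self._fixed_state is not None:
            # Rewind the generator so each episode starts from the same draws.
            self._rng.setstate(self._fixed_state)
        return [self._rng.uniform(-1.0, 1.0) for _ in range(3)]


fixed = FixedInitialStateSampler(seed=0, fixed_initial_state=True)
print(fixed.reset() == fixed.reset())      # Same initial state each time.

varying = FixedInitialStateSampler(seed=0, fixed_initial_state=False)
print(varying.reset() == varying.reset())  # Generator keeps advancing.
```

Seeding alone makes a whole run reproducible, while rewinding the state makes every episode within the run start identically, which is the property the debugging example relies on.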
Environment with lenient physics error handling:
from dm_control import composer
env = composer.Environment(
    task=task,
    time_limit=60,
    raise_exception_on_physics_error=False,
    max_reset_attempts=3)
# Physics divergence will terminate the episode with reward=0
# rather than raising an exception
action_spec = env.action_spec()
timestep = env.reset()
while not timestep.last():
    # Placeholder action that conforms to the spec; replace with a policy
    action = action_spec.generate_value()
    timestep = env.step(action)