Implementation:Google_deepmind_Dm_control_Composer_Environment
| Attribute | Value |
|---|---|
| Implementation | Composer Environment |
| Workflow | Composer_Environment_Building |
| Domain | Reinforcement_Learning, Composition |
| Source | dm_control |
| Last Updated | 2026-02-15 00:00 GMT |
Overview
Concrete tool for assembling a complete reinforcement learning environment from a Composer Task and its entity hierarchy, exposing the standard dm_env.Environment interface with support for MJCF recompilation, multi-rate observations, robust resetting, and physics error handling.
Description
The Environment class in dm_control.composer.environment is the top-level entry point for running Composer-based RL experiments. It inherits from both _CommonEnvironment (which handles physics compilation, observation updater creation, and the hooks system) and dm_env.Environment (which defines the standard RL interface).
Constructor parameters:
- `task` -- a `Task` instance that defines the root entity, reward, and termination logic.
- `time_limit` -- maximum episode duration in seconds (default: infinity).
- `random_state` -- an integer seed or `np.random.RandomState` for reproducibility.
- `max_reset_attempts` -- how many times to retry `reset` if `EpisodeInitializationError` is raised (default: 1, i.e., no retry).
- `recompile_mjcf_every_episode` -- if `True` (default), calls `initialize_episode_mjcf` and recompiles the physics at the start of every episode. Set to `False` for a speedup when the model does not change between episodes.
- `raise_exception_on_physics_error` -- if `False`, physics divergence terminates the episode with a warning instead of raising.
- `strip_singleton_obs_buffer_dim` -- if `True`, observations with `buffer_size=1` have the leading buffer dimension squeezed.
- `fixed_initial_state` -- if `True`, every episode starts from the same random state, making trajectories deterministic given the same actions.
- `delayed_observation_padding` -- `ObservationPadding.ZERO` or `ObservationPadding.INITIAL_VALUE`, controlling how delayed observation buffers are initialized.
- `legacy_step` -- if `True` (default), steps the physics state with up-to-date position- and velocity-dependent fields.
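The two padding modes for delayed observations can be illustrated with a small NumPy sketch. This mimics only the buffer pre-fill idea, not dm_control's actual code; the function and variable names here are hypothetical:

```python
import numpy as np

def init_delay_buffer(first_obs, delay_steps, padding):
    """Conceptual sketch of pre-filling a delayed observation buffer.

    padding='ZERO' fills the not-yet-observed slots with zeros;
    padding='INITIAL_VALUE' repeats the first real observation instead.
    (Names do not match dm_control internals.)
    """
    first_obs = np.asarray(first_obs, dtype=float)
    if padding == 'ZERO':
        pad = np.zeros_like(first_obs)
    elif padding == 'INITIAL_VALUE':
        pad = first_obs
    else:
        raise ValueError(f'Unknown padding mode: {padding!r}')
    # Buffer holds `delay_steps` padded entries, then the real value.
    return [pad.copy() for _ in range(delay_steps)] + [first_obs]

buf_zero = init_delay_buffer([1.0, 2.0], delay_steps=2, padding='ZERO')
buf_init = init_delay_buffer([1.0, 2.0], delay_steps=2, padding='INITIAL_VALUE')
```

With `ZERO` the agent initially sees zero vectors until real data flows through the delay line; with `INITIAL_VALUE` it sees a constant copy of the first observation, which avoids an artificial jump at the start of the episode.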
Key methods:
- `reset()` -- initializes a new episode. Calls `initialize_episode_mjcf`, recompiles physics (if configured), calls `initialize_episode`, resets the observation updater, and returns the first `TimeStep`.
- `step(action)` -- advances the environment by one control step. Calls the before/after hooks, steps the physics `n_sub_steps` times, updates observations, computes reward and discount, checks termination, and returns a `TimeStep`.
- `observation_spec()` -- returns the observation specification from the updater.
- `action_spec()` -- delegates to `task.action_spec(physics)`.
- `reward_spec()` / `discount_spec()` -- return custom specs from the task or the dm_env defaults.
- `close()` -- frees the underlying physics resources.
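The retry contract of `reset()` under `max_reset_attempts` can be sketched in plain Python. This is a conceptual illustration of the documented behavior (1 attempt means no retry; the last failure propagates), not dm_control's actual implementation, and the helper names are hypothetical:

```python
class EpisodeInitializationError(Exception):
    """Stands in for composer's episode-initialization failure."""

def reset_with_retries(initialize_episode, max_reset_attempts=1):
    """Retries episode initialization up to max_reset_attempts times.

    max_reset_attempts=1 means a single try, i.e. no retry, matching
    the constructor default described above.
    """
    for attempt in range(max_reset_attempts):
        try:
            return initialize_episode()
        except EpisodeInitializationError:
            if attempt == max_reset_attempts - 1:
                raise  # Out of attempts: propagate the failure.

# A flaky initializer that fails twice before succeeding, as might
# happen with a randomized scene that occasionally starts in collision.
calls = {'n': 0}
def flaky_init():
    calls['n'] += 1
    if calls['n'] < 3:
        raise EpisodeInitializationError()
    return 'first_timestep'

result = reset_with_retries(flaky_init, max_reset_attempts=5)
```

Here the third attempt succeeds, so `reset_with_retries` returns the first time step after two swallowed failures.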
Hooks system:
The internal _EnvironmentHooks object scans all entities in the task's entity tree and memoizes non-trivial callback methods. During stepping, only non-empty callbacks are invoked, avoiding the overhead of calling no-op methods on many entities. Extra hooks can be added via add_extra_hook(hook_name, hook_callable).
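The memoization idea can be sketched in a few lines: scan the entities once, keep only callbacks that actually override the no-op base method, and invoke just those each step. This is a conceptual sketch only; `_EnvironmentHooks`' real logic differs in detail, and the class and function names below are hypothetical:

```python
class Entity:
    def before_step(self, physics, random_state):
        pass  # Base no-op hook; most entities never override it.

class Robot(Entity):
    def before_step(self, physics, random_state):
        physics['robot_stepped'] = True  # A non-trivial per-step callback.

def collect_nontrivial_hooks(entities, hook_name='before_step'):
    """Keeps only hooks whose implementation differs from the base no-op."""
    base = getattr(Entity, hook_name)
    hooks = []
    for entity in entities:
        method = getattr(type(entity), hook_name)
        if method is not base:  # Overridden, so worth calling every step.
            hooks.append(getattr(entity, hook_name))
    return hooks

# One overriding entity among three: only its hook survives the scan.
entities = [Entity(), Robot(), Entity()]
hooks = collect_nontrivial_hooks(entities)
physics = {}
for hook in hooks:  # The per-step loop touches only non-trivial hooks.
    hook(physics, random_state=None)
```

The scan runs once per episode rather than once per step, so environments with hundreds of passive entities pay no per-step cost for their no-op callbacks.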
Usage
Instantiate composer.Environment with a configured Task and interact with it using the standard dm_env loop. Adjust constructor parameters to control recompilation frequency, robustness, and observation buffering.
Code Reference
| Attribute | Value |
|---|---|
| Source Location | dm_control/composer/environment.py:L294-517 |
| Signature | Environment.__init__(self, task, time_limit=float('inf'), random_state=None, n_sub_steps=None, raise_exception_on_physics_error=True, strip_singleton_obs_buffer_dim=False, max_reset_attempts=1, recompile_mjcf_every_episode=True, fixed_initial_state=False, delayed_observation_padding=ObservationPadding.ZERO, legacy_step=True) |
| Signature (reset) | Environment.reset(self) -> dm_env.TimeStep |
| Signature (step) | Environment.step(self, action) -> dm_env.TimeStep |
| Signature (observation_spec) | Environment.observation_spec(self) -> OrderedDict |
| Signature (action_spec) | Environment.action_spec(self) -> specs.BoundedArray |
| Import | from dm_control import composer or from dm_control.composer import environment |
I/O Contract
Inputs
| Name | Type | Description |
|---|---|---|
| task | Task | A fully configured Composer task with root entity, reward, and termination logic |
| time_limit | float | Maximum episode duration in seconds |
| random_state | int or np.random.RandomState | Seed or RNG for reproducibility |
| max_reset_attempts | int | Number of reset retries on EpisodeInitializationError |
| recompile_mjcf_every_episode | bool | Whether to recompile physics each episode |
| action | np.ndarray | (for step) Agent action matching action_spec |
Outputs
| Name | Type | Description |
|---|---|---|
| reset() return | dm_env.TimeStep | TimeStep(FIRST, None, None, observation) |
| step() return | dm_env.TimeStep | TimeStep(MID or LAST, reward, discount, observation) |
| observation_spec() return | OrderedDict[str, specs.Array] | Maps observation names to array specs |
| action_spec() return | specs.BoundedArray | Shape, dtype, and bounds of the action space |
| reward_spec() return | specs.Array | Specification of the reward signal |
| discount_spec() return | specs.BoundedArray | Specification of the discount factor |
| physics | weakref.ProxyType[mjcf.Physics] | Weak reference to the current physics instance |
| task | Task | The task driving this environment |
Usage Examples
Basic environment creation and interaction
from dm_control import composer
import numpy as np
# Assume ReachTask is defined as in the Task implementation page
task = ReachTask(robot=my_robot, target_entity=my_target)
env = composer.Environment(
task=task,
time_limit=10.0,
random_state=42)
# Standard dm_env interaction loop
timestep = env.reset()
while not timestep.last():
    action = np.random.uniform(
        low=env.action_spec().minimum,
        high=env.action_spec().maximum)
    timestep = env.step(action)
    print(f"Reward: {timestep.reward}")
env.close()
Environment with domain randomization and robust resetting
env = composer.Environment(
task=randomized_task,
time_limit=20.0,
random_state=123,
max_reset_attempts=5,
recompile_mjcf_every_episode=True,
raise_exception_on_physics_error=False)
# The environment will retry up to 5 times if initialization fails,
# and will gracefully handle physics divergence by terminating the episode.
timestep = env.reset()
Faster environment without per-episode recompilation
# When the MJCF model does not change between episodes,
# skip recompilation for a significant speedup.
env = composer.Environment(
task=static_task,
time_limit=30.0,
recompile_mjcf_every_episode=False,
strip_singleton_obs_buffer_dim=True)
# Observations with buffer_size=1 will not have a leading dimension.
timestep = env.reset()
obs = timestep.observation
Deterministic episodes for debugging
env = composer.Environment(
task=my_task,
time_limit=5.0,
random_state=0,
fixed_initial_state=True)
# Every call to reset() produces the identical initial state.
# Given the same action sequence, the trajectory is identical.
ts1 = env.reset()
ts2 = env.reset()
# ts1.observation == ts2.observation (element-wise)
Inspecting specs
env = composer.Environment(task=my_task)
env.reset()
print("Action spec:", env.action_spec())
print("Observation spec:")
for name, spec in env.observation_spec().items():
    print(f"  {name}: shape={spec.shape}, dtype={spec.dtype}")
print("Reward spec:", env.reward_spec())
print("Discount spec:", env.discount_spec())
Related Pages
- Principle:Google_deepmind_Dm_control_Composer_Environment_Assembly
- Implementation:Google_deepmind_Dm_control_Composer_Entity
- Implementation:Google_deepmind_Dm_control_Composer_Arena
- Implementation:Google_deepmind_Dm_control_Composer_Task
- Implementation:Google_deepmind_Dm_control_Composer_Observables
- Implementation:Google_deepmind_Dm_control_Composer_Variation
- Environment:Google_deepmind_Dm_control_Python_MuJoCo_Runtime
- Heuristic:Google_deepmind_Dm_control_Physics_Timestep_Configuration
- Heuristic:Google_deepmind_Dm_control_Prop_Settling_Physics_Tuning