Implementation:Google deepmind Dm control Suite Wrappers
| Metadata | Value |
|---|---|
| Implementation | Suite Wrappers |
| Domain | Reinforcement_Learning, Software_Engineering |
| Source | dm_control |
| Workflow | Control_Suite_RL_Training |
| Last Updated | 2026-02-15 00:00 GMT |
Overview
Concrete tool for augmenting dm_control environments using four provided wrapper classes: ActionNoise, ActionScale, Pixels, and MujocoProfiler.
Description
The dm_control.suite.wrappers package ships four environment wrappers, each implementing dm_env.Environment:
- ActionNoise (action_noise.Wrapper): Adds zero-mean Gaussian noise to every action before forwarding it to the inner environment. The standard deviation is expressed as a fraction (scale) of each action dimension's range. The noisy action is clipped to the action spec bounds. Requires all action bounds to be finite.
- ActionScale (action_scale.Wrapper): Linearly remaps actions from a user-specified [minimum, maximum] range to the inner environment's native action range. This is useful for mapping normalised agent outputs (e.g. [-1, 1]) onto arbitrary actuator bounds. Its action_spec() returns a modified spec reflecting the new bounds.
- Pixels (pixels.Wrapper): Adds a rendered pixel observation to each time-step. When pixels_only=True (default), the original state observations are discarded. When pixels_only=False, both state and pixel observations are returned. Custom render settings (resolution, camera ID) can be passed via render_kwargs.
- MujocoProfiler (mujoco_profiling.Wrapper): Appends a profiling observation containing the cumulative step timer duration and call count from MuJoCo's internal profiler. Calls physics.enable_profiling() on construction.
All four wrappers delegate unknown attribute access to the inner environment via __getattr__, preserving transparent access to physics, task, and other environment properties.
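The delegation pattern can be illustrated without dm_control. The following is a minimal sketch, not the library's code: InnerEnv and PassthroughWrapper are hypothetical stand-ins, and the string "handles" are placeholders for real physics and task objects.

```python
class InnerEnv:
    """Hypothetical stand-in for a dm_control environment."""

    def __init__(self):
        self.physics = "physics-handle"  # placeholder, not a real Physics object
        self.task = "task-handle"        # placeholder, not a real Task object

    def step(self, action):
        return ("stepped", action)


class PassthroughWrapper:
    """Forwards step() explicitly and every unknown attribute implicitly."""

    def __init__(self, env):
        self._env = env

    def step(self, action):
        return self._env.step(action)

    def __getattr__(self, name):
        # Python calls this only when normal lookup on the wrapper fails,
        # so attributes defined on the wrapper itself always take precedence.
        return getattr(self._env, name)


# Two layers of wrapping still expose the innermost attributes.
env = PassthroughWrapper(PassthroughWrapper(InnerEnv()))
print(env.physics)
print(env.step(0.5))
```

Because each layer forwards the lookup to the layer below it, the attribute resolves correctly however many wrappers are stacked.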
Usage
Use these wrappers when:
- You want to test policy robustness to action noise (ActionNoise).
- Your agent outputs actions in a normalised range and you need to map them to actuator bounds (ActionScale).
- You want to train from pixel observations or generate video frames (Pixels).
- You need to benchmark per-step MuJoCo performance (MujocoProfiler).
Code Reference
| Wrapper | Source Location | Signature |
|---|---|---|
| ActionNoise | dm_control/suite/wrappers/action_noise.py:L25-70 | action_noise.Wrapper(env, scale=0.01) |
| ActionScale | dm_control/suite/wrappers/action_scale.py:L27-103 | action_scale.Wrapper(env, minimum, maximum) |
| Pixels | dm_control/suite/wrappers/pixels.py:L26-115 | pixels.Wrapper(env, pixels_only=True, render_kwargs=None, observation_key='pixels') |
| MujocoProfiler | dm_control/suite/wrappers/mujoco_profiling.py:L25-107 | mujoco_profiling.Wrapper(env, observation_key='step_timing') |
Import:
from dm_control.suite.wrappers import action_noise
from dm_control.suite.wrappers import action_scale
from dm_control.suite.wrappers import pixels
from dm_control.suite.wrappers import mujoco_profiling
I/O Contract
ActionNoise
| Input/Output | Name | Type | Description |
|---|---|---|---|
| Input | env | dm_env.Environment | Base environment. Must have finite action bounds. |
| Input | scale | float | Noise standard deviation as fraction of action range. Default 0.01. |
| Output (step) | TimeStep | dm_env.TimeStep | Same as inner env, but action was perturbed before forwarding. |
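The noise model in this contract can be sketched in plain Python. This is an illustration of the arithmetic only, under stated assumptions: the helper name add_action_noise and the use of Python's random module are hypothetical, and the wrapper itself may draw its noise from a different random source.

```python
import random


def add_action_noise(action, minimum, maximum, scale, rng):
    """Perturbs each action dimension, then clips it to [minimum, maximum]."""
    noisy = []
    for value, low, high in zip(action, minimum, maximum):
        std = scale * (high - low)           # std as a fraction of the range
        sample = value + rng.gauss(0.0, std)  # zero-mean Gaussian perturbation
        noisy.append(min(max(sample, low), high))  # clip to the spec bounds
    return noisy


rng = random.Random(0)  # seeded so the sketch is repeatable
minimum, maximum = [-1.0, -2.0], [1.0, 2.0]
noisy_action = add_action_noise([1.0, 0.0], minimum, maximum, scale=0.05, rng=rng)
print(noisy_action)
```

The formula also shows why finite bounds are needed: with an infinite bound the range, and therefore the standard deviation, would be infinite.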
ActionScale
| Input/Output | Name | Type | Description |
|---|---|---|---|
| Input | env | dm_env.Environment | Base environment. action_spec() must return a single BoundedArray with finite bounds. |
| Input | minimum | scalar or array | New lower bound for actions. Must be finite and broadcastable to the action shape. |
| Input | maximum | scalar or array | New upper bound for actions. Must be finite and broadcastable to the action shape. |
| Output (action_spec) | BoundedArray | dm_env.specs.BoundedArray | Spec with updated minimum and maximum. |
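The linear remapping can be written out explicitly. The sketch below is one affine map consistent with the description above, assuming per-dimension bounds given as equal-length lists; rescale_action is a hypothetical helper, not part of dm_control.

```python
def rescale_action(action, minimum, maximum, native_minimum, native_maximum):
    """Maps each action from [minimum, maximum] onto the native range."""
    rescaled = []
    for value, low, high, native_low, native_high in zip(
            action, minimum, maximum, native_minimum, native_maximum):
        slope = (native_high - native_low) / (high - low)
        rescaled.append(native_low + slope * (value - low))
    return rescaled


# Agent-side range is [-1, 1] in every dimension; native ranges differ.
native_action = rescale_action(
    action=[-1.0, 0.0, 1.0],
    minimum=[-1.0, -1.0, -1.0],
    maximum=[1.0, 1.0, 1.0],
    native_minimum=[0.0, -2.0, 10.0],
    native_maximum=[4.0, 2.0, 20.0],
)
print(native_action)  # [0.0, 0.0, 20.0]
```

The lower bound maps to the native lower bound, the upper bound to the native upper bound, and interior values are interpolated linearly between them.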
Pixels
| Input/Output | Name | Type | Description |
|---|---|---|---|
| Input | env | dm_env.Environment | Base environment with a physics.render() method. |
| Input | pixels_only | bool | If True, discard state observations. Default True. |
| Input | render_kwargs | dict or None | Keyword arguments for physics.render() (e.g. {"width": 84, "height": 84, "camera_id": 0}). |
| Input | observation_key | str | Key name for pixel observation. Default "pixels". |
| Output (observation) | OrderedDict | OrderedDict[str, ndarray] | Contains "pixels" key (and optionally original observations). |
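How the output observation is assembled under the two pixels_only settings can be sketched as follows. This is a simplified illustration, not the wrapper's code: add_pixel_observation is a hypothetical helper, the frame is a placeholder nested list rather than a rendered array, and the error raised on a key collision is an assumption of this sketch.

```python
import collections


def add_pixel_observation(observation, frame, pixels_only=True,
                          observation_key="pixels"):
    """Returns a new observation dict containing the rendered frame."""
    if pixels_only:
        merged = collections.OrderedDict()  # drop the state observations
    else:
        if observation_key in observation:
            # Assumed behaviour: refuse to overwrite an existing entry.
            raise ValueError(f"Key {observation_key!r} already exists.")
        merged = collections.OrderedDict(observation)  # keep state entries
    merged[observation_key] = frame
    return merged


state = collections.OrderedDict(position=[0.0, 1.0], velocity=[0.0])
frame = [[(0, 0, 0)] * 4 for _ in range(3)]  # placeholder 3x4 RGB frame

both = add_pixel_observation(state, frame, pixels_only=False)
only = add_pixel_observation(state, frame)
print(list(both.keys()))  # ['position', 'velocity', 'pixels']
print(list(only.keys()))  # ['pixels']
```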
MujocoProfiler
| Input/Output | Name | Type | Description |
|---|---|---|---|
| Input | env | dm_env.Environment | Base environment with MuJoCo physics. |
| Input | observation_key | str | Key name for profiling data. Default "step_timing". |
| Output (observation) | OrderedDict | OrderedDict[str, ndarray] | Original observations plus a "step_timing" array of shape (2,) containing [duration, count]. |
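Because both values are described as cumulative, a per-step figure has to be derived from them. One way to do this, sketched under that assumption, is to difference two [duration, count] readings; mean_step_duration is a hypothetical helper, the sample readings are made-up numbers, and the result is in whatever time unit the profiler reports.

```python
def mean_step_duration(before, after):
    """Mean duration per timed call between two [duration, count] readings."""
    elapsed = float(after[0]) - float(before[0])
    calls = int(after[1]) - int(before[1])
    if calls <= 0:
        return None  # no timed calls happened between the two readings
    return elapsed / calls


# Illustrative readings, e.g. taken from time_step.observation['step_timing'].
first_reading = [0.004, 10]
second_reading = [0.010, 25]
mean = mean_step_duration(first_reading, second_reading)
print(mean)
```

Differencing two readings excludes any warm-up steps taken before the first reading from the average.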
Usage Examples
Add action noise for robustness testing:
from dm_control import suite
from dm_control.suite.wrappers import action_noise
env = suite.load('cheetah', 'run')
env = action_noise.Wrapper(env, scale=0.05)
time_step = env.reset()
action = env.action_spec().minimum # actions will be perturbed by noise
time_step = env.step(action)
Rescale actions to [-1, 1] for a normalised policy output:
from dm_control import suite
from dm_control.suite.wrappers import action_scale
env = suite.load('walker', 'walk')
env = action_scale.Wrapper(env, minimum=-1.0, maximum=1.0)
print(env.action_spec().minimum) # lower bound of the wrapped spec, now -1.0
print(env.action_spec().maximum) # upper bound of the wrapped spec, now 1.0
Add pixel observations for vision-based RL:
from dm_control import suite
from dm_control.suite.wrappers import pixels
env = suite.load('cartpole', 'balance')
env = pixels.Wrapper(env, pixels_only=False,
render_kwargs={'width': 84, 'height': 84, 'camera_id': 0})
time_step = env.reset()
print(time_step.observation['pixels'].shape) # (84, 84, 3)
# Original state observations are also available
print(list(time_step.observation.keys())) # ['position', 'velocity', 'pixels']
Stack multiple wrappers:
from dm_control import suite
from dm_control.suite.wrappers import action_noise, action_scale, pixels
env = suite.load('finger', 'spin')
env = action_scale.Wrapper(env, minimum=-1.0, maximum=1.0)
env = action_noise.Wrapper(env, scale=0.02)
env = pixels.Wrapper(env, pixels_only=True,
render_kwargs={'width': 64, 'height': 64})
# Agent sees pixel observations, outputs in [-1, 1], and noise is applied
time_step = env.reset()
print(time_step.observation['pixels'].shape) # (64, 64, 3)
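Wrapper order matters: the first wrapper applied sits closest to the base environment, and the last one applied is the outermost layer the agent interacts with. A small helper can make that order explicit. The sketch below uses hypothetical stand-ins (stack_wrappers, Tagged, nesting) so that it runs without dm_control.

```python
import functools


def stack_wrappers(env, factories):
    """Applies factories in order; the first one wraps `env` directly."""
    return functools.reduce(
        lambda wrapped, factory: factory(wrapped), factories, env)


class Tagged:
    """Hypothetical wrapper that only records its name and its inner env."""

    def __init__(self, env, tag):
        self.env = env
        self.tag = tag


def nesting(env):
    """Lists wrapper tags from the outermost layer to the innermost."""
    tags = []
    while isinstance(env, Tagged):
        tags.append(env.tag)
        env = env.env
    return tags


stacked = stack_wrappers("base-env", [
    functools.partial(Tagged, tag="action_scale"),
    functools.partial(Tagged, tag="action_noise"),
    functools.partial(Tagged, tag="pixels"),
])
order = nesting(stacked)
print(order)  # ['pixels', 'action_noise', 'action_scale']
```

With the real wrappers, the factories would be entries such as functools.partial(action_scale.Wrapper, minimum=-1.0, maximum=1.0). In the order shown, noise is added in the agent's [-1, 1] range before the action is rescaled to the native bounds.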
Profile MuJoCo step times:
from dm_control import suite
from dm_control.suite.wrappers import mujoco_profiling
import numpy as np
env = suite.load('humanoid', 'walk')
env = mujoco_profiling.Wrapper(env)
time_step = env.reset()
action = np.zeros(env.action_spec().shape)
time_step = env.step(action)
timing = time_step.observation['step_timing']
print(f"Cumulative step duration: {timing[0]:.6f} s")
print(f"Step timer call count: {int(timing[1])}")