Implementation:Google deepmind Dm control Suite Wrappers

Metadata	Value
Implementation	Suite Wrappers
Domain	Reinforcement_Learning, Software_Engineering
Source	dm_control
Workflow	Control_Suite_RL_Training
Last Updated	2026-02-15 00:00 GMT

Overview

Concrete tool for augmenting dm_control environments with four provided wrappers: ActionNoise, ActionScale, Pixels, and MujocoProfiler. Each wrapper lives in its own module and exposes a class named Wrapper.

Description

The dm_control.suite.wrappers package ships four environment wrappers, each implementing dm_env.Environment:

ActionNoise (action_noise.Wrapper) -- Adds zero-mean Gaussian noise to every action before forwarding it to the inner environment. The standard deviation is expressed as a fraction (scale) of each action dimension's range. The noisy action is clipped to the action spec bounds. Requires all action bounds to be finite.

ActionScale (action_scale.Wrapper) -- Linearly remaps actions from a user-specified [minimum, maximum] range to the inner environment's native action range. This is useful for normalising agent outputs (e.g. [-1, 1]) to arbitrary actuator bounds. Returns a modified action_spec reflecting the new bounds.

Pixels (pixels.Wrapper) -- Adds a rendered pixel observation to each time-step. When pixels_only=True (default), the original state observations are discarded. When pixels_only=False, both state and pixel observations are returned. Custom render settings (resolution, camera ID) can be passed via render_kwargs.

MujocoProfiler (mujoco_profiling.Wrapper) -- Appends a profiling observation containing the cumulative step timer duration and call count from MuJoCo's internal profiler. Calls physics.enable_profiling() on construction.

All four wrappers delegate unknown attribute access to the inner environment via __getattr__, preserving transparent access to physics, task, and other environment properties.
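The delegation pattern described above can be sketched in a few lines of plain Python. This is an illustration only, not the dm_control source: `InnerEnv` and `PassthroughWrapper` are hypothetical stand-ins, and the real wrappers also implement `reset`, `action_spec`, and `observation_spec`.

```python
class InnerEnv:
    """Hypothetical stand-in for a wrapped environment."""

    def __init__(self):
        self.physics = "physics-object"  # placeholder for the real Physics instance
        self.task = "task-object"        # placeholder for the real Task instance

    def step(self, action):
        return ("inner-step", action)


class PassthroughWrapper:
    """Hypothetical wrapper that forwards unknown attributes to the inner env."""

    def __init__(self, env):
        self._env = env

    def step(self, action):
        # A real wrapper would transform the action or the time-step here.
        return self._env.step(action)

    def __getattr__(self, name):
        # Python calls this only when normal lookup on the wrapper fails,
        # so methods defined on the wrapper itself take precedence.
        return getattr(self._env, name)


# Delegation chains through every layer of a wrapper stack.
wrapped = PassthroughWrapper(PassthroughWrapper(InnerEnv()))
print(wrapped.physics)
```

Because `__getattr__` is only a fallback, `physics` and `task` stay reachable through the stack while `step` is still intercepted by each wrapper.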

Usage

Use these wrappers when:

You want to test policy robustness to action noise (ActionNoise).
Your agent outputs actions in a normalised range and you need to map them to actuator bounds (ActionScale).
You want to train from pixel observations or generate video frames (Pixels).
You need to benchmark per-step MuJoCo performance (MujocoProfiler).

Code Reference

Wrapper	Source Location	Signature
ActionNoise	`dm_control/suite/wrappers/action_noise.py:L25-70`	`action_noise.Wrapper(env, scale=0.01)`
ActionScale	`dm_control/suite/wrappers/action_scale.py:L27-103`	`action_scale.Wrapper(env, minimum, maximum)`
Pixels	`dm_control/suite/wrappers/pixels.py:L26-115`	`pixels.Wrapper(env, pixels_only=True, render_kwargs=None, observation_key='pixels')`
MujocoProfiler	`dm_control/suite/wrappers/mujoco_profiling.py:L25-107`	`mujoco_profiling.Wrapper(env, observation_key='step_timing')`

Import:

from dm_control.suite.wrappers import action_noise
from dm_control.suite.wrappers import action_scale
from dm_control.suite.wrappers import pixels
from dm_control.suite.wrappers import mujoco_profiling

I/O Contract

ActionNoise

Input/Output	Name	Type	Description
Input	`env`	`dm_env.Environment`	Base environment. Must have finite action bounds.
Input	`scale`	float	Noise standard deviation as fraction of action range. Default `0.01`.
Output (step)	TimeStep	`dm_env.TimeStep`	Same as inner env, but action was perturbed before forwarding.
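The perturbation described in this contract can be sketched without dm_control installed. The helper `add_action_noise` below is hypothetical and uses the standard library's `random` module rather than the environment's own random state; it only mirrors the stated behaviour (standard deviation as a fraction of each dimension's range, followed by clipping).

```python
import random


def add_action_noise(action, minimum, maximum, scale=0.01, rng=None):
    """Adds zero-mean Gaussian noise per dimension, then clips to the bounds.

    Hypothetical helper for illustration; not the dm_control implementation.
    """
    if rng is None:
        rng = random.Random()
    noisy = []
    for a, lo, hi in zip(action, minimum, maximum):
        std = scale * (hi - lo)               # std is a fraction of this dimension's range
        sample = a + rng.gauss(0.0, std)      # zero-mean Gaussian perturbation
        noisy.append(min(max(sample, lo), hi))  # clip into [lo, hi]
    return noisy


rng = random.Random(0)
noisy = add_action_noise([0.0, 2.0], [-1.0, -2.0], [1.0, 2.0], scale=0.05, rng=rng)
print(noisy)
```

Note that an action sitting on a bound, like the second dimension here, can only move inward, since any noise pushing it outward is clipped away.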

ActionScale

Input/Output	Name	Type	Description
Input	`env`	`dm_env.Environment`	Base environment. `action_spec()` must return a single `BoundedArray` with finite bounds.
Input	`minimum`	scalar or array	New lower bound for actions. Must be finite and broadcastable.
Input	`maximum`	scalar or array	New upper bound for actions. Must be finite and broadcastable.
Output (action_spec)	BoundedArray	`dm_env.specs.BoundedArray`	Spec with updated `minimum` and `maximum`.
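The linear remapping can be illustrated with a small dependency-free sketch. The function `rescale_action` is a hypothetical name, and the sketch assumes scalar new bounds and per-dimension native bounds; it shows the arithmetic only and is not the wrapper's code.

```python
def rescale_action(action, new_min, new_max, native_min, native_max):
    """Maps each action from [new_min, new_max] to its native [lo, hi] range.

    Hypothetical helper for illustration; not the dm_control implementation.
    """
    scaled = []
    for a, lo, hi in zip(action, native_min, native_max):
        fraction = (a - new_min) / (new_max - new_min)  # position within the new range
        scaled.append(lo + fraction * (hi - lo))        # same position within the native range
    return scaled


# Three actuators with different native bounds, driven by a policy in [-1, 1].
native = rescale_action(
    [-1.0, 0.0, 1.0],
    new_min=-1.0, new_max=1.0,
    native_min=[-2.0, 0.0, 10.0],
    native_max=[2.0, 4.0, 20.0],
)
print(native)  # [-2.0, 2.0, 20.0]
```

The new minimum lands on the native minimum, the midpoint on the native midpoint, and the new maximum on the native maximum.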

Pixels

Input/Output	Name	Type	Description
Input	`env`	`dm_env.Environment`	Base environment with a `physics.render()` method.
Input	`pixels_only`	bool	If `True`, discard state observations. Default `True`.
Input	`render_kwargs`	dict or None	Keyword arguments for `physics.render()` (e.g. `{"width": 84, "height": 84, "camera_id": 0}`).
Input	`observation_key`	str	Key name for pixel observation. Default `"pixels"`.
Output (observation)	OrderedDict	`OrderedDict[str, ndarray]`	Contains the pixel array under `observation_key` (`"pixels"` by default), plus the original observations when `pixels_only=False`.
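How `pixels_only` and `observation_key` shape the returned observation can be sketched as follows. The helper `add_pixels` is hypothetical, and the nested list `frame` stands in for the array that `physics.render()` would return; the sketch covers only the dictionary composition described in the table.

```python
from collections import OrderedDict


def add_pixels(observation, frame, pixels_only=True, observation_key="pixels"):
    """Returns a new observation dict that includes the rendered frame.

    Hypothetical helper for illustration; not the dm_control implementation.
    """
    if pixels_only:
        result = OrderedDict()             # discard the state observations
    else:
        result = OrderedDict(observation)  # keep them, in their original order
    result[observation_key] = frame
    return result


state = OrderedDict(position=[0.0, 1.0], velocity=[0.1])
frame = [[[0, 0, 0] for _ in range(4)] for _ in range(4)]  # stand-in 4x4 RGB frame

print(list(add_pixels(state, frame).keys()))                     # ['pixels']
print(list(add_pixels(state, frame, pixels_only=False).keys()))  # ['position', 'velocity', 'pixels']
```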

MujocoProfiler

Input/Output	Name	Type	Description
Input	`env`	`dm_env.Environment`	Base environment with MuJoCo physics.
Input	`observation_key`	str	Key name for profiling data. Default `"step_timing"`.
Output (observation)	OrderedDict	`OrderedDict[str, ndarray]`	Original observations plus an array of shape `(2,)` under `observation_key` (`"step_timing"` by default), containing `[duration, count]`.
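Because the reported values are cumulative, a per-step figure has to be derived from them. The sketch below assumes the `[duration, count]` layout given in the table; the helper names `mean_step_duration` and `interval_mean` are hypothetical, and the sample readings are made up for illustration.

```python
def mean_step_duration(step_timing):
    """Average duration per timed step over the whole run so far."""
    duration, count = step_timing
    if count == 0:
        return 0.0  # nothing has been timed yet
    return duration / count


def interval_mean(previous, current):
    """Average duration per timed step between two cumulative readings."""
    elapsed = current[0] - previous[0]
    steps = current[1] - previous[1]
    if steps == 0:
        return 0.0
    return elapsed / steps


# Made-up cumulative readings taken at two different time-steps.
earlier = [0.5, 100]
later = [0.8, 150]

print(mean_step_duration(later))
print(interval_mean(earlier, later))
```

Differencing two readings is useful when the first steps of a run (for example, warm-up) should be excluded from the benchmark.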

Usage Examples

Add action noise for robustness testing:

from dm_control import suite
from dm_control.suite.wrappers import action_noise

env = suite.load('cheetah', 'run')
env = action_noise.Wrapper(env, scale=0.05)

time_step = env.reset()
action = env.action_spec().minimum  # actions will be perturbed by noise
time_step = env.step(action)

Rescale actions to [-1, 1] for a normalised policy output:

from dm_control import suite
from dm_control.suite.wrappers import action_scale

env = suite.load('walker', 'walk')
env = action_scale.Wrapper(env, minimum=-1.0, maximum=1.0)

print(env.action_spec().minimum)  # lower bound is now -1 for every action dimension
print(env.action_spec().maximum)  # upper bound is now  1 for every action dimension

Add pixel observations for vision-based RL:

from dm_control import suite
from dm_control.suite.wrappers import pixels

env = suite.load('cartpole', 'balance')
env = pixels.Wrapper(env, pixels_only=False,
                     render_kwargs={'width': 84, 'height': 84, 'camera_id': 0})

time_step = env.reset()
print(time_step.observation['pixels'].shape)  # (84, 84, 3)
# Original state observations are also available
print(list(time_step.observation.keys()))     # ['position', 'velocity', 'pixels']

Stack multiple wrappers:

from dm_control import suite
from dm_control.suite.wrappers import action_noise, action_scale, pixels

env = suite.load('finger', 'spin')
env = action_scale.Wrapper(env, minimum=-1.0, maximum=1.0)
env = action_noise.Wrapper(env, scale=0.02)
env = pixels.Wrapper(env, pixels_only=True,
                     render_kwargs={'width': 64, 'height': 64})

# The agent sees pixel observations and outputs actions in [-1, 1].
# ActionNoise wraps ActionScale, so noise is added in the normalised range before rescaling.
time_step = env.reset()
print(time_step.observation['pixels'].shape)  # (64, 64, 3)

Profile MuJoCo step times:

from dm_control import suite
from dm_control.suite.wrappers import mujoco_profiling
import numpy as np

env = suite.load('humanoid', 'walk')
env = mujoco_profiling.Wrapper(env)

time_step = env.reset()
action = np.zeros(env.action_spec().shape)
time_step = env.step(action)

timing = time_step.observation['step_timing']
print(f"Cumulative step duration: {timing[0]:.6f} s")
print(f"Step timer call count:    {int(timing[1])}")
