Implementation:Google deepmind Dm control Suite Wrappers

From Leeroopedia
| Metadata | Value |
|---|---|
| Implementation | Suite Wrappers |
| Domain | Reinforcement_Learning, Software_Engineering |
| Source | dm_control |
| Workflow | Control_Suite_RL_Training |
| Last Updated | 2026-02-15 00:00 GMT |

Overview

Concrete tool for augmenting dm_control environments using four provided wrapper classes: ActionNoise, ActionScale, Pixels, and MujocoProfiler.

Description

The dm_control.suite.wrappers package ships four environment wrappers, each implementing dm_env.Environment:

ActionNoise (action_noise.Wrapper) -- Adds zero-mean Gaussian noise to every action before forwarding it to the inner environment. The standard deviation is expressed as a fraction (scale) of each action dimension's range. The noisy action is clipped to the action spec bounds. Requires all action bounds to be finite.

ActionScale (action_scale.Wrapper) -- Linearly remaps actions from a user-specified [minimum, maximum] range to the inner environment's native action range. This is useful for normalising agent outputs (e.g. [-1, 1]) to arbitrary actuator bounds. Returns a modified action_spec reflecting the new bounds.

Pixels (pixels.Wrapper) -- Adds a rendered pixel observation to each time-step. When pixels_only=True (default), the original state observations are discarded. When pixels_only=False, both state and pixel observations are returned. Custom render settings (resolution, camera ID) can be passed via render_kwargs.

MujocoProfiler (mujoco_profiling.Wrapper) -- Appends a profiling observation containing the cumulative step timer duration and call count from MuJoCo's internal profiler. Calls physics.enable_profiling() on construction.

All four wrappers delegate unknown attribute access to the inner environment via __getattr__, preserving transparent access to physics, task, and other environment properties.
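The delegation pattern described above can be illustrated with a minimal sketch. The class names `InnerEnv` and `PassthroughWrapper` are hypothetical stand-ins and not part of dm_control; the sketch only assumes that the wrapper stores the inner environment as `_env` and forwards unknown attributes to it.

```python
class InnerEnv:
    """Hypothetical stand-in for an inner environment."""

    def __init__(self):
        self.physics = "physics-handle"  # placeholder for a real Physics object

    def step(self, action):
        return ("inner-step", action)


class PassthroughWrapper:
    """Hypothetical wrapper showing __getattr__ delegation."""

    def __init__(self, env):
        self._env = env

    def step(self, action):
        # Methods defined on the wrapper are found by normal lookup.
        return self._env.step(action)

    def __getattr__(self, name):
        # Python calls __getattr__ only when normal lookup fails, so this
        # forwards everything the wrapper does not define itself.
        return getattr(self._env, name)


wrapped = PassthroughWrapper(InnerEnv())
print(wrapped.physics)  # resolved on the inner environment
```

Because `__getattr__` is consulted only as a fallback, any method the wrapper overrides (such as `step`) takes precedence over the inner environment's version.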

Usage

Use these wrappers when:

  • You want to test policy robustness to action noise (ActionNoise).
  • Your agent outputs actions in a normalised range and you need to map them to actuator bounds (ActionScale).
  • You want to train from pixel observations or generate video frames (Pixels).
  • You need to benchmark per-step MuJoCo performance (MujocoProfiler).

Code Reference

| Wrapper | Source Location | Signature |
|---|---|---|
| ActionNoise | dm_control/suite/wrappers/action_noise.py:L25-70 | action_noise.Wrapper(env, scale=0.01) |
| ActionScale | dm_control/suite/wrappers/action_scale.py:L27-103 | action_scale.Wrapper(env, minimum, maximum) |
| Pixels | dm_control/suite/wrappers/pixels.py:L26-115 | pixels.Wrapper(env, pixels_only=True, render_kwargs=None, observation_key='pixels') |
| MujocoProfiler | dm_control/suite/wrappers/mujoco_profiling.py:L25-107 | mujoco_profiling.Wrapper(env, observation_key='step_timing') |

Import:

from dm_control.suite.wrappers import action_noise
from dm_control.suite.wrappers import action_scale
from dm_control.suite.wrappers import pixels
from dm_control.suite.wrappers import mujoco_profiling

I/O Contract

ActionNoise

| Input/Output | Name | Type | Description |
|---|---|---|---|
| Input | env | dm_env.Environment | Base environment. Must have finite action bounds. |
| Input | scale | float | Noise standard deviation as fraction of action range. Default 0.01. |
| Output (step) | TimeStep | dm_env.TimeStep | Same as inner env, but action was perturbed before forwarding. |
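The ActionNoise contract can be sketched in plain NumPy. This is an illustrative approximation of the behaviour described on this page, not the wrapper's actual source: the function name `perturb_action` and its `rng` argument are assumptions introduced for the example.

```python
import numpy as np


def perturb_action(action, minimum, maximum, scale=0.01, rng=None):
    """Add range-scaled, zero-mean Gaussian noise to an action, then clip.

    Hypothetical helper mirroring the ActionNoise description above.
    """
    minimum = np.asarray(minimum, dtype=float)
    maximum = np.asarray(maximum, dtype=float)
    if not (np.all(np.isfinite(minimum)) and np.all(np.isfinite(maximum))):
        raise ValueError("All action bounds must be finite.")
    rng = np.random.default_rng() if rng is None else rng
    # Per-dimension standard deviation: a fraction of each dimension's range.
    noise_std = scale * (maximum - minimum)
    noisy = np.asarray(action, dtype=float) + rng.normal(scale=noise_std)
    # Keep the perturbed action inside the action spec bounds.
    return np.clip(noisy, minimum, maximum)


noisy = perturb_action([0.0, 0.0], [-1.0, -2.0], [1.0, 2.0], scale=0.05)
print(noisy)
```

With `scale=0.05` and a range of 2.0, the first dimension receives noise with a standard deviation of 0.1; the second dimension, with a range of 4.0, receives 0.2.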

ActionScale

| Input/Output | Name | Type | Description |
|---|---|---|---|
| Input | env | dm_env.Environment | Base environment. action_spec() must return a single BoundedArray with finite bounds. |
| Input | minimum | scalar or array | New lower bound for actions. Must be finite and broadcastable. |
| Input | maximum | scalar or array | New upper bound for actions. Must be finite and broadcastable. |
| Output (action_spec) | BoundedArray | dm_env.specs.BoundedArray | Spec with updated minimum and maximum. |
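The linear remapping performed by ActionScale can be written out as a short NumPy sketch. The function `rescale_action` and its parameter names are hypothetical and chosen for illustration; the formula is the standard affine map between two intervals, which is what "linearly remaps" implies.

```python
import numpy as np


def rescale_action(action, new_min, new_max, native_min, native_max):
    """Map an action from [new_min, new_max] to [native_min, native_max].

    Hypothetical helper; all bounds are assumed finite and broadcastable
    against the action.
    """
    action = np.asarray(action, dtype=float)
    new_min = np.asarray(new_min, dtype=float)
    new_max = np.asarray(new_max, dtype=float)
    native_min = np.asarray(native_min, dtype=float)
    native_max = np.asarray(native_max, dtype=float)
    # Ratio of the native range to the user-facing range, per dimension.
    scale = (native_max - native_min) / (new_max - new_min)
    return native_min + scale * (action - new_min)


native = rescale_action(
    [-1.0, 0.0, 1.0],
    new_min=-1.0, new_max=1.0,
    native_min=[0.0, -2.0, 10.0], native_max=[4.0, 2.0, 20.0],
)
print(native)
```

Each dimension is mapped independently: the lower user-facing bound lands on the native lower bound, the upper on the native upper bound, and values in between are interpolated linearly.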

Pixels

| Input/Output | Name | Type | Description |
|---|---|---|---|
| Input | env | dm_env.Environment | Base environment with a physics.render() method. |
| Input | pixels_only | bool | If True, discard state observations. Default True. |
| Input | render_kwargs | dict or None | Keyword arguments for physics.render() (e.g. {"width": 84, "height": 84, "camera_id": 0}). |
| Input | observation_key | str | Key name for pixel observation. Default "pixels". |
| Output (observation) | OrderedDict | OrderedDict[str, ndarray] | Contains the rendered frame under observation_key ("pixels" by default), plus the original observations when pixels_only=False. |
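How `pixels_only` shapes the returned observation can be illustrated without a simulator. In this sketch, `add_pixel_observation` and the `render` callable are hypothetical stand-ins for the wrapper's internals and for `physics.render()`; it only models the dictionary handling described in the table above.

```python
import collections


def add_pixel_observation(observation, render, pixels_only=True,
                          observation_key="pixels", render_kwargs=None):
    """Build the observation dict the Pixels contract describes.

    Hypothetical helper: `render` stands in for physics.render().
    """
    render_kwargs = render_kwargs or {}
    result = collections.OrderedDict()
    if not pixels_only:
        # Keep the original state observations, in their original order.
        result.update(observation)
    # The rendered frame is stored under the configured key.
    result[observation_key] = render(**render_kwargs)
    return result


state = collections.OrderedDict(position=[0.0, 1.0], velocity=[0.1])
fake_render = lambda **kwargs: "frame"  # placeholder for an image array

print(list(add_pixel_observation(state, fake_render)))
print(list(add_pixel_observation(state, fake_render, pixels_only=False)))
```

The first call yields only the pixel key; the second keeps `position` and `velocity` and appends the pixel key after them.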

MujocoProfiler

| Input/Output | Name | Type | Description |
|---|---|---|---|
| Input | env | dm_env.Environment | Base environment with MuJoCo physics. |
| Input | observation_key | str | Key name for profiling data. Default "step_timing". |
| Output (observation) | OrderedDict | OrderedDict[str, ndarray] | Original observations plus an array of shape (2,) containing [duration, count], stored under observation_key ("step_timing" by default). |
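Because the profiling observation is cumulative, a per-step figure has to be derived by differencing two readings. The helper below is a hypothetical sketch, not part of dm_control; it assumes both readings are `[duration, count]` pairs from the same environment and that the counters were not reset in between.

```python
def mean_step_time(previous, current):
    """Average duration per timed call between two cumulative readings.

    Hypothetical helper. Each reading is a [duration, count] pair as
    described in the table above. Returns None if no new calls occurred.
    """
    delta_duration = float(current[0]) - float(previous[0])
    delta_count = float(current[1]) - float(previous[1])
    if delta_count <= 0:
        return None
    return delta_duration / delta_count


# Two illustrative readings: 2 calls at 0.004 total, then 5 calls at 0.010.
print(mean_step_time([0.004, 2], [0.010, 5]))
```

The result is expressed in whatever unit the duration field uses.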

Usage Examples

Add action noise for robustness testing:

from dm_control import suite
from dm_control.suite.wrappers import action_noise

env = suite.load('cheetah', 'run')
env = action_noise.Wrapper(env, scale=0.05)

time_step = env.reset()
action = env.action_spec().minimum  # actions will be perturbed by noise
time_step = env.step(action)

Rescale actions to [-1, 1] for a normalised policy output:

from dm_control import suite
from dm_control.suite.wrappers import action_scale

env = suite.load('walker', 'walk')
env = action_scale.Wrapper(env, minimum=-1.0, maximum=1.0)

print(env.action_spec().minimum)  # [-1. -1. -1. -1. -1. -1.]
print(env.action_spec().maximum)  # [ 1.  1.  1.  1.  1.  1.]

Add pixel observations for vision-based RL:

from dm_control import suite
from dm_control.suite.wrappers import pixels

env = suite.load('cartpole', 'balance')
env = pixels.Wrapper(env, pixels_only=False,
                     render_kwargs={'width': 84, 'height': 84, 'camera_id': 0})

time_step = env.reset()
print(time_step.observation['pixels'].shape)  # (84, 84, 3)
# Original state observations are also available
print(list(time_step.observation.keys()))     # ['position', 'velocity', 'pixels']

Stack multiple wrappers:

from dm_control import suite
from dm_control.suite.wrappers import action_noise, action_scale, pixels

env = suite.load('finger', 'spin')
env = action_scale.Wrapper(env, minimum=-1.0, maximum=1.0)
env = action_noise.Wrapper(env, scale=0.02)
env = pixels.Wrapper(env, pixels_only=True,
                     render_kwargs={'width': 64, 'height': 64})

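# The agent sees only pixel observations and outputs actions in [-1, 1].
# Because action_noise wraps action_scale, noise is added (and clipped) in the
# normalised [-1, 1] space before the action is rescaled to actuator bounds.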
time_step = env.reset()
print(time_step.observation['pixels'].shape)  # (64, 64, 3)

Profile MuJoCo step times:

from dm_control import suite
from dm_control.suite.wrappers import mujoco_profiling
import numpy as np

env = suite.load('humanoid', 'walk')
env = mujoco_profiling.Wrapper(env)

time_step = env.reset()
action = np.zeros(env.action_spec().shape)
time_step = env.step(action)

timing = time_step.observation['step_timing']
print(f"Cumulative step duration: {timing[0]:.6f} s")
print(f"Step timer call count:    {int(timing[1])}")
