Principle:Google deepmind Dm control Environment Wrapping
| Metadata | Value |
|---|---|
| Principle | Environment Wrapping |
| Domain | Reinforcement_Learning, Software_Engineering |
| Source | dm_control |
| Workflow | Control_Suite_RL_Training |
| Last Updated | 2026-02-15 00:00 GMT |
Overview
Environment wrapping is a compositional design pattern in which a decorator object intercepts and transforms the inputs, outputs, or behaviour of an underlying environment without modifying its source code.
Description
Reinforcement learning experiments frequently require augmenting a base environment with additional functionality:
- Action perturbation -- injecting Gaussian noise into actions to test robustness or encourage exploration.
- Action rescaling -- remapping the agent's output range (e.g. [-1, 1]) to the environment's native actuator range.
- Observation augmentation -- adding pixel renderings alongside (or instead of) the original state-based observations.
- Profiling and diagnostics -- recording per-step timing data from the physics engine.
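Two of these transformations are simple enough to show as plain functions. The sketch below covers action perturbation and observation augmentation. It makes a few assumptions. The noise standard deviation is proportional to each actuator's range, and the noisy action is clipped back to the bounds. Actions and bounds are plain Python lists. `perturb_action`, `augment_observation` and the `"pixels"` key are illustrative names. Rendering itself is not shown: `pixels` stands for whatever frame a renderer produced.

```python
import random


def perturb_action(action, minimum, maximum, scale, rng):
  """Adds Gaussian noise with std = scale * (maximum - minimum), then clips."""
  noisy = []
  for a, lo, hi in zip(action, minimum, maximum):
    std = scale * (hi - lo)  # noise scaled to this actuator's range (assumption)
    noisy.append(min(max(a + rng.gauss(0.0, std), lo), hi))  # clip to bounds
  return noisy


def augment_observation(observation, pixels, pixels_only, key="pixels"):
  """Returns only the frame, or the original entries plus the frame."""
  if pixels_only:
    return {key: pixels}
  augmented = dict(observation)  # copy so the inner env's dict is untouched
  augmented[key] = pixels
  return augmented


rng = random.Random(0)
# With scale=0.0 the noise is zero, so the action passes through unchanged.
print(perturb_action([0.5, -0.9], [-1.0, -1.0], [1.0, 1.0], scale=0.0, rng=rng))
# [0.5, -0.9]
print(augment_observation({"position": 1.0}, "frame", pixels_only=False))
# {'position': 1.0, 'pixels': 'frame'}
```

Inside a wrapper, `perturb_action` would run in `step` before delegating to the inner environment. `augment_observation` would run on the observation the inner environment returns.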
The wrapper pattern (a specialisation of the Decorator pattern from object-oriented design) achieves this by implementing the same dm_env.Environment interface as the base environment. Each wrapper:
- Stores a reference to the inner environment.
- Overrides one or more methods (`step`, `reset`, `action_spec`, `observation_spec`) to inject its transformations.
- Delegates all other method calls and attribute accesses to the inner environment via `__getattr__`.
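The delegation rule relies on a Python detail: `__getattr__` is called only when normal attribute lookup fails. Anything the wrapper defines itself (here `step`) therefore takes precedence, and every other name falls through to the inner environment. The following minimal sketch uses hypothetical stand-in classes (`InnerEnv`, `DoubleActionWrapper`) and is independent of dm_control:

```python
class InnerEnv:
  """Hypothetical stand-in for a base environment."""

  physics = "inner-physics"

  def step(self, action):
    return ("inner-step", action)

  def control_timestep(self):
    return 0.01


class DoubleActionWrapper:
  """Hypothetical wrapper: overrides step(), delegates everything else."""

  def __init__(self, env):
    self._env = env

  def step(self, action):
    # Defined on the wrapper, so normal lookup finds it and
    # __getattr__ is never consulted for `step`.
    return self._env.step(2 * action)

  def __getattr__(self, name):
    # Reached only for names the wrapper does not define.
    return getattr(self._env, name)


env = DoubleActionWrapper(InnerEnv())
print(env.step(3))             # ('inner-step', 6)
print(env.physics)             # inner-physics
print(env.control_timestep())  # 0.01
```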
Because wrappers share the same interface, they compose naturally: multiple wrappers can be stacked (e.g. action noise on top of action scaling on top of pixel observations) and the agent sees a single unified environment.
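Stacking can be illustrated without a simulator. In this sketch, `BaseEnv` and `LoggingWrapper` are hypothetical. Each wrapper records its name before delegating, so the log shows that the outermost wrapper runs first. The `self.log` lookup on a wrapper is itself resolved on the base environment through the chain of `__getattr__` fallbacks.

```python
class BaseEnv:
  """Hypothetical base environment that records when its step() runs."""

  def __init__(self, log):
    self.log = log

  def step(self, action):
    self.log.append("base")
    return action


class LoggingWrapper:
  """Hypothetical wrapper that logs its name, then delegates."""

  def __init__(self, env, name):
    self._env = env
    self._name = name

  def step(self, action):
    # `log` is not defined on any wrapper, so it resolves to BaseEnv.log
    # via __getattr__ at every layer.
    self.log.append(self._name)
    return self._env.step(action)

  def __getattr__(self, name):
    return getattr(self._env, name)


log = []
env = LoggingWrapper(LoggingWrapper(BaseEnv(log), "inner"), "outer")
env.step(0)
print(log)  # ['outer', 'inner', 'base']
```

Any transformation a wrapper applies to the *returned* time step runs in the opposite order, innermost first, as each call returns.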
Usage
Apply this principle whenever:
- You want to change what the agent sees or does without modifying the environment or task source code.
- You need to combine several independent transformations in a configurable pipeline.
- You are building a benchmark that compares agents under different observation modalities (state vs. pixels).
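One way to express a configurable pipeline is an ordered list of wrapper factories folded over the base environment. This sketch assumes hypothetical `Env` and `Tag` classes and a hypothetical `build_env` helper. With dm_control, a factory could instead be something like `functools.partial(pixels.Wrapper, pixels_only=True)`.

```python
import functools


class Env:
  """Hypothetical stand-in for a base environment."""

  def describe(self):
    return "base"


class Tag:
  """Hypothetical wrapper that labels describe() output with its layer."""

  def __init__(self, env, label):
    self._env = env
    self._label = label

  def describe(self):
    return f"{self._label}({self._env.describe()})"

  def __getattr__(self, name):
    return getattr(self._env, name)


def build_env(base_env, wrapper_factories):
  """Applies factories in order; the first one becomes the innermost layer."""
  return functools.reduce(lambda env, make: make(env), wrapper_factories, base_env)


# Config-driven variants, e.g. a state-based and a pixel-based benchmark arm.
CONFIGS = {
    "state": [functools.partial(Tag, label="noise")],
    "pixels": [functools.partial(Tag, label="noise"),
               functools.partial(Tag, label="pixels")],
}
for name, factories in CONFIGS.items():
  print(name, build_env(Env(), factories).describe())
# state noise(base)
# pixels pixels(noise(base))
```

Keeping the wrapper list as data means a benchmark can switch observation modality by changing configuration rather than code.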
Theoretical Basis
The wrapper pattern follows the Decorator structural pattern:
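```python
import dm_env


class Wrapper(dm_env.Environment):
  def __init__(self, env, **config):
    self._env = env  # inner environment; **config holds wrapper-specific options

  def transform_action(self, action):  # hook: identity unless overridden
    return action

  def transform_time_step(self, time_step):  # hook: identity unless overridden
    return time_step

  def step(self, action):
    transformed_action = self.transform_action(action)
    time_step = self._env.step(transformed_action)
    return self.transform_time_step(time_step)

  def reset(self):
    return self.transform_time_step(self._env.reset())

  def action_spec(self):
    return self._env.action_spec()  # override if actions are remapped

  def observation_spec(self):
    return self._env.observation_spec()  # override if observations change

  def __getattr__(self, name):
    return getattr(self._env, name)  # transparent delegation
```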
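A concrete wrapper usually overrides only what it changes. The following hypothetical rescaling wrapper advertises a [-1, 1] action spec to the agent. Before delegating, it maps each action affinely onto the inner environment's range:

lo + (a + 1) · (hi − lo) / 2

`Bounds` is an assumed scalar stand-in for a bounded array spec such as `dm_env.specs.BoundedArray`. The sketch is written without dm_env so it runs standalone, and it is not dm_control's own action-rescaling wrapper.

```python
from collections import namedtuple

# Hypothetical stand-in for a bounded spec (scalar bounds only).
Bounds = namedtuple("Bounds", ["minimum", "maximum"])


class ToyEnv:
  """Hypothetical base env whose actuator accepts [0, 10]."""

  def action_spec(self):
    return Bounds(0.0, 10.0)

  def step(self, action):
    return action  # echo the action actually received


class RescaleWrapper:
  """Advertises [-1, 1] and maps actions onto the inner env's range."""

  def __init__(self, env):
    self._env = env
    self._inner = env.action_spec()

  def action_spec(self):
    # Modified spec: this is what the agent sees.
    return Bounds(-1.0, 1.0)

  def transform_action(self, action):
    lo, hi = self._inner.minimum, self._inner.maximum
    return lo + (action + 1.0) * (hi - lo) / 2.0

  def step(self, action):
    return self._env.step(self.transform_action(action))

  def __getattr__(self, name):
    return getattr(self._env, name)


env = RescaleWrapper(ToyEnv())
print(env.action_spec())  # Bounds(minimum=-1.0, maximum=1.0)
print(env.step(0.5))      # 7.5
```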
Wrappers compose via nesting:
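```python
from dm_control import suite
from dm_control.suite.wrappers import action_noise, pixels

base_env = suite.load("cartpole", "balance")
noisy_env = action_noise.Wrapper(base_env, scale=0.01)
pixel_env = pixels.Wrapper(noisy_env, pixels_only=False)
# pixel_env.step(a) -> pixels.Wrapper.step -> action_noise.Wrapper.step -> base_env.step
```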
The `__getattr__` fallback is consulted only for names a wrapper does not define itself. This keeps attributes such as `physics` and `task`, and methods such as `control_timestep()`, accessible through any number of wrapper layers.