Principle:Google deepmind Dm control Observable Configuration
| Attribute | Value |
|---|---|
| Principle | Observable Configuration |
| Workflow | Composer_Environment_Building |
| Domain | Reinforcement_Learning, Observation |
| Source | dm_control |
| Last Updated | 2026-02-15 00:00 GMT |
Overview
Observable configuration is a declarative system for specifying what an agent can perceive, how often each observation is sampled, and how readings are buffered, delayed, aggregated, and optionally corrupted.
Description
In real-world robotics, sensors do not all operate at the same frequency, observations arrive with latency, and sensor readings are noisy. A faithful simulation of these phenomena requires more than simply reading physics state at each control step. The Observable Configuration principle provides a multi-rate observation pipeline with the following capabilities:
- Update interval: Each observable can specify how many simulation steps pass between successive readings. A proprioceptive sensor might update every step, while a camera might update every 5 steps.
- Buffer size: Observations are stored in a fixed-size ring buffer. With a buffer of size k, the agent receives the most recent k readings (or an aggregation of them).
- Delay: An observable can be configured with a non-negative integer delay, measured in simulation steps. The observation returned to the agent reflects the state from delay steps ago, modeling real sensor latency.
- Aggregation: When the buffer contains multiple readings, an aggregator function (min, max, mean, median, sum, or a custom callable) can reduce the buffer to a single value before it is returned.
- Corruption: An optional corruptor callable can modify each raw observation before it enters the buffer, modeling sensor noise or bias.
- Enable/disable: Each observable has an enabled flag. Only enabled observables are included in the agent's observation dict and consume computation.
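The six settings above can be pictured as one record per observable. The following is a minimal sketch in plain Python, not dm_control's own classes: ObservableSpec and enabled_names are hypothetical names introduced only for illustration, and the default values are assumptions of this sketch.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class ObservableSpec:
    """Hypothetical record of the per-observable settings listed above."""
    update_interval: int = 1   # simulation steps between successive readings
    buffer_size: int = 1       # number of readings retained
    delay: int = 0             # latency, in simulation steps
    aggregator: Optional[Callable[[list], Any]] = None     # reduces the buffer
    corruptor: Optional[Callable[[Any, Any], Any]] = None  # noise or bias
    enabled: bool = False      # disabled observables are skipped entirely

    def __post_init__(self):
        # Reject settings that the description above rules out.
        if self.update_interval < 1:
            raise ValueError("update_interval must be at least 1")
        if self.buffer_size < 1:
            raise ValueError("buffer_size must be at least 1")
        if self.delay < 0:
            raise ValueError("delay must be non-negative")


def enabled_names(specs):
    """Returns the names that would appear in the agent's observation dict."""
    return [name for name, spec in specs.items() if spec.enabled]


specs = {
    "joint_pos": ObservableSpec(enabled=True),
    "camera": ObservableSpec(update_interval=5, delay=2, enabled=True),
    "touch": ObservableSpec(buffer_size=4, aggregator=max),  # left disabled
}
print(enabled_names(specs))
```

Keeping the settings in a declarative record like this separates what is configured from how it is buffered and scheduled at runtime.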
The observable objects themselves are abstract: they declare what to observe (a named MuJoCo feature, an MJCF element binding, a camera render, or an arbitrary callable) and how to configure it. A separate Updater object manages the actual buffering, scheduling, and retrieval at runtime.
Observable types span two families:
- Name-based (MujocoFeature, MujocoCamera): reference physics data by string name. These are convenient but fragile under model changes.
- MJCF-based (MJCFFeature, MJCFCamera): reference physics data by mjcf.Element object. These automatically track element renames and are the preferred approach in Composer environments.
A Generic observable wraps any callable that takes a Physics and returns a value.
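The difference between the two families can be illustrated with a toy model. This sketch does not use dm_control: Element, ToyPhysics, name_based, and element_based are hypothetical stand-ins, and the rename to "arm/elbow" is an assumed example of a model change.

```python
class Element:
    """Toy stand-in for a model element whose name can change."""

    def __init__(self, name):
        self.name = name


class ToyPhysics:
    """Toy stand-in for physics data, looked up by an element's current name."""

    def __init__(self, values_by_element):
        self._values = values_by_element  # dict keyed by Element objects

    def lookup(self, name):
        for element, value in self._values.items():
            if element.name == name:
                return value
        raise KeyError(name)


def name_based(name):
    # Captures the string once, at construction time.
    return lambda physics: physics.lookup(name)


def element_based(element):
    # Resolves the element's current name every time it is read.
    return lambda physics: physics.lookup(element.name)


elbow = Element("elbow")
physics = ToyPhysics({elbow: 0.3})
by_name = name_based("elbow")
by_element = element_based(elbow)
before = (by_name(physics), by_element(physics))

elbow.name = "arm/elbow"  # assumed rename, e.g. after a model change
after = by_element(physics)  # still resolves
try:
    by_name(physics)
    name_lookup_failed = False
except KeyError:
    name_lookup_failed = True  # the stored string no longer matches
```

Both callables take the physics object and return a value, which is the same shape of callable that a Generic observable wraps.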
Usage
Use Observable Configuration when you need to:
- Expose sensor readings: Create MJCFFeature observables for joint positions, velocities, or contact forces bound to specific MJCF elements.
- Add camera observations: Create MJCFCamera observables for RGB, depth, or segmentation images from cameras defined in the MJCF model.
- Model realistic sensing: Configure update_interval, delay, and corruptor to simulate multi-rate, delayed, and noisy sensors.
- Aggregate temporal data: Set buffer_size > 1 with an aggregator to provide the agent with temporal summaries (e.g., mean velocity over the last 5 readings).
- Compute derived quantities: Use Generic observables for task-specific computations such as relative distances or goal indicators.
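As an illustration of the derived-quantity and noise use cases, here is a minimal sketch in plain Python. It assumes a dict in place of the physics object and a random.Random instance in place of the random state; relative_distance, gaussian_corruptor, and the key names are hypothetical.

```python
import math
import random


def relative_distance(physics):
    """Derived quantity: Euclidean distance from the gripper to the goal."""
    return math.dist(physics["gripper_pos"], physics["goal_pos"])


def gaussian_corruptor(raw_obs, random_state, stddev=0.01):
    """Adds zero-mean Gaussian noise to a scalar reading.

    Takes (raw_obs, random_state), matching the corruptor call described
    in this document; the stddev parameter is an assumption of this sketch.
    """
    return raw_obs + random_state.gauss(0.0, stddev)


# Dict used as a stand-in for the physics object (assumption).
physics = {"gripper_pos": (0.0, 0.0, 0.0), "goal_pos": (3.0, 4.0, 0.0)}
rng = random.Random(0)

clean = relative_distance(physics)      # exact distance
noisy = gaussian_corruptor(clean, rng)  # what a noisy sensor might report
print(clean, noisy)
```

Passing the random state in explicitly keeps the noise reproducible for a given seed.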
Theoretical Basis
The observation pipeline can be described as a discrete-time signal processing chain:
Raw physics state
|
v
[Observable._callable(physics)] -- sample the raw observation
|
v
[corruptor(raw_obs, random_state)] -- optional noise injection
|
v
[Ring Buffer of size buffer_size] -- store with timestamp
|
v
[delay filter] -- shift read pointer back by delay steps
|
v
[aggregator(buffer_contents)] -- reduce buffer (optional)
|
v
Agent observation
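The chain above can be sketched for a single observable in plain Python. This is a simplified model under stated assumptions, not dm_control's implementation: ToyPipeline is a hypothetical class, time is counted in integer simulation steps, and a reading taken at step ts becomes visible once ts + delay has been reached.

```python
from collections import deque
import statistics


class ToyPipeline:
    """Toy model: sample -> corrupt -> ring buffer -> delay -> aggregate."""

    def __init__(self, sample, update_interval=1, buffer_size=1, delay=0,
                 aggregator=None, corruptor=None):
        self.sample = sample
        self.update_interval = update_interval
        self.buffer_size = buffer_size
        self.delay = delay
        self.aggregator = aggregator
        self.corruptor = corruptor
        # At most `delay` entries can be not-yet-visible at any time, so this
        # bound always retains buffer_size visible readings once they exist.
        self._entries = deque(maxlen=buffer_size + delay)

    def step(self, t, physics, random_state=None):
        """Samples at step t if the observable is due for an update."""
        if t % self.update_interval == 0:
            value = self.sample(physics)
            if self.corruptor is not None:
                value = self.corruptor(value, random_state)
            self._entries.append((t, value))  # store with timestamp

    def read(self, t):
        """Returns the delayed, optionally aggregated observation at step t."""
        visible = [v for ts, v in self._entries if ts + self.delay <= t]
        window = visible[-self.buffer_size:]
        return self.aggregator(window) if self.aggregator else window


pipeline = ToyPipeline(sample=lambda physics: physics["x"],
                       update_interval=2, buffer_size=2, delay=1,
                       aggregator=statistics.mean)
for t in range(7):
    pipeline.step(t, {"x": float(t)})  # dict stands in for physics
result = pipeline.read(6)
print(result)
```

In this run, readings are taken at steps 0, 2, 4, and 6. With a delay of 1, the reading from step 6 is not yet visible at step 6, so the window holds the readings from steps 2 and 4.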
The Updater maintains a schedule for each enabled observable. At each physics substep, it checks whether an observable is due for an update (based on its update_interval) and, if so, calls the observation callable and inserts the result into the buffer with the appropriate delay tag. At the end of a control step, the updater reads from each buffer (respecting delays) and optionally applies the aggregator.
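The scheduling rule described above can be sketched as a small function. This is an illustrative model only: schedule is a hypothetical helper, and it assumes that an observable is due whenever the global step count is a multiple of its update_interval.

```python
def schedule(start, n_substeps, update_interval, delay):
    """Returns (sample_step, visible_step) pairs for one control step.

    The control step covers substeps start+1 .. start+n_substeps. A reading
    sampled at step s is tagged as visible from step s + delay onward.
    """
    steps = range(start + 1, start + n_substeps + 1)
    return [(s, s + delay) for s in steps if s % update_interval == 0]


# One control step of 10 substeps, sampling every 4 steps with a delay of 3.
plan = schedule(start=0, n_substeps=10, update_interval=4, delay=3)
read_step = 10  # the agent reads at the end of the control step
visible_now = [s for s, visible_at in plan if visible_at <= read_step]
print(plan, visible_now)
```

Here the reading from step 8 is sampled during this control step but only becomes visible during the next one, which is how a delay carries across control-step boundaries.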
The update schedule is optimized: the updater pre-computes which updates will actually be visible at the next observation read and drops buffer entries that would be overwritten before being observed.
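The idea behind this optimization can be sketched as follows. This is a simplified, conservative model and not the updater's actual algorithm: needed_updates is a hypothetical helper that keeps every reading that is still pending at read time, even though some of those could also be overwritten later.

```python
def needed_updates(sample_steps, delay, buffer_size, read_step):
    """Returns the sample steps whose readings can still matter.

    Of the readings visible at read_step, only the last buffer_size survive
    in the buffer. Readings not yet visible are kept because a later read
    may observe them.
    """
    if buffer_size < 1:
        raise ValueError("buffer_size must be at least 1")
    visible = [s for s in sample_steps if s + delay <= read_step]
    pending = [s for s in sample_steps if s + delay > read_step]
    return visible[-buffer_size:] + pending


# Sampling every step for 10 substeps, delay of 2, buffer of size 1.
needed = needed_updates(range(1, 11), delay=2, buffer_size=1, read_step=10)
skipped = [s for s in range(1, 11) if s not in needed]
print(needed, skipped)
```

In this example only three of the ten scheduled calls are needed, which matters most for expensive observables such as camera renders.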