Principle: Google DeepMind dm_control Observable Configuration

Attribute	Value
Principle	Observable Configuration
Workflow	Composer_Environment_Building
Domain	Reinforcement_Learning, Observation
Source	dm_control
Last Updated	2026-02-15 00:00 GMT

Overview

Observable configuration is a declarative system for specifying what an agent can perceive, at what rate observations are sampled, how they are buffered, delayed, aggregated, and optionally corrupted.

Description

In real-world robotics, sensors do not all operate at the same frequency, observations arrive with latency, and sensor readings are noisy. A faithful simulation of these phenomena requires more than simply reading physics state at each control step. The Observable Configuration principle provides a multi-rate observation pipeline with the following capabilities:

Update interval: Each observable can specify how many simulation steps pass between successive readings. A proprioceptive sensor might update every step, while a camera might update every 5 steps.
Buffer size: Observations are stored in a fixed-size ring buffer. With a buffer of size k, the agent receives the most recent k readings (or an aggregation of them).
Delay: An observable can be configured with a non-negative integer delay, measured in simulation steps. The observation returned to the agent reflects the state from delay steps ago, modeling real sensor latency.
Aggregation: When the buffer contains multiple readings, an aggregator function (min, max, mean, median, sum, or a custom callable) can reduce the buffer to a single value before it is returned.
Corruption: An optional corruptor callable can modify each raw observation before it enters the buffer, modeling sensor noise or bias.
Enable/disable: Each observable has an enabled flag. Only enabled observables are computed and included in the agent's observation dict.
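The six settings above can be captured as a small declarative record. The sketch below is a stand-in written for illustration, not dm_control's API. `ObservableConfig`, `AGGREGATORS` and `enabled_names` are hypothetical names, and the defaults (including `enabled=False`) are choices of the sketch. It validates the integer settings, resolves an aggregator name to a reduction function, and filters out disabled observables.

```python
import statistics
from dataclasses import dataclass
from typing import Any, Callable, Optional, Union

# Hypothetical name -> reduction table mirroring the aggregator names
# listed above (min, max, mean, median, sum).
AGGREGATORS = {
    "min": min,
    "max": max,
    "mean": statistics.mean,
    "median": statistics.median,
    "sum": sum,
}


@dataclass
class ObservableConfig:
    """Hypothetical declarative settings for one observable (illustrative only)."""
    update_interval: int = 1   # simulation steps between successive readings
    buffer_size: int = 1       # number of most recent readings kept
    delay: int = 0             # steps before a reading becomes visible
    aggregator: Optional[Union[str, Callable]] = None
    corruptor: Optional[Callable[..., Any]] = None
    enabled: bool = False      # default chosen for this sketch

    def __post_init__(self):
        if self.update_interval < 1:
            raise ValueError("update_interval must be a positive integer")
        if self.buffer_size < 1:
            raise ValueError("buffer_size must be a positive integer")
        if self.delay < 0:
            raise ValueError("delay must be non-negative")
        if isinstance(self.aggregator, str):
            # Resolve a name such as "mean" to its reduction function.
            self.aggregator = AGGREGATORS[self.aggregator]


def enabled_names(configs):
    """Only enabled observables appear in the agent's observation dict."""
    return sorted(name for name, cfg in configs.items() if cfg.enabled)


configs = {
    "joint_pos": ObservableConfig(enabled=True),
    "camera": ObservableConfig(update_interval=5, enabled=True),
    "joint_vel_mean": ObservableConfig(buffer_size=5, aggregator="mean", enabled=True),
    "debug_signal": ObservableConfig(),  # left disabled: never computed or returned
}
print(enabled_names(configs))  # ['camera', 'joint_pos', 'joint_vel_mean']
```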

The observable objects themselves are abstract: they declare what to observe (a named MuJoCo feature, an MJCF element binding, a camera render, or an arbitrary callable) and how to configure it. A separate Updater object manages the actual buffering, scheduling, and retrieval at runtime.

Observable types span two families:

Name-based (MujocoFeature, MujocoCamera): reference physics data by string name. These are convenient but fragile under model changes.
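MJCF-based (MJCFFeature, MJCFCamera): reference physics data by mjcf.Element object. The element's full name is resolved when it is bound to the physics, so these observables remain valid when a model is attached into a larger one and its elements acquire namespaced names. They are the preferred approach in Composer environments.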

A Generic observable wraps any callable that takes a Physics and returns a value.
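To make the name-based versus element-based distinction concrete, here is a toy model of what happens when a model is attached into a larger scene and its element names gain a namespace prefix (the `"arm/elbow"` form is an assumption of the sketch). `Element`, `ToyPhysics`, `name_based`, `element_based` and `generic` are hypothetical stand-ins, not dm_control classes. The name-based reader holds a fixed string, while the element-based reader asks the element for its current full name at read time. `generic` shows that a Generic-style observable is simply any callable of the physics.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class Element:
    """Hypothetical stand-in for an MJCF element."""
    name: str
    namespace: str = ""

    @property
    def full_identifier(self):
        # Assumed convention: attached models prefix names, e.g. "arm/elbow".
        return f"{self.namespace}/{self.name}" if self.namespace else self.name


@dataclass
class ToyPhysics:
    """Hypothetical physics state: joint positions keyed by full name."""
    qpos: dict = field(default_factory=dict)


def name_based(name):
    # Like a name-based observable: looks up a fixed string.
    return lambda physics: physics.qpos[name]


def element_based(element):
    # Like an element-based observable: resolves the name at read time.
    return lambda physics: physics.qpos[element.full_identifier]


def generic(fn):
    # Like a Generic observable: any callable of the physics.
    return fn


elbow = Element("elbow")
by_name = name_based("elbow")
by_element = element_based(elbow)
elbow_doubled = generic(lambda physics: 2 * physics.qpos[elbow.full_identifier])

# Attaching the model into a larger scene namespaces its elements.
elbow.namespace = "arm"
physics = ToyPhysics(qpos={"arm/elbow": 0.5})

print(by_element(physics))     # 0.5
print(elbow_doubled(physics))  # 1.0
try:
    by_name(physics)
except KeyError as err:
    print("name-based lookup failed:", err)
```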

Usage

Use Observable Configuration when you need to:

Expose sensor readings: Create MJCFFeature observables for joint positions, velocities, or contact forces bound to specific MJCF elements.
Add camera observations: Create MJCFCamera observables for RGB, depth, or segmentation images from cameras defined in the MJCF model.
Model realistic sensing: Configure update_interval, delay, and corruptor to simulate multi-rate, delayed, and noisy sensors.
Aggregate temporal data: Set buffer_size > 1 with an aggregator to provide the agent with temporal summaries (e.g., mean velocity over the last 5 readings).
Compute derived quantities: Use Generic observables for task-specific computations such as relative distances or goal indicators.
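The "mean velocity over the last 5 readings" case, combined with a noisy sensor, can be sketched with the standard library. This assumes one reading per step (update interval 1, no delay) and a corruptor called as `corruptor(value, random_state)`, matching the signature shown in the pipeline diagram below. `gaussian_noise` and `run_sensor` are hypothetical helpers, and a bounded `collections.deque` stands in for the fixed-size buffer.

```python
import random
import statistics
from collections import deque


def gaussian_noise(sigma):
    """Hypothetical corruptor factory: additive Gaussian noise."""
    def corrupt(value, random_state):
        return value + random_state.gauss(0.0, sigma)
    return corrupt


def run_sensor(signal, buffer_size=5, aggregator=statistics.mean,
               corruptor=None, seed=0):
    """Feed readings through corruption -> fixed-size buffer -> aggregation.

    Returns the aggregated value the agent would see after each reading.
    """
    rng = random.Random(seed)
    buffer = deque(maxlen=buffer_size)  # the oldest reading drops out when full
    outputs = []
    for value in signal:
        if corruptor is not None:
            value = corruptor(value, rng)  # noise is added before buffering
        buffer.append(value)
        outputs.append(aggregator(buffer))
    return outputs


velocity = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
clean = run_sensor(velocity)
noisy = run_sensor(velocity, corruptor=gaussian_noise(0.1), seed=42)
print(clean)  # [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0]
```

Once five readings have accumulated, each output averages only the latest five, which is why the last two values (3.0 and 4.0) trail the raw signal (5.0 and 6.0).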

Theoretical Basis

The observation pipeline can be described as a discrete-time signal processing chain:

Raw physics state
    |
    v
[Observable._callable(physics)]   -- sample the raw observation
    |
    v
[corruptor(raw_obs, random_state)] -- optional noise injection
    |
    v
[Ring Buffer of size buffer_size]  -- store with timestamp
    |
    v
[delay filter]                     -- shift read pointer back by delay steps
    |
    v
[aggregator(buffer_contents)]      -- reduce buffer (optional)
    |
    v
Agent observation
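The chain above can be traced end to end with a short stand-in. In this sketch, which is not dm_control's implementation, delay is applied by tagging each reading with the step at which it becomes visible. A bounded deque plays the role of the ring buffer, and one reading is sampled and one read is made per step. `DelayedRingBuffer` and `run_pipeline` are hypothetical names. Returning an empty buffer (or `None` after aggregation) until the first delayed reading arrives is an assumption of the sketch, not documented behavior.

```python
import random
from collections import deque


class DelayedRingBuffer:
    """Toy ring buffer: a reading sampled at step t becomes readable at t + delay."""

    def __init__(self, buffer_size, delay):
        self.delay = delay
        self._pending = deque()                    # (visible_at, value) pairs
        self._visible = deque(maxlen=buffer_size)  # newest readable readings

    def insert(self, timestamp, value):
        # Tag the reading with the step at which it becomes visible.
        self._pending.append((timestamp + self.delay, value))

    def read(self, current_time):
        # Release every reading whose delay has elapsed. With a constant delay,
        # pending readings stay in sample order; the bounded deque evicts the
        # oldest ones once more than buffer_size are visible.
        while self._pending and self._pending[0][0] <= current_time:
            self._visible.append(self._pending.popleft()[1])
        return list(self._visible)


def run_pipeline(states, buffer_size, delay, corruptor=None, aggregator=None, seed=0):
    """Sample -> corrupt -> buffer -> delay -> aggregate, once per step."""
    rng = random.Random(seed)
    buf = DelayedRingBuffer(buffer_size, delay)
    seen = []
    for t, state in enumerate(states):
        raw = state                       # stand-in for the observable's callable
        if corruptor is not None:
            raw = corruptor(raw, rng)     # noise is injected before buffering
        buf.insert(t, raw)
        contents = buf.read(t)
        if aggregator is None:
            seen.append(contents)         # the agent receives the whole buffer
        else:
            seen.append(aggregator(contents) if contents else None)
    return seen


print(run_pipeline([10, 11, 12, 13, 14], buffer_size=2, delay=2))
# [[], [], [10], [10, 11], [11, 12]]
```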

The Updater maintains a schedule for each enabled observable. At each physics substep, it checks whether an observable is due for an update (based on its update_interval) and, if so, calls the observation callable and inserts the result into the buffer with the appropriate delay tag. At the end of a control step, the updater reads from each buffer (respecting delays) and optionally applies the aggregator.
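The scheduling loop described above can be sketched as follows. This simplified version assumes buffer size 1, no delay and no corruption. It counts physics steps from 1 and treats an observable as due when the step count is a multiple of its `update_interval`. `simulate` is a hypothetical helper. Each sampled value is the step index at which it was taken, which makes the staleness of the slower observable visible.

```python
def simulate(observables, n_substeps, n_control_steps):
    """Hypothetical multi-rate updater loop.

    observables: name -> (update_interval, fn(step) -> value).
    Returns one observation dict per control step.
    """
    latest = {name: None for name in observables}
    step = 0
    observations = []
    for _ in range(n_control_steps):
        for _ in range(n_substeps):
            step += 1                               # one physics substep
            for name, (interval, fn) in observables.items():
                if step % interval == 0:            # this observable is due
                    latest[name] = fn(step)
        observations.append(dict(latest))           # read at the control boundary
    return observations


obs = simulate(
    {"proprio": (1, lambda s: s), "camera": (3, lambda s: s)},
    n_substeps=4,
    n_control_steps=2,
)
print(obs)  # [{'proprio': 4, 'camera': 3}, {'proprio': 8, 'camera': 6}]
```

With four substeps per control step and a camera interval of 3, the camera value read at the end of each control step comes from an earlier substep (3, then 6) than the proprioceptive value (4, then 8).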

The update schedule is optimized: the updater pre-computes which scheduled updates will actually be visible at an upcoming observation read, and skips those whose readings would be evicted from the buffer before they are ever observed, so those observations are never computed.
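The pruning idea can be illustrated with a single read at the end of the control step. This is a sketch under stated assumptions, not dm_control's algorithm. All readings share one constant delay, so they become visible in the order they were sampled. Of those visible by `read_time`, only the newest `buffer_size` survive in the buffer. Readings that become visible only after `read_time` are kept, because a later read may still need them. `prune_schedule` is a hypothetical name.

```python
def prune_schedule(update_times, delay, buffer_size, read_time):
    """Return the scheduled sample times whose readings can still reach the agent."""
    # Readings visible by the read, in sample order (constant delay assumed).
    due = [t for t in update_times if t + delay <= read_time]
    # Readings that only become visible after this read; keep them for later reads.
    later = [t for t in update_times if t + delay > read_time]
    # Only the newest `buffer_size` visible readings survive in the ring buffer;
    # older ones would be evicted unobserved, so they need not be computed.
    return due[-buffer_size:] + later


schedule = list(range(1, 11))  # one update per substep, steps 1..10
print(prune_schedule(schedule, delay=0, buffer_size=2, read_time=10))  # [9, 10]
print(prune_schedule(schedule, delay=3, buffer_size=2, read_time=10))  # [6, 7, 8, 9, 10]
```

With no delay, only steps 9 and 10 need computing. With `delay=3`, steps 6 and 7 are the last readings visible at the read and steps 8 to 10 are kept for later reads, so 5 of the 10 scheduled updates are skipped.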
