Principle:Google deepmind Dm control Per Player Observation Design

Metadata
Knowledge Sources	dm_control
Domains	Multi-Agent Reinforcement Learning, Observation Design
Last Updated	2026-02-15 00:00 GMT

Overview

Per-player observation design is the principle of constructing an independent, egocentric observation vector for each agent in a multi-agent environment so that every agent perceives the world from its own reference frame.

Description

In a multi-agent setting, each agent must receive observations that are both informative and appropriately scoped. Per-player observation design specifies:

Proprioception -- Each agent observes its own joint angles, velocities, and kinematic sensor readings (accelerometer, gyroscope, velocimeter). The previous action is also included.
Egocentric ball state -- The ball's position, linear velocity, and angular velocity are expressed in the observing agent's body frame. This removes the need for the policy to learn coordinate transforms.
Egocentric teammate and opponent state -- The position, linear velocity, orientation, and end-effector positions of every other player are transformed into the observing player's reference frame. Teammates and opponents are given distinct prefixed names (e.g. teammate_0, opponent_1).
Arena landmarks -- Goal posts and field corners are provided as egocentric vectors so the agent can orient itself on the pitch.
Game statistics -- Derived quantities such as velocity toward ball, closest-teammate velocity toward ball, forward velocity, ball velocity toward the opponent's goal, average teammate distance, and scoring indicators are included for potential reward shaping.
Interception events -- Optional binary indicators for ball reception and opponent interception at multiple distance thresholds (5 m, 10 m, 15 m).

The observation system is designed to be modular: different adders can be composed to include or exclude categories of observations.
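
The composition idea can be illustrated with a minimal sketch. Every name here (`add_proprioception`, `add_interception_flags`, `compose`) is hypothetical and chosen for illustration; this is not the dm_control API, and the placeholder values stand in for real sensor readings.

```python
from typing import Callable, Dict, List

# Hypothetical types and names throughout; this is not the dm_control API.
Observation = Dict[str, List[float]]
Adder = Callable[[int, Observation], None]


def add_proprioception(player_id: int, obs: Observation) -> None:
    # Placeholder zeros stand in for real joint readings and the previous action.
    obs["joints_pos"] = [0.0, 0.0, 0.0]
    obs["prev_action"] = [0.0, 0.0, 0.0]


def add_interception_flags(player_id: int, obs: Observation) -> None:
    # One binary flag per distance threshold (5 m, 10 m, 15 m).
    for threshold in (5, 10, 15):
        obs[f"received_ball_{threshold}m"] = [0.0]


def compose(adders: List[Adder]) -> Adder:
    """Returns an adder that applies each given adder in order."""
    def combined(player_id: int, obs: Observation) -> None:
        for adder in adders:
            adder(player_id, obs)
    return combined


# Two observation layouts built from the same parts.
core_only = compose([add_proprioception])
with_interception = compose([add_proprioception, add_interception_flags])

obs_core: Observation = {}
core_only(0, obs_core)

obs_full: Observation = {}
with_interception(0, obs_full)
```

Including or excluding a category then amounts to changing the list passed to `compose`, without touching the individual adders.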

Usage

Per-player observation design is relevant when:

Training decentralised policies where each agent has a private observation.
Designing reward shaping signals from observable statistics.
Extending the observation space with custom features (e.g. communication channels).

Theoretical Basis

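Egocentric observations can be formalised through MuJoCo frame sensors. For a position q_world in the world frame and a player whose root body has world position p and rotation matrix R (the body axes expressed in world coordinates), the egocentric observation is:

q_ego = R^T * (q_world - p)

Direction-like quantities such as orientation axes are only rotated (R^T * q_world); the translation by p does not apply to them.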
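
As a worked example of the position transform, the following NumPy sketch applies it to a single ball position. It assumes R is a body-to-world rotation matrix whose columns are the player's body axes in world coordinates; the function name `to_egocentric` and the numbers are illustrative only.

```python
import numpy as np


def to_egocentric(q_world, p, R):
    """Maps a world-frame position into the body frame: R^T (q_world - p)."""
    return R.T @ (np.asarray(q_world) - np.asarray(p))


# Player at (1, 2, 0), rotated 90 degrees about z,
# so its body x-axis points along world +y.
theta = np.pi / 2
R = np.array([
    [np.cos(theta), -np.sin(theta), 0.0],
    [np.sin(theta),  np.cos(theta), 0.0],
    [0.0,            0.0,           1.0],
])
p = np.array([1.0, 2.0, 0.0])
ball_world = np.array([1.0, 5.0, 0.0])  # 3 m along world +y from the player

ball_ego = to_egocentric(ball_world, p, R)
print(np.round(ball_ego, 6))
```

The ball lies 3 m along world +y, which is the player's forward (body x) direction, so the egocentric position comes out as approximately (3, 0, 0).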

MuJoCo implements this via framepos, framelinvel, frameangvel, and framexaxis/frameyaxis/framezaxis sensors with reftype="body" and refname set to the observing player's root body. Because these sensors are evaluated natively inside the physics engine at each timestep, the observation code needs no manual matrix multiplication.
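
A sensor declaration of this kind might look like the fragment generated below. The sketch only builds the XML with Python's standard library; the body names `ball` and `player_0_root` and the sensor names are hypothetical, and the fragment would need to sit inside a complete MuJoCo model that defines those bodies.

```python
import xml.etree.ElementTree as ET

# Hypothetical body names; a real model defines its own.
BALL_BODY = "ball"
PLAYER_ROOT = "player_0_root"

sensor = ET.Element("sensor")
for tag, name in [
    ("framepos", "ball_ego_position"),
    ("framelinvel", "ball_ego_linear_velocity"),
    ("frameangvel", "ball_ego_angular_velocity"),
]:
    ET.SubElement(sensor, tag, {
        "name": name,
        "objtype": "body", "objname": BALL_BODY,    # object being measured
        "reftype": "body", "refname": PLAYER_ROOT,  # frame the result is expressed in
    })

mjcf_fragment = ET.tostring(sensor, encoding="unicode")
print(mjcf_fragment)
```

Generating one such group per observing player, each with its own refname, yields the same physical quantity expressed in every player's frame.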

The observation for player i in a game with N total players is:

o_i = concat(
    proprio_i,                        # joint angles, velocities, prev action
    ball_ego_pos_i, ball_ego_vel_i,   # ball in player i's frame
    [teammate_j_ego for j in teammates(i)],
    [opponent_k_ego for k in opponents(i)],
    arena_landmarks_ego_i,            # goals and field corners
    stats_i                           # derived game statistics
)
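
The concatenation above can be sketched in NumPy as follows. The feature names, shapes, and zero placeholders are assumptions made for illustration (here one teammate and two opponents); an environment may equally expose the features as a named mapping and leave flattening to the agent.

```python
import numpy as np


def build_observation(features: dict, order: list) -> np.ndarray:
    """Flattens and concatenates named features in a fixed order."""
    return np.concatenate([np.ravel(features[name]) for name in order])


# Placeholder values; shapes are illustrative only.
features = {
    "proprio": np.zeros(6),
    "ball_ego_position": np.array([3.0, 0.0, 0.0]),
    "ball_ego_linear_velocity": np.zeros(3),
    "teammate_0_ego_position": np.array([1.0, -2.0, 0.0]),
    "opponent_0_ego_position": np.array([4.0, 1.0, 0.0]),
    "opponent_1_ego_position": np.array([6.0, -1.0, 0.0]),
    "arena_landmarks_ego": np.zeros((4, 3)),
    "stats": np.zeros(2),
}
# A fixed key order keeps the vector layout identical across players and steps.
order = list(features)

o_i = build_observation(features, order)
print(o_i.shape)
```

Keeping the order explicit matters because a policy trained on a flat vector depends on each feature staying at the same index.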

Related Pages

Implementation:Google_deepmind_Dm_control_Soccer_Observables
