
Principle:Google deepmind Dm control Per Player Observation Design

From Leeroopedia
Metadata
Knowledge Sources dm_control
Domains Multi-Agent Reinforcement Learning, Observation Design
Last Updated 2026-02-15 00:00 GMT

Overview

Per-player observation design is the principle of constructing an independent, egocentric observation vector for each agent in a multi-agent environment so that every agent perceives the world from its own reference frame.

Description

In a multi-agent setting, each agent must receive observations that are both informative and appropriately scoped. Per-player observation design specifies:

  • Proprioception -- Each agent observes its own joint angles, velocities, and kinematic sensor readings (accelerometer, gyroscope, velocimeter). The previous action is also included.
  • Egocentric ball state -- The ball's position, linear velocity, and angular velocity are expressed in the observing agent's body frame. This removes the need for the policy to learn coordinate transforms.
  • Egocentric teammate and opponent state -- The position, linear velocity, orientation, and end-effector positions of every other player are transformed into the observing player's reference frame. Teammates and opponents are given distinct prefixed names (e.g. teammate_0, opponent_1).
  • Arena landmarks -- Goal posts and field corners are provided as egocentric vectors so the agent can orient itself on the pitch.
  • Game statistics -- Derived quantities such as velocity toward ball, closest-teammate velocity toward ball, forward velocity, ball velocity toward the opponent's goal, average teammate distance, and scoring indicators are included for potential reward shaping.
  • Interception events -- Optional binary indicators for ball reception and opponent interception events at multiple distance thresholds (5 m, 10 m, 15 m).

The observation system is designed to be modular: different adders can be composed to include or exclude categories of observations.
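The modular composition described above can be sketched as a list of small "adder" functions that each write one category of observation into a shared dictionary. This is a minimal illustration only: the function names, the `player_state` dictionary, and its keys are hypothetical and are not taken from the dm_control API.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical types for illustration; not dm_control classes.
Observation = Dict[str, Sequence[float]]
Adder = Callable[[dict, Observation], None]


def add_proprioception(player_state: dict, obs: Observation) -> None:
    """Add the observing player's own joint state and previous action."""
    obs["joints_pos"] = list(player_state["joints_pos"])
    obs["joints_vel"] = list(player_state["joints_vel"])
    obs["prev_action"] = list(player_state["prev_action"])


def add_ball_state(player_state: dict, obs: Observation) -> None:
    """Add the ball position, assumed to be already in the player's frame."""
    obs["ball_ego_position"] = list(player_state["ball_ego_position"])


def compose(adders: List[Adder]) -> Callable[[dict], Observation]:
    """Return a builder that applies each adder, in order, to a fresh dict."""
    def build(player_state: dict) -> Observation:
        obs: Observation = {}
        for adder in adders:
            adder(player_state, obs)
        return obs
    return build


# Including or excluding a category is a matter of editing the list.
build_core = compose([add_proprioception])
build_with_ball = compose([add_proprioception, add_ball_state])
```

Keeping each category in its own adder means an experiment can drop, say, the interception indicators without touching the code that produces the other observations.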

Usage

Per-player observation design is relevant when:

  • Training decentralised policies where each agent has a private observation.
  • Designing reward shaping signals from observable statistics.
  • Extending the observation space with custom features (e.g. communication channels).
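As an illustration of the reward-shaping use case, the sketch below derives a "velocity toward ball" statistic from two egocentric vectors and mixes it into a sparse scoring reward. The function names, the shaping weight, and the assumption that both vectors are expressed in the same player frame are illustrative choices, not values taken from dm_control.

```python
import numpy as np


def velocity_toward_ball(player_velocity, ball_position, eps=1e-8):
    """Project the player's velocity onto the unit vector pointing at the ball.

    Both inputs are assumed to be expressed in the same (egocentric) frame.
    """
    ball_position = np.asarray(ball_position, dtype=float)
    distance = np.linalg.norm(ball_position)
    if distance < eps:
        return 0.0  # ball is at the player's origin; direction is undefined
    direction = ball_position / distance
    return float(np.dot(np.asarray(player_velocity, dtype=float), direction))


def shaped_reward(scored, vel_to_ball, weight=0.05):
    """Sparse scoring reward plus a small dense shaping term (hypothetical weight)."""
    return float(scored) + weight * vel_to_ball
```

A positive statistic means the player is closing in on the ball; a small weight keeps the shaping term from dominating the scoring signal.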

Theoretical Basis

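Egocentric observations can be formalised through MuJoCo frame sensors. For a position q_world expressed in the world frame and a player whose root body has world position p and orientation R (a rotation matrix whose columns are the body axes in world coordinates), the egocentric observation is:

q_ego = R^T * (q_world - p)

Direction-like quantities such as body axes are rotated by R^T without subtracting p.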
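A small numerical check of the formula, written with NumPy. The helper names and the example poses are made up for illustration; R is taken to map body-frame coordinates to world-frame coordinates, which is the convention under which the formula above yields body-frame coordinates.

```python
import numpy as np


def to_egocentric(q_world, p, R):
    """Return R^T (q_world - p): the world point q_world in the body frame."""
    return R.T @ (np.asarray(q_world, dtype=float) - np.asarray(p, dtype=float))


def yaw_rotation(theta):
    """Rotation about the world z-axis by theta radians (body -> world)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0],
                     [s, c, 0.0],
                     [0.0, 0.0, 1.0]])


# Player at (1, 2, 0) whose body x-axis points along world +y (yaw of 90 degrees).
R = yaw_rotation(np.pi / 2)
p = np.array([1.0, 2.0, 0.0])

# Ball 3 m further along world +y, i.e. directly in front of the player.
ball_world = np.array([1.0, 5.0, 0.0])
ball_ego = to_egocentric(ball_world, p, R)
print(np.round(ball_ego, 6))  # approximately [3, 0, 0]: 3 m ahead on body x
```

The same ball appears at a different egocentric position for every player, which is why each agent needs its own observation rather than a shared world-frame one.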

MuJoCo implements this via framepos, framelinvel, frameangvel, and framexaxis/frameyaxis/framezaxis sensors with reftype="body" and refname set to the observing player's root body. Because the transform is evaluated by the physics engine itself at each timestep, no manual matrix multiplication is needed in the observation code.
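The sensor setup described above could be generated along the following lines. This sketch only builds the MJCF `<sensor>` fragment with Python's standard XML library; the body names (`player_0_root`, `ball`) and the sensor naming scheme are hypothetical, and the way dm_control actually attaches these sensors may differ.

```python
import xml.etree.ElementTree as ET


def egocentric_ball_sensors(observer_body, ball_body="ball"):
    """Build a <sensor> element measuring the ball relative to observer_body."""
    sensor = ET.Element("sensor")
    for kind in ("framepos", "framelinvel", "frameangvel"):
        ET.SubElement(sensor, kind, {
            "name": f"{observer_body}_{ball_body}_{kind}",  # hypothetical naming
            "objtype": "body",
            "objname": ball_body,       # the object being measured
            "reftype": "body",
            "refname": observer_body,   # the frame the reading is expressed in
        })
    return sensor


fragment = egocentric_ball_sensors("player_0_root")
print(ET.tostring(fragment, encoding="unicode"))
```

One such group of sensors per observing player gives every agent its own egocentric reading of the same ball.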

The observation for player i in a game with N total players is:

o_i = concat(
    proprio_i,                        # joint angles, velocities, prev action
    ball_ego_pos_i, ball_ego_vel_i,   # ball in player i's frame
    [teammate_j_ego for j in teammates(i)],
    [opponent_k_ego for k in opponents(i)],
    arena_landmarks_ego_i,            # goals and field corners
    stats_i                           # derived game statistics
)
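The concatenation above can be sketched in runnable form as follows. Every name here (`relative_names`, `build_observation`, the `team_of` list, the array shapes) is a hypothetical stand-in; the inputs are assumed to be already expressed in player i's frame.

```python
import numpy as np


def relative_names(i, team_of):
    """Name every other player as seen by player i: teammates first, then opponents."""
    others = [j for j in range(len(team_of)) if j != i]
    teammates = [j for j in others if team_of[j] == team_of[i]]
    opponents = [j for j in others if team_of[j] != team_of[i]]
    names = {j: f"teammate_{k}" for k, j in enumerate(teammates)}
    names.update({j: f"opponent_{k}" for k, j in enumerate(opponents)})
    return names


def build_observation(i, team_of, proprio, ball_ego, player_ego, landmarks_ego, stats):
    """Assemble player i's named observation and its flat concatenation."""
    obs = {"proprio": proprio, "ball_ego": ball_ego}
    for j, name in relative_names(i, team_of).items():
        obs[name] = player_ego[j]  # state of player j in player i's frame
    obs["arena_landmarks_ego"] = landmarks_ego
    obs["stats"] = stats
    flat = np.concatenate([np.ravel(value) for value in obs.values()])
    return obs, flat


# 2-vs-2 example with placeholder arrays.
team_of = ["home", "home", "away", "away"]
player_ego = {j: np.full(3, float(j)) for j in range(4)}
obs, flat = build_observation(
    0, team_of,
    proprio=np.zeros(4), ball_ego=np.zeros(6), player_ego=player_ego,
    landmarks_ego=np.zeros(2), stats=np.zeros(1),
)
print(list(obs))
```

Because the teammate and opponent names are relative to the observer, the same policy network can be reused for every player regardless of which team it is on.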
