Principle:Google deepmind Dm control Game Rules Configuration

Metadata
Knowledge Sources	dm_control
Domains	Multi-Agent Reinforcement Learning, Game Design
Last Updated	2026-02-15 00:00 GMT

Overview

Game rules configuration is the principle of encoding scoring logic, reward distribution, episode termination conditions, and ball-reset mechanics into a task object so that multi-agent competition follows well-defined rules.

Description

A multi-agent competitive environment needs a formal specification of what counts as winning, how progress is rewarded, and when an episode ends. Game rules configuration addresses:

Scoring detection -- The task monitors arena sensors to detect when a goal has been scored and by which team.
Per-agent reward assignment -- On a scoring event, every player receives a signed scalar reward: +1 for players on the scoring team and -1 for players on the conceding team. On timesteps without a goal, every reward is 0.
Episode termination -- The task decides whether to end the episode on the first goal (single-turn) or to reinitialise positions and continue play until a time limit (multi-turn).
Out-of-bounds handling -- When the ball leaves the pitch, a throw-in mechanic repositions it slightly inward and resets its velocity.
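
The out-of-bounds rule can be illustrated with a minimal sketch. The `Ball` class, the `throw_in` function, the pitch half-dimensions, and the `margin` value are all hypothetical names and numbers chosen for illustration; they are not taken from dm_control, and the sketch ignores the case where the ball crosses the line inside a goal.

```python
from dataclasses import dataclass


@dataclass
class Ball:
    """Hypothetical 2D ball state: position and velocity on the pitch plane."""
    x: float
    y: float
    vx: float = 0.0
    vy: float = 0.0


def throw_in(ball: Ball, half_length: float, half_width: float,
             margin: float = 0.5) -> bool:
    """Apply a throw-in if the ball is outside the pitch.

    The pitch is assumed to be centred on the origin. Returns True when the
    ball was repositioned, False when it was already in bounds.
    """
    in_bounds = abs(ball.x) <= half_length and abs(ball.y) <= half_width
    if in_bounds:
        return False
    # Clamp both coordinates to the pitch shrunk by `margin`, so the ball
    # ends up slightly inside the boundary rather than on the line.
    x_limit = half_length - margin
    y_limit = half_width - margin
    ball.x = max(-x_limit, min(x_limit, ball.x))
    ball.y = max(-y_limit, min(y_limit, ball.y))
    # Reset the velocity so the ball does not immediately leave again.
    ball.vx = 0.0
    ball.vy = 0.0
    return True
```

Clamping is only one possible choice of repositioning; an implementation could equally place the ball at a fixed offset from the point where it crossed the line.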

These rules are encoded declaratively in a task object that the environment loop queries at every timestep.

Usage

Game rules configuration is needed whenever:

A researcher wants to switch between episodic (terminate-on-goal) and continuing (multi-turn) training regimes.
The reward function needs to be inspected or replaced.
Custom termination criteria (e.g. maximum score difference) are desired.
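
As an illustration of the last case, a score-difference criterion might be sketched as follows. The class `ScoreDifferenceTermination`, its `record` method, and the `"home"`/`"away"` string labels are assumptions made for this example; how such a rule would be wired into an actual task is not shown here.

```python
from typing import Optional


class ScoreDifferenceTermination:
    """Hypothetical rule: end the episode once one team leads by `max_diff` goals."""

    def __init__(self, max_diff: int):
        if max_diff < 1:
            raise ValueError("max_diff must be at least 1")
        self.max_diff = max_diff
        self.score = {"home": 0, "away": 0}

    def record(self, goal: Optional[str]) -> None:
        """Update the running score; `goal` is 'home', 'away', or None."""
        if goal is not None:
            self.score[goal] += 1

    def should_terminate_episode(self) -> bool:
        """True when the absolute goal difference has reached the threshold."""
        lead = abs(self.score["home"] - self.score["away"])
        return lead >= self.max_diff
```

With `max_diff=1` this reduces to terminating on the first goal, so the single-turn regime can be seen as a special case of this criterion.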

Theoretical Basis

The reward and termination logic implement a team zero-sum structure. Let $G_{t} \in \{\text{HOME}, \text{AWAY}, \emptyset\}$ be the goal event at timestep $t$. The per-player reward for player $p$ on team $\tau_{p}$ is:

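$$
r_p(t) =
\begin{cases}
+1 & \text{if } G_t = \tau_p \quad \text{(player's team scored)} \\
-1 & \text{if } G_t \neq \emptyset \text{ and } G_t \neq \tau_p \quad \text{(opponent scored)} \\
0 & \text{if } G_t = \emptyset \quad \text{(no goal)}
\end{cases}
$$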
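
The piecewise reward can be written directly as a small function. This is a sketch under stated assumptions: the `Team` enum and `player_rewards` function below are stand-ins defined for the example, with `None` playing the role of the empty goal event.

```python
from enum import Enum
from typing import List, Optional


class Team(Enum):
    """Stand-in team labels for this example."""
    HOME = "home"
    AWAY = "away"


def player_rewards(goal: Optional[Team], player_teams: List[Team]) -> List[float]:
    """Return one reward per player for the goal event at the current timestep.

    `goal` is the scoring team, or None when no goal was scored.
    """
    if goal is None:
        # No goal: every player receives 0.
        return [0.0] * len(player_teams)
    # Goal: +1 for players on the scoring team, -1 for everyone else.
    return [1.0 if team is goal else -1.0 for team in player_teams]
```

For teams of equal size the rewards at every timestep sum to zero, which is the zero-sum property stated above.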

The discount factor follows the standard RL convention that a discount of 0 marks a terminal transition and a discount of 1 lets the episode continue:

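Single-turn (Task):

$$
\gamma(t) =
\begin{cases}
0 & \text{if } G_t \neq \emptyset \quad \text{(episode ends)} \\
1 & \text{otherwise}
\end{cases}
$$

Multi-turn (MultiturnTask):

$$
\gamma(t) = 1 \quad \text{for all } t \quad \text{(episode never terminates on a goal)}
$$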

In the multi-turn variant, positions are reinitialised after every goal and ball entity trackers are reset, but the episode clock continues.
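
The difference between the two regimes can be seen by replaying the same scripted sequence of goal events under each rule. The functions `discount` and `run_episode` below are hypothetical helpers written for this comparison, not part of dm_control; the position reinitialisation and tracker reset are only indicated by a comment.

```python
from typing import List, Optional, Tuple


def discount(goal: Optional[str], multi_turn: bool) -> float:
    """Discount for one timestep, given the goal event ('home', 'away', or None)."""
    if multi_turn:
        return 1.0
    return 0.0 if goal is not None else 1.0


def run_episode(goal_events: List[Optional[str]], multi_turn: bool,
                time_limit: int) -> Tuple[int, int]:
    """Replay scripted goal events and return (steps_played, goals_seen)."""
    steps = 0
    goals = 0
    for goal in goal_events[:time_limit]:
        steps += 1
        if goal is not None:
            goals += 1
            # In the multi-turn regime, positions would be reinitialised and
            # ball trackers reset here, while `steps` keeps counting.
        if discount(goal, multi_turn) == 0.0:
            # Single-turn regime: a zero discount marks the end of the episode.
            break
    return steps, goals


events = [None, "home", None, "away", None]
single = run_episode(events, multi_turn=False, time_limit=5)
multi = run_episode(events, multi_turn=True, time_limit=5)
```

On this scripted sequence the single-turn run stops at the first goal, while the multi-turn run plays all five steps and sees both goals.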

Related Pages

Implementation:Google_deepmind_Dm_control_Soccer_Task
