Principle:Google deepmind Dm control Game Rules Configuration
| Metadata | |
|---|---|
| Knowledge Sources | dm_control |
| Domains | Multi-Agent Reinforcement Learning, Game Design |
| Last Updated | 2026-02-15 00:00 GMT |
Overview
Game rules configuration is the principle of encoding scoring logic, reward distribution, episode termination conditions, and ball-reset mechanics into a task object so that multi-agent competition follows well-defined rules.
Description
A multi-agent competitive environment needs a formal specification of what counts as winning, how progress is rewarded, and when an episode ends. Game rules configuration addresses:
- Scoring detection -- The task monitors arena sensors to detect when a goal has been scored and by which team.
- Per-agent reward assignment -- Upon a scoring event, every agent receives a signed scalar reward: +1 for the scoring team and -1 for the conceding team. When no goal is scored, all rewards are 0.
- Episode termination -- The task decides whether to end the episode on the first goal (single-turn) or to reinitialise positions and continue play until a time limit (multi-turn).
- Out-of-bounds handling -- When the ball leaves the pitch, a throw-in mechanic repositions it slightly inward and resets its velocity.
These rules are encoded declaratively in a task object that the environment loop queries at every timestep.
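The out-of-bounds rule can be illustrated with a minimal sketch. It is not the dm_control implementation: the `Ball` class, the `throw_in` function, the rectangular pitch centred on the origin, and the `inset` margin are all hypothetical names and assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Ball:
    """Hypothetical ball state: planar position and velocity."""
    x: float
    y: float
    vx: float = 0.0
    vy: float = 0.0


def throw_in(ball: Ball, half_length: float, half_width: float,
             inset: float = 0.5) -> bool:
    """Move an out-of-bounds ball just inside the pitch and stop it.

    Assumes a rectangular pitch centred on the origin. Returns True if a
    throw-in was applied, False if the ball was still in play.
    """
    in_play = abs(ball.x) <= half_length and abs(ball.y) <= half_width
    if in_play:
        return False
    # Clamp each coordinate to the pitch boundary shrunk by `inset`.
    ball.x = max(-(half_length - inset), min(half_length - inset, ball.x))
    ball.y = max(-(half_width - inset), min(half_width - inset, ball.y))
    # Reset the velocity so play restarts with the ball at rest.
    ball.vx = 0.0
    ball.vy = 0.0
    return True
```

In this sketch the function would be called once per timestep after the physics update, so a ball that crosses a touchline is returned to play before the next observation is produced.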
Usage
Game rules configuration is needed whenever:
- A researcher wants to switch between episodic (terminate-on-goal) and continuing (multi-turn) training regimes.
- The reward function needs to be inspected or replaced.
- Custom termination criteria (e.g. maximum score difference) are desired.
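As a rough illustration of the use cases above, the termination rule can be factored into a small configuration object. The `RulesConfig` class, its fields, and `should_terminate` are hypothetical names introduced here; they are not part of dm_control, and the score-difference rule is only one possible custom criterion.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RulesConfig:
    """Hypothetical rule settings for a two-team game."""
    terminate_on_goal: bool = True          # single-turn when True
    max_score_diff: Optional[int] = None    # optional mercy rule


def should_terminate(config: RulesConfig, goal_scored: bool,
                     home_score: int, away_score: int) -> bool:
    """Decide whether the episode ends at the current timestep."""
    # Episodic regime: the first goal ends the episode.
    if config.terminate_on_goal and goal_scored:
        return True
    # Custom criterion: stop once one team leads by the configured margin.
    if config.max_score_diff is not None:
        return abs(home_score - away_score) >= config.max_score_diff
    return False
```

Switching between the episodic and continuing regimes then amounts to changing one field, for example `RulesConfig(terminate_on_goal=False, max_score_diff=3)`.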
Theoretical Basis
The reward and termination logic implement a team zero-sum structure. Let G_t be the goal event at timestep t, equal to the identity of the scoring team, or null when no goal is scored. The per-player reward for player p on team tau_p is:
r_p(t) =
+1 if G_t = tau_p (player's team scored)
-1 if G_t != null and G_t != tau_p (opponent scored)
0 if G_t = null (no goal)
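The piecewise definition above translates directly into code. The following is a minimal sketch, assuming two teams represented by a hypothetical `Team` enum and `None` standing in for the null goal event; `player_rewards` is an illustrative name, not a dm_control function.

```python
from enum import Enum
from typing import List, Optional


class Team(Enum):
    HOME = 0
    AWAY = 1


def player_rewards(goal: Optional[Team],
                   player_teams: List[Team]) -> List[float]:
    """Return r_p(t) for every player, given the goal event G_t.

    `goal` is the scoring team, or None when no goal was scored.
    `player_teams[p]` is tau_p, the team of player p.
    """
    if goal is None:
        # G_t = null: nobody is rewarded.
        return [0.0] * len(player_teams)
    # +1 for players on the scoring team, -1 for everyone else.
    return [1.0 if team is goal else -1.0 for team in player_teams]
```

With equal team sizes the rewards sum to zero at every timestep, which is the zero-sum property at the team level.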
The discount returned at each timestep follows the standard RL convention that a discount of 0 marks a terminal transition:
Single-turn (Task):
gamma(t) = 0 if G_t != null (episode ends)
gamma(t) = 1 otherwise
Multi-turn (MultiturnTask):
gamma(t) = 1 always (episode never terminates on a goal)
In the multi-turn variant, positions are reinitialised after every goal and ball entity trackers are reset, but the episode clock continues.
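The difference between the two variants can be seen in a toy rollout. This is a sketch under stated assumptions, not the library's environment loop: goal events are given as a pre-recorded list of team names (or `None`), the end of the list stands in for the time limit, and `discount` and `rollout` are hypothetical helper names.

```python
from typing import Iterable, Optional, Tuple


def discount(goal: Optional[str], multi_turn: bool) -> float:
    """gamma(t): 0 only when a goal ends a single-turn episode."""
    if multi_turn:
        return 1.0
    return 0.0 if goal is not None else 1.0


def rollout(goal_events: Iterable[Optional[str]], team: str,
            multi_turn: bool) -> Tuple[float, int]:
    """Return (undiscounted return for `team`, number of steps played)."""
    total, steps = 0.0, 0
    for goal in goal_events:
        steps += 1
        if goal is not None:
            total += 1.0 if goal == team else -1.0
        if discount(goal, multi_turn) == 0.0:
            break  # single-turn: the first goal ends the episode
    return total, steps


events = [None, "away", None, "home", "home"]
single = rollout(events, "home", multi_turn=False)  # stops at the first goal
multi = rollout(events, "home", multi_turn=True)    # plays every step
```

In the single-turn case the home team's return is decided by the first goal alone, whereas the multi-turn rollout accumulates every goal scored before the time limit.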