Workflow:Google deepmind Dm control Multi Agent Soccer Setup

From Leeroopedia
Knowledge Sources
Domains Multi_Agent_RL, Locomotion, Physics_Simulation
Last Updated 2026-02-15 12:00 GMT

Overview

End-to-end process for configuring and running the DeepMind multi-agent MuJoCo soccer environment with configurable team sizes, walker types, and game rules.

Description

This workflow covers the standard procedure for setting up the multi-agent soccer environment from dm_control. The environment simulates a physics-based soccer game in which teams of agents must coordinate to score goals. It supports three walker types (BoxHead for simplified dynamics, Ant for quadruped locomotion, and Humanoid for realistic bipedal play), configurable team sizes (1v1 through 11v11), and customizable game rules (terminate on goal vs. continuous play, field box, walker contacts). Each player receives egocentric observations of the ball, teammates, opponents, and arena landmarks. The environment produces per-player rewards (+1 for a team goal, -1 for a conceded goal, 0 otherwise). This built-in reward is sparse; the per-player statistics exposed in the observations can be used to derive shaped rewards.

Usage

Execute this workflow when you need a multi-agent competitive RL environment for studying emergent coordination, team play, or multi-agent locomotion. The soccer environment is suitable for research on multi-agent reinforcement learning, competitive self-play, and hierarchical control policies.

Execution Steps

Step 1: Choose Walker Type and Team Configuration

Select the walker morphology and team size for the soccer game. BoxHead walkers provide simplified ball-shaped agents that can roll and jump, suitable for fast prototyping. Ant walkers use quadruped locomotion. Humanoid walkers provide realistic bipedal agents with jersey customization and motion capture initialization. Team sizes range from 1v1 to 11v11; for Humanoid walkers, the pitch size scales automatically with the number of players.

Key considerations:

  • BoxHead (WalkerType.BOXHEAD) is the default and fastest to simulate
  • Humanoid (WalkerType.HUMANOID) provides the most realistic play with jersey visuals
  • Ant (WalkerType.ANT) offers intermediate complexity
  • For Humanoid walkers, pitch size scales with the total number of players (team_size * 2)
  • Each player is identified by team (HOME/AWAY) and a walker_id
  • Team colors are blue (HOME) and red (AWAY)
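
The choices in this step can be checked before any simulation code runs. The following is a minimal sketch, not part of dm_control: the helper names validate_choice and player_roster are hypothetical, and the roster ordering (HOME first, per-team walker_id starting at 0) is an assumption. Only the walker type names, the 1 to 11 team size range, and the HOME/AWAY labels come from this page.

```python
from typing import List, Tuple

# Walker type names as listed on this page (members of WalkerType).
WALKER_TYPES = ("BOXHEAD", "ANT", "HUMANOID")
MIN_TEAM_SIZE, MAX_TEAM_SIZE = 1, 11


def validate_choice(walker_type: str, team_size: int) -> Tuple[str, int]:
    """Checks a (walker type, team size) pair before building the environment."""
    name = walker_type.upper()
    if name not in WALKER_TYPES:
        raise ValueError(
            f"unknown walker type {walker_type!r}; expected one of {WALKER_TYPES}")
    if not MIN_TEAM_SIZE <= team_size <= MAX_TEAM_SIZE:
        raise ValueError(
            f"team_size must be in [{MIN_TEAM_SIZE}, {MAX_TEAM_SIZE}], got {team_size}")
    return name, team_size


def player_roster(team_size: int) -> List[Tuple[str, int]]:
    """Hypothetical roster of (team, walker_id) pairs.

    Assumption: HOME players come first and walker_id restarts at 0 per team.
    """
    home = [("HOME", i) for i in range(team_size)]
    away = [("AWAY", i) for i in range(team_size)]
    return home + away
```

For a 2v2 game, player_roster(2) yields four entries, two per team, which matches the team_size * 2 player count used for pitch scaling.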

Step 2: Configure Game Rules

Set the game parameters including time limit, goal termination behavior, walker contact physics, and field boundary handling. When terminate_on_goal is True, episodes end on scoring; when False, players reset positions and play continues (MultiturnTask). The field box option confines the ball within an invisible boundary. Walker contacts can be disabled for training stability.

Key considerations:

  • time_limit defaults to 45 seconds per episode
  • terminate_on_goal=True creates single-goal episodes (Task class)
  • terminate_on_goal=False creates continuous play with position resets (MultiturnTask class)
  • disable_walker_contacts=True prevents physical collisions between players
  • enable_field_box=True constrains the ball (but not the players) within the pitch boundaries
  • keep_aspect_ratio maintains constant pitch proportions when scaling
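
One way to keep these rules together is a small configuration object whose fields mirror the parameter names listed above. This is a sketch under stated assumptions: GameRules is a hypothetical container, not a dm_control class, and apart from time_limit the default values shown are assumptions rather than facts stated on this page.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class GameRules:
    """Hypothetical container; field names mirror the parameters listed above."""
    time_limit: float = 45.0               # seconds per episode (default per this page)
    terminate_on_goal: bool = True         # assumed default: episode ends on a goal
    disable_walker_contacts: bool = False  # assumed default: players can collide
    enable_field_box: bool = False         # assumed default: ball is not boxed in
    keep_aspect_ratio: bool = False        # assumed default

    def task_class_name(self) -> str:
        """Names the task class this page associates with the goal rule."""
        return "Task" if self.terminate_on_goal else "MultiturnTask"

    def as_kwargs(self) -> dict:
        """Returns the rules as keyword arguments for an environment factory."""
        return asdict(self)
```

For example, GameRules(terminate_on_goal=False, enable_field_box=True) describes continuous play with the ball confined to the field, and its as_kwargs() output can be forwarded to the loader in the next step.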

Step 3: Load the Soccer Environment

Use the soccer.load() factory function to construct the complete environment. The loader creates the player walkers, builds the RandomizedPitch arena (with goals, field markings, and optional field box), instantiates the SoccerBall, configures per-player observables, and assembles everything into a composer.Environment. The resulting environment provides per-player action specs and observation specs.

Key considerations:

  • soccer.load() returns a composer.Environment with multi-agent support
  • Action specs are returned as a list (one per player)
  • Observations are returned as a list of dicts (one per player)
  • Rewards are returned as a list of floats (one per player)
  • The RandomizedPitch varies field dimensions between min_size and max_size
  • Humanoid games use a regulation soccer ball and MINI_FOOTBALL_GOAL_SIZE
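
The loading step might look like the sketch below. The wrapper names load_soccer_env and describe_specs are hypothetical; soccer.load(), WalkerType, and the per-player list structure of the specs are taken from this page. The dm_control import is deferred into the function body so that the module can be imported on machines without dm_control or MuJoCo installed.

```python
def load_soccer_env(team_size=2, walker_type_name="BOXHEAD", **rules):
    """Builds the environment via soccer.load(); requires dm_control and MuJoCo."""
    # Imported lazily so this module stays importable without dm_control.
    from dm_control.locomotion import soccer

    walker_type = getattr(soccer.WalkerType, walker_type_name)
    return soccer.load(team_size=team_size, walker_type=walker_type, **rules)


def describe_specs(env):
    """Summarises the per-player specs of a loaded environment."""
    action_specs = env.action_spec()            # list, one spec per player
    observation_specs = env.observation_spec()  # list of dicts, one per player
    return {
        "num_players": len(action_specs),
        "action_shapes": [spec.shape for spec in action_specs],
        "observation_keys": [sorted(obs.keys()) for obs in observation_specs],
    }
```

Calling describe_specs(load_soccer_env(team_size=2)) on a machine with dm_control installed should report four players, since a 2v2 game has team_size * 2 walkers.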

Step 4: Understand Per-Player Observations

Each player receives egocentric observations relative to their own position and orientation. Core observations include the ball position and velocity in the player's reference frame, teammate and opponent positions and velocities, goal positions, and field boundary distances. Additional statistics are available for deriving custom shaping rewards. Observation modules (CoreObservablesAdder, InterceptionObservablesAdder) configure which observations are active.

Key considerations:

  • All spatial observations are in the player's egocentric reference frame
  • Ball observations include relative position, velocity, and distance
  • Teammate/opponent observations are provided for each other player
  • Arena landmark observations encode goal and boundary positions
  • The observation spec varies with team size (more players = more observations)
  • Custom ObservablesAdder subclasses can extend the observation set
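
To illustrate what "egocentric" means here, the sketch below converts a world-frame point into a player's own frame in 2D. It is a simplified illustration, not dm_control's implementation: the function name to_egocentric is hypothetical, the planar treatment ignores height and 3D orientation, and the convention that the egocentric x-axis points along the player's heading is an assumption.

```python
import math


def to_egocentric(point_xy, player_xy, player_yaw):
    """Expresses a world-frame 2D point in a player's egocentric frame.

    Assumed convention: player_yaw is the heading angle in radians measured
    from the world x-axis, and the egocentric x-axis points straight ahead.
    """
    # Offset from the player to the point, still in world axes.
    dx = point_xy[0] - player_xy[0]
    dy = point_xy[1] - player_xy[1]
    # Rotate the offset by -yaw so the player's heading becomes the x-axis.
    c, s = math.cos(player_yaw), math.sin(player_yaw)
    return (c * dx + s * dy, -s * dx + c * dy)
```

A player at the origin facing along the world y-axis (yaw of pi/2) sees a ball at world position (0, 1) as lying straight ahead: one unit along its own x-axis and zero along its y-axis. The distance to the point is unchanged by the transform.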

Step 5: Run the Multi-Agent Episode Loop

Execute the multi-agent interaction loop: reset the environment, then repeatedly collect actions from all players and step the environment. Each player's policy receives that player's observation and returns actions. The environment steps all players simultaneously and returns per-player rewards, a shared discount factor, and per-player observations.

Key considerations:

  • Actions must be provided as a list with one action array per player
  • Rewards are +1 (team scores), -1 (team concedes), or 0 (no goal)
  • A single discount is shared by all players, since the episode terminates for every player at once
  • In MultiturnTask mode, scoring triggers position reset but not episode end
  • The SoccerBall tracks which player last touched it for attribution
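
The interaction loop described in this step can be sketched as follows. The helper names run_episode and make_random_policy are hypothetical. The loop only assumes the interface described on this page: reset() and step() return a timestep with last(), a per-player reward list, and a per-player observation list. The random policy additionally assumes bounded action specs exposing minimum, maximum, and shape, and imports NumPy lazily.

```python
def run_episode(env, policies):
    """Runs one episode and returns the summed reward of each player.

    `policies` holds one callable per player, mapping that player's
    observation to an action.
    """
    timestep = env.reset()
    returns = [0.0] * len(policies)
    while not timestep.last():
        # One action per player, in the same order as the observations.
        actions = [policy(obs)
                   for policy, obs in zip(policies, timestep.observation)]
        timestep = env.step(actions)
        if timestep.reward is not None:
            for i, reward in enumerate(timestep.reward):
                returns[i] += float(reward)
    return returns


def make_random_policy(action_spec):
    """Returns a policy that ignores observations and samples uniform actions."""
    import numpy as np  # imported lazily; only needed for random actions

    def policy(observation):
        del observation  # unused by a random policy
        return np.random.uniform(
            action_spec.minimum, action_spec.maximum, size=action_spec.shape)

    return policy
```

With a loaded environment, a random baseline would be run_episode(env, [make_random_policy(spec) for spec in env.action_spec()]). In single-goal mode the returned values are expected to be +1 for the scoring team, -1 for the conceding team, or 0 if the time limit expires first.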

Step 6: Visualize the Soccer Environment

Use the locomotion/soccer/explore.py script to launch an interactive visualization of the soccer environment. The script creates a default 2v2 configuration and opens the dm_control viewer. The smooth tracking camera follows all players and the ball simultaneously for recording and debugging.

Key considerations:

  • explore.py provides a ready-to-run visualization with default 2v2 BoxHead setup
  • The tracking camera (SoccerBallCamera) smoothly follows game action
  • Viewer supports pause, speed control, and manual body perturbation
  • Video recording can capture the tracking camera perspective
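
A comparable visualization can also be launched from your own code. The sketch below is not the contents of explore.py: make_environment_loader and launch_viewer are hypothetical helper names, and the only library calls assumed are viewer.launch() with its environment_loader argument and soccer.load(). Opening the viewer requires dm_control, MuJoCo, and a display.

```python
import functools


def make_environment_loader(load_fn, team_size=2, **kwargs):
    """Returns a zero-argument callable that builds the environment on demand."""
    return functools.partial(load_fn, team_size=team_size, **kwargs)


def launch_viewer(**kwargs):
    """Opens the interactive dm_control viewer on a soccer environment."""
    # Imported lazily so this module stays importable without dm_control.
    from dm_control import viewer
    from dm_control.locomotion import soccer

    loader = make_environment_loader(soccer.load, **kwargs)
    viewer.launch(environment_loader=loader)
```

Passing a loader rather than a constructed environment lets the viewer rebuild the environment itself, for instance when the episode is restarted from the viewer.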
