Implementation:Google deepmind Dm control Suite Walker

From Leeroopedia
Knowledge Sources
Domains: Reinforcement Learning, Locomotion, Physics Simulation
Last Updated: 2026-02-15 04:00 GMT

Overview

The Planar Walker domain implements a planar bipedal walker with three benchmarking tasks of increasing difficulty: standing, walking, and running.

Description

The Walker module defines a planar bipedal locomotion environment in which a two-legged agent must balance and move horizontally. Three task variants are registered in the benchmarking suite: stand (balance upright, with no target velocity), walk (reach a horizontal velocity of at least 1 m/s), and run (reach a horizontal velocity of at least 8 m/s). All tasks use a control timestep of 0.025 seconds and a default time limit of 25 seconds.
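As a rough illustration of how these settings relate, the sketch below tabulates the three target speeds and derives the approximate number of control steps in one episode from the stated timestep and time limit. The dictionary, constant, and function names are illustrative and are not identifiers from the module; the exact step count inside the environment may differ slightly depending on how the limit is checked.

```python
# Illustrative names only; these are not identifiers from dm_control.
TARGET_SPEED_M_PER_S = {"stand": 0.0, "walk": 1.0, "run": 8.0}
CONTROL_TIMESTEP_S = 0.025
TIME_LIMIT_S = 25.0


def control_steps_per_episode(time_limit_s, control_timestep_s):
    """Approximate number of agent actions that fit in one episode."""
    # round() guards against floating-point error in the division.
    return round(time_limit_s / control_timestep_s)


steps = control_steps_per_episode(TIME_LIMIT_S, CONTROL_TIMESTEP_S)
print("Control steps per episode:", steps)
```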

The Physics class extends mujoco.Physics with methods that return torso uprightness (the projection of the torso's z-axis onto the world z-axis), torso height, horizontal velocity (read from the subtree linear velocity sensor), and the planar orientations of all bodies (the xx and xz rotation matrix components). The PlanarWalker task class randomizes limited and rotational joints at the start of each episode using the standard randomizer utility.
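To make the rotation-matrix quantities concrete, the following sketch builds the rotation matrix of a body pitched by an angle about the y-axis (assumed here to be the only rotation available to a planar model moving in the x-z plane) and reads off the components named above. It uses plain Python rather than MuJoCo, and all function names are hypothetical.

```python
import math


def pitch_rotation(theta):
    """3x3 rotation matrix for a rotation of `theta` radians about the y-axis."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, 0.0, s],
            [0.0, 1.0, 0.0],
            [-s, 0.0, c]]


def uprightness(rot):
    """'zz' component: projection of the body z-axis onto the world z-axis."""
    return rot[2][2]


def planar_orientation(rot):
    """('xx', 'xz') components; for a pure pitch these are (cos, sin) of theta."""
    return rot[0][0], rot[0][2]


upright_torso = pitch_rotation(0.0)
inverted_torso = pitch_rotation(math.pi)
print("Uprightness:", uprightness(upright_torso), uprightness(inverted_torso))
print("Orientation at 90 degrees:", planar_orientation(pitch_rotation(math.pi / 2)))
```

Uprightness computed this way ranges from 1 (torso vertical) to -1 (torso inverted), and the two planar components together encode the pitch angle without a discontinuity at the wrap-around.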

The reward function combines a standing reward and a movement reward. The standing reward is a weighted combination of a height tolerance term (full reward when the torso height is above 1.2 m, weighted 3/4) and an uprightness term ((1 + upright) / 2, weighted 1/4). The stand task returns this standing reward directly. For the locomotion tasks, the standing reward is multiplied by a velocity-dependent term built from a linear sigmoid tolerance on horizontal velocity, giving a composite reward of stand_reward * (5 * move_reward + 1) / 6.
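The composition described above can be sketched in plain Python as follows. The function and parameter names are hypothetical, and the height tolerance is taken as a precomputed value in [0, 1] because its margin is not stated here. The `linear_tolerance` helper only illustrates what a linear sigmoid tolerance might look like; its margin and value-at-margin arguments are assumptions, not values confirmed by this page.

```python
def linear_tolerance(value, lower_bound, margin, value_at_margin=0.5):
    """1.0 at or above `lower_bound`, decaying linearly to 0.0 below it.

    The margin and value_at_margin used here are assumptions for illustration.
    """
    if value >= lower_bound:
        return 1.0
    scaled = (lower_bound - value) / margin * (1.0 - value_at_margin)
    return 1.0 - scaled if scaled < 1.0 else 0.0


def stand_reward(height_term, upright):
    """Weighted mix: 3/4 height tolerance, 1/4 rescaled uprightness."""
    upright_term = (1.0 + upright) / 2.0  # maps [-1, 1] onto [0, 1]
    return (3.0 * height_term + upright_term) / 4.0


def walker_reward(height_term, upright, move_term, move_speed):
    """Composite reward; a target speed of 0 (stand) skips the velocity term."""
    standing = stand_reward(height_term, upright)
    if move_speed == 0:
        return standing
    return standing * (5.0 * move_term + 1.0) / 6.0


# Upright at full height, but moving at half the walking target speed.
move_term = linear_tolerance(0.5, lower_bound=1.0, margin=0.5)
print("Reward:", walker_reward(1.0, 1.0, move_term, move_speed=1.0))
```

Because of the +1 in the numerator, a walker that stands perfectly but does not move still earns 1/6 of the maximum reward on the walk and run tasks.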

Usage

Use this module for core bipedal locomotion benchmarking. The three difficulty levels (stand, walk, run) provide a natural curriculum for evaluating control algorithms on increasingly challenging locomotion objectives.

Code Reference

Source Location

Signature

def get_model_and_assets():
    """Returns a tuple containing the model XML string and a dict of assets."""

def stand(time_limit=_DEFAULT_TIME_LIMIT, random=None, environment_kwargs=None):
    """Returns the Stand task."""

def walk(time_limit=_DEFAULT_TIME_LIMIT, random=None, environment_kwargs=None):
    """Returns the Walk task."""

def run(time_limit=_DEFAULT_TIME_LIMIT, random=None, environment_kwargs=None):
    """Returns the Run task."""

class Physics(mujoco.Physics):
    def torso_upright(self): ...
    def torso_height(self): ...
    def horizontal_velocity(self): ...
    def orientations(self): ...

class PlanarWalker(base.Task):
    def __init__(self, move_speed, random=None): ...
    def initialize_episode(self, physics): ...
    def get_observation(self, physics): ...
    def get_reward(self, physics): ...

Import

from dm_control.suite import walker

I/O Contract

Inputs

| Name | Type | Required | Description |
|---|---|---|---|
| move_speed | float | Yes | Target horizontal velocity in m/s; 0 for standing, 1 for walking, 8 for running |
| time_limit | float | No | Episode time limit in seconds (default 25) |
| random | int/np.random.RandomState/None | No | Random seed or state for reproducibility |
| environment_kwargs | dict | No | Additional keyword arguments passed to the Environment constructor |

Outputs

| Name | Type | Description |
|---|---|---|
| environment | control.Environment | A dm_control Environment instance with the walker task |
| observation | OrderedDict | Contains orientations (planar body orientations), height (torso height), velocity (joint velocities) |
| reward | float | Composite of standing and movement rewards, range [0, 1] |

Usage Examples

from dm_control import suite

# Load the standing task
env = suite.load('walker', 'stand')

# Load the walking task
env = suite.load('walker', 'walk')

# Load the running task
env = suite.load('walker', 'run')

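# Step through one episode of the most recently loaded task (here, 'run').
# generate_value() returns a fixed placeholder action that conforms to the
# action spec; substitute a policy's output to actually control the walker.
time_step = env.reset()
while not time_step.last():
    action = env.action_spec().generate_value()
    time_step = env.step(action)
    obs = time_step.observation
    print("Torso height:", obs['height'])
    print("Reward:", time_step.reward)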
