Implementation:Google deepmind Dm control Suite Walker

Knowledge Sources	Google_deepmind_Dm_control
Domains	Reinforcement Learning, Locomotion, Physics Simulation
Last Updated	2026-02-15 04:00 GMT

Overview

The Planar Walker domain implements a planar bipedal walker with three benchmark tasks of increasing difficulty: stand, walk, and run.

Description

The Walker module defines a planar bipedal locomotion environment in which the agent must balance and move horizontally. Three task variants are registered in the benchmarking suite: stand (keep the torso high and upright; the target speed is 0, so no velocity term applies), walk (reach a horizontal velocity of at least 1 m/s), and run (reach a horizontal velocity of at least 8 m/s). All tasks use a control timestep of 0.025 seconds and a default time limit of 25 seconds.
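As a rough illustration of the numbers above, the sketch below computes how many control steps fit into one episode. The constant and function names are hypothetical and the snippet does not import dm_control; the library's own step accounting may differ slightly.

```python
# Illustrative constants taken from the description above; the names are
# hypothetical and are not imported from dm_control.
CONTROL_TIMESTEP = 0.025   # seconds per control step
DEFAULT_TIME_LIMIT = 25.0  # seconds per episode
TARGET_SPEEDS = {"stand": 0.0, "walk": 1.0, "run": 8.0}  # m/s


def steps_per_episode(time_limit=DEFAULT_TIME_LIMIT,
                      control_timestep=CONTROL_TIMESTEP):
    """Approximate number of control steps in one episode."""
    return round(time_limit / control_timestep)


if __name__ == "__main__":
    for task, speed in TARGET_SPEEDS.items():
        print(task, "target speed:", speed, "steps:", steps_per_episode())
```

With the defaults this gives 1000 control steps per episode. Since each per-step reward lies in [0, 1], an episode return cannot exceed roughly 1000.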

The Physics class extends mujoco.Physics with four methods: torso_upright (the projection of the torso's z-axis onto the world z-axis, i.e. the zz component of the torso rotation matrix), torso_height (the z-coordinate of the torso), horizontal_velocity (read from the torso's subtree linear velocity sensor), and orientations (the xx and xz rotation matrix components of every body except the world body). The PlanarWalker task class randomizes limited and rotational joints at the start of each episode using the standard randomizer utility.
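To make the rotation-matrix features concrete, here is a minimal sketch of what the zz, xx, and xz components encode for a body that rotates only in the x-z plane. It uses plain Python rather than the MuJoCo bindings; the helper names and the sign convention of the rotation (positive angle about +y) are assumptions made for illustration.

```python
import math


def rotation_about_y(theta):
    """3x3 rotation matrix for a rotation of `theta` radians about +y.

    Hypothetical helper: stands in for a body's orientation matrix in a
    planar (x-z) model.
    """
    c, s = math.cos(theta), math.sin(theta)
    return [[c, 0.0, s],
            [0.0, 1.0, 0.0],
            [-s, 0.0, c]]


def upright(rotation):
    """zz component: projection of the body's z-axis onto the world z-axis."""
    return rotation[2][2]


def planar_orientation(rotation):
    """(xx, xz) components: cosine and sine of the planar tilt angle."""
    return rotation[0][0], rotation[0][2]


if __name__ == "__main__":
    for degrees in (0, 45, 90, 180):
        r = rotation_about_y(math.radians(degrees))
        print(degrees, round(upright(r), 3), planar_orientation(r))
```

Under these assumptions, uprightness is 1 for a vertical torso, 0 when it lies horizontal, and -1 when it is upside down, while the (xx, xz) pair gives a continuous encoding of the angle that avoids the wrap-around of a raw angle value.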

The reward function combines a standing reward and a movement reward. The standing reward is a weighted combination of a height tolerance (1 when the torso height is above 1.2 m, weighted 3/4) and an uprightness term ((1 + upright) / 2, weighted 1/4). For the stand task, the standing reward is returned on its own. For the locomotion tasks, it is multiplied by a velocity-dependent term: move_reward is a linear-sigmoid tolerance on horizontal velocity that equals 1 at or above the target speed, and the composite reward is stand_reward * (5 * move_reward + 1) / 6.
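The reward structure described above can be sketched in plain Python as follows. This is an illustrative re-implementation, not the library code: the two tolerance helpers are simplified stand-ins for the suite's tolerance utility, and the falloff shapes, margins (half the target value), and value_at_margin settings are assumptions that the text above does not state.

```python
import math

STAND_HEIGHT = 1.2  # metres, from the description above


def gaussian_tolerance(x, lower, margin, value_at_margin=0.1):
    """1 at or above `lower`; Gaussian falloff below it (assumed shape)."""
    if x >= lower:
        return 1.0
    d = (lower - x) / margin
    scale = math.sqrt(-2.0 * math.log(value_at_margin))
    return math.exp(-0.5 * (d * scale) ** 2)


def linear_tolerance(x, lower, margin, value_at_margin=0.5):
    """1 at or above `lower`; linear falloff below it, clipped at 0."""
    if x >= lower:
        return 1.0
    scaled = (lower - x) / margin * (1.0 - value_at_margin)
    return 1.0 - scaled if scaled < 1.0 else 0.0


def walker_reward(height, upright, velocity, move_speed):
    """Composite reward: standing term, optionally scaled by a velocity term."""
    standing = gaussian_tolerance(height, STAND_HEIGHT, STAND_HEIGHT / 2)
    upright_term = (1.0 + upright) / 2.0
    stand_reward = (3.0 * standing + upright_term) / 4.0
    if move_speed == 0:
        return stand_reward  # stand task: no velocity term
    move_reward = linear_tolerance(velocity, move_speed, move_speed / 2)
    return stand_reward * (5.0 * move_reward + 1.0) / 6.0


if __name__ == "__main__":
    print(walker_reward(1.3, 1.0, 0.0, move_speed=0))  # stand
    print(walker_reward(1.3, 1.0, 0.5, move_speed=1))  # walk, half speed
    print(walker_reward(1.3, 1.0, 8.0, move_speed=8))  # run, at target
```

Note the effect of the (5 * move_reward + 1) / 6 factor: an agent that stands perfectly but does not move still earns 1/6 of the maximum reward on the locomotion tasks, which gives a learning signal for balancing before any forward progress is made.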

Usage

Use this module for core bipedal locomotion benchmarking. The three difficulty levels (stand, walk, run) provide a natural curriculum for evaluating control algorithms on increasingly challenging locomotion objectives.

Code Reference

Source Location

Repository: Google_deepmind_Dm_control
File: dm_control/suite/walker.py
Lines: 1-154

Signature

def get_model_and_assets():
    """Returns a tuple containing the model XML string and a dict of assets."""

def stand(time_limit=_DEFAULT_TIME_LIMIT, random=None, environment_kwargs=None):
    """Returns the Stand task."""

def walk(time_limit=_DEFAULT_TIME_LIMIT, random=None, environment_kwargs=None):
    """Returns the Walk task."""

def run(time_limit=_DEFAULT_TIME_LIMIT, random=None, environment_kwargs=None):
    """Returns the Run task."""

class Physics(mujoco.Physics):
    def torso_upright(self): ...
    def torso_height(self): ...
    def horizontal_velocity(self): ...
    def orientations(self): ...

class PlanarWalker(base.Task):
    def __init__(self, move_speed, random=None): ...
    def initialize_episode(self, physics): ...
    def get_observation(self, physics): ...
    def get_reward(self, physics): ...

Import

from dm_control.suite import walker

I/O Contract

Inputs

Name	Type	Required	Description
move_speed	float	Yes	Target horizontal velocity in m/s, passed to the PlanarWalker constructor; 0 for stand, 1 for walk, 8 for run
time_limit	float	No	Episode time limit in seconds (default 25)
random	int/np.random.RandomState/None	No	Random seed or state for reproducibility
environment_kwargs	dict	No	Additional keyword arguments passed to the Environment constructor

Outputs

Name	Type	Description
environment	control.Environment	A dm_control Environment instance with the walker task
observation	OrderedDict	Contains orientations (planar body orientations), height (torso height), velocity (generalized velocities of all joints)
reward	float	Composite of standing and movement rewards, range [0, 1]

Usage Examples

from dm_control import suite

# Each call builds an independent environment; load only the task you need.
stand_env = suite.load('walker', 'stand')
walk_env = suite.load('walker', 'walk')
run_env = suite.load('walker', 'run')

# Step through one episode of the walking task.
env = walk_env
action_spec = env.action_spec()
time_step = env.reset()
while not time_step.last():
    # generate_value() returns a fixed action that conforms to the spec,
    # not a random sample; replace it with your policy's output.
    action = action_spec.generate_value()
    time_step = env.step(action)
    obs = time_step.observation
    print("Torso height:", obs['height'])
    print("Reward:", time_step.reward)
