Implementation:Google deepmind Dm control Suite Walker
| Knowledge Sources | |
|---|---|
| Domains | Reinforcement Learning, Locomotion, Physics Simulation |
| Last Updated | 2026-02-15 04:00 GMT |
Overview
The Planar Walker domain implements a two-legged walker constrained to a vertical plane, with three benchmarking tasks of increasing difficulty: standing, walking, and running.
Description
The Walker module defines a planar bipedal locomotion environment in which a two-legged agent must stay balanced and, in the locomotion tasks, move forward along the horizontal (x) axis. Three task variants are registered in the benchmarking suite: stand (remain upright; move_speed is 0, so there is no velocity term), walk (reach a forward velocity of at least 1 m/s), and run (reach a forward velocity of at least 8 m/s). All tasks use a control timestep of 0.025 seconds and a default time limit of 25 seconds, which gives 1000 control steps per episode.
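As a quick check of the timing constants above, the sketch below computes the number of control steps per episode and the resulting upper bound on the undiscounted return, assuming (as the I/O contract states) a per-step reward in [0, 1]. The names TASK_MOVE_SPEED, steps_per_episode and max_episode_return are hypothetical and are not part of dm_control.

```python
# Hypothetical summary of the three registered Walker tasks, using the values
# stated in the description: target speed in m/s plus shared timing constants.
TASK_MOVE_SPEED = {"stand": 0.0, "walk": 1.0, "run": 8.0}
CONTROL_TIMESTEP = 0.025  # seconds per agent action
TIME_LIMIT = 25.0         # seconds per episode


def steps_per_episode(time_limit: float = TIME_LIMIT,
                      control_timestep: float = CONTROL_TIMESTEP) -> int:
    """Number of agent actions in one episode (rounded to absorb float error)."""
    return round(time_limit / control_timestep)


def max_episode_return(time_limit: float = TIME_LIMIT,
                       control_timestep: float = CONTROL_TIMESTEP) -> float:
    """Upper bound on the undiscounted return, assuming per-step reward <= 1."""
    return float(steps_per_episode(time_limit, control_timestep))


for task, speed in TASK_MOVE_SPEED.items():
    print(f"{task}: move_speed={speed} m/s, "
          f"{steps_per_episode()} steps, max return {max_episode_return():.0f}")
```

This ceiling of 1000 is why Walker scores are usually reported on a 0 to 1000 scale.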
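The Physics class extends mujoco.Physics with helper methods that return: torso uprightness (the zz entry of the torso's rotation matrix, i.e. the projection of the torso's z-axis onto the world z-axis), torso height (the z-coordinate of the torso), horizontal velocity (the x-component of the torso's subtree linear velocity sensor, i.e. the centre-of-mass velocity of the walker), and the planar orientations of all bodies except the world body (the xx and xz entries of each body's rotation matrix). At the start of each episode, the PlanarWalker task randomizes the limited and rotational joints with randomizers.randomize_limited_and_rotational_joints.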
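The orientation features and the uprightness value can be illustrated without MuJoCo. This standalone sketch assumes each body's world rotation is a pure rotation about the world y-axis (the axis a planar walker rotates around) and reads the same rotation-matrix entries named above: xx and xz for orientations, zz for uprightness. The helper names are hypothetical; in dm_control these values are read from physics.named.data.xmat.

```python
import math


def rotation_about_y(pitch: float) -> list[list[float]]:
    """3x3 rotation matrix for a rotation by `pitch` radians about the y-axis.

    Assumption: planar bodies only rotate about the world y-axis.
    """
    c, s = math.cos(pitch), math.sin(pitch)
    return [[c, 0.0, s],
            [0.0, 1.0, 0.0],
            [-s, 0.0, c]]


def planar_orientation(rot: list[list[float]]) -> tuple[float, float]:
    """(xx, xz) entries of a body's rotation matrix."""
    return rot[0][0], rot[0][2]


def upright(rot: list[list[float]]) -> float:
    """zz entry: projection of the body z-axis onto the world z-axis."""
    return rot[2][2]


def orientations(body_pitches: list[float]) -> list[float]:
    """Concatenate the (xx, xz) pair of every body into one flat list."""
    out: list[float] = []
    for pitch in body_pitches:
        out.extend(planar_orientation(rotation_about_y(pitch)))
    return out


print(upright(rotation_about_y(0.0)))      # upright torso
print(upright(rotation_about_y(math.pi)))  # upside-down torso
```

For a rotation by an angle θ, the pair (xx, xz) equals (cos θ, sin θ), so it identifies the angle without the wrap-around discontinuity of a raw angle, which is one plausible motivation for this encoding.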
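The reward function combines a standing reward and a movement reward. The standing reward is (3 * height_term + upright_term) / 4: the height term is a tolerance reward that equals 1 when the torso is at least 1.2 m high and decays smoothly below that (margin 0.6 m), and the upright term is (1 + upright) / 2, which maps the projection from [-1, 1] to [0, 1]. For the stand task this is the entire reward. For walk and run, the movement reward is a linear-sigmoid tolerance on horizontal velocity that equals 1 at or above the target move_speed and falls linearly to 0 at zero velocity; the final reward is stand_reward * (5 * move_reward + 1) / 6, so an upright but stationary walker still earns one sixth of its standing reward.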
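The reward arithmetic can be reproduced in plain Python. This is a hypothetical, scalar-only re-implementation rather than dm_control.utils.rewards.tolerance itself: it supports only the Gaussian and linear falloffs, and the parameter values (a 0.6 m margin with a value_at_margin of 0.1 for height; a margin of move_speed / 2 with value_at_margin = 0.5 for velocity) reflect our reading of walker.py and should be checked against the source.

```python
import math

STAND_HEIGHT = 1.2  # metres; lower bound of the height tolerance


def tolerance(x: float, lower: float, upper: float, margin: float,
              sigmoid: str = "gaussian", value_at_margin: float = 0.1) -> float:
    """Scalar-only sketch of a tolerance reward (gaussian or linear falloff).

    Returns 1 inside [lower, upper]. Outside, it decays with the normalised
    distance d = distance / margin and equals `value_at_margin` at d == 1.
    """
    if lower <= x <= upper:
        return 1.0
    d = (lower - x if x < lower else x - upper) / margin
    if sigmoid == "gaussian":
        scale = math.sqrt(-2.0 * math.log(value_at_margin))
        return math.exp(-0.5 * (d * scale) ** 2)
    if sigmoid == "linear":
        scaled = d * (1.0 - value_at_margin)
        return 1.0 - scaled if abs(scaled) < 1.0 else 0.0
    raise ValueError(f"unsupported sigmoid: {sigmoid}")


def walker_reward(height: float, upright: float, velocity: float,
                  move_speed: float) -> float:
    """Hypothetical re-implementation of the composite Walker reward.

    Margins and value_at_margin settings are assumptions; verify in walker.py.
    """
    standing = tolerance(height, STAND_HEIGHT, math.inf, margin=STAND_HEIGHT / 2)
    upright_term = (1.0 + upright) / 2.0
    stand_reward = (3.0 * standing + upright_term) / 4.0
    if move_speed == 0:
        return stand_reward
    move_reward = tolerance(velocity, move_speed, math.inf, margin=move_speed / 2,
                            sigmoid="linear", value_at_margin=0.5)
    return stand_reward * (5.0 * move_reward + 1.0) / 6.0


# Upright walker at half the walking target speed: prints roughly 0.583
print(walker_reward(height=1.3, upright=1.0, velocity=0.5, move_speed=1.0))
```

With these settings the movement term equals velocity / move_speed for velocities between 0 and move_speed, so walking upright at half the target speed yields (5 * 0.5 + 1) / 6, about 0.58.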
Usage
Use this module for core bipedal locomotion benchmarking. The three difficulty levels (stand, walk, run) provide a natural curriculum for evaluating control algorithms on increasingly challenging locomotion objectives.
Code Reference
Source Location
- Repository: Google_deepmind_Dm_control
- File: dm_control/suite/walker.py
- Lines: 1-154
Signature
```python
def get_model_and_assets():
    """Returns a tuple containing the model XML string and a dict of assets."""

def stand(time_limit=_DEFAULT_TIME_LIMIT, random=None, environment_kwargs=None):
    """Returns the Stand task."""

def walk(time_limit=_DEFAULT_TIME_LIMIT, random=None, environment_kwargs=None):
    """Returns the Walk task."""

def run(time_limit=_DEFAULT_TIME_LIMIT, random=None, environment_kwargs=None):
    """Returns the Run task."""

class Physics(mujoco.Physics):
    def torso_upright(self): ...
    def torso_height(self): ...
    def horizontal_velocity(self): ...
    def orientations(self): ...

class PlanarWalker(base.Task):
    def __init__(self, move_speed, random=None): ...
    def initialize_episode(self, physics): ...
    def get_observation(self, physics): ...
    def get_reward(self, physics): ...
```
Import
```python
from dm_control.suite import walker
```
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| move_speed | float | Yes | Target horizontal velocity in m/s, passed to PlanarWalker (the task functions set it internally): 0 for stand, 1 for walk, 8 for run |
| time_limit | float | No | Episode time limit in seconds (default 25) |
| random | int/np.random.RandomState/None | No | Integer seed or RandomState instance for reproducibility; None selects a seed automatically |
| environment_kwargs | dict | No | Additional keyword arguments passed to the control.Environment constructor |
Outputs
| Name | Type | Description |
|---|---|---|
| environment | control.Environment | A dm_control Environment instance with the walker task |
| observation | OrderedDict | Contains orientations (xx and xz rotation-matrix entries of each body), height (torso height), and velocity (generalized velocities of all degrees of freedom, including the planar root joints) |
| reward | float | Composite of standing and movement rewards, range [0, 1] |
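Many agents consume a single flat vector rather than the observation dictionary. The sketch below concatenates the entries in key order. The sizes (14 orientation values from 7 non-world bodies, 1 height value and 9 generalized velocities, 24 in total) are assumptions about the standard walker.xml model and should be confirmed with env.observation_spec(); dm_control's control.Environment also offers a flat_observation option that performs a similar flattening. flatten_observation and ASSUMED_SIZES are hypothetical names.

```python
from collections import OrderedDict

# Assumed observation sizes for the standard walker.xml model; verify against
# env.observation_spec() before relying on them.
ASSUMED_SIZES = {"orientations": 14, "height": 1, "velocity": 9}


def flatten_observation(obs: dict) -> list[float]:
    """Concatenate observation entries, in key order, into one flat list."""
    flat: list[float] = []
    for value in obs.values():
        if isinstance(value, (int, float)):
            flat.append(float(value))  # scalar entry such as 'height'
        else:
            flat.extend(float(v) for v in value)  # 1-D entry
    return flat


# Dummy observation with the assumed sizes (placeholder values, not real data).
dummy = OrderedDict(
    orientations=[0.0] * ASSUMED_SIZES["orientations"],
    height=1.25,
    velocity=[0.0] * ASSUMED_SIZES["velocity"],
)
print(len(flatten_observation(dummy)))  # 24 under the assumed sizes
```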
Usage Examples
```python
from dm_control import suite

# Load the standing task
env = suite.load('walker', 'stand')
# Load the walking task
env = suite.load('walker', 'walk')
# Load the running task (this is the environment stepped below)
env = suite.load('walker', 'run')

# Step through one episode with a constant action at the lower bound of the action spec
action_spec = env.action_spec()
time_step = env.reset()
while not time_step.last():
    action = action_spec.generate_value()
    time_step = env.step(action)
    obs = time_step.observation
    print("Torso height:", obs['height'])
    print("Reward:", time_step.reward)
```