Jump to content

Connect SuperML | Leeroopedia MCP: Equip your AI agents with best practices, code verification, and debugging knowledge. Powered by Leeroo — building Organizational Superintelligence. Contact us at founders@leeroo.com.

Implementation:Google deepmind Dm control Suite Humanoid

From Leeroopedia
Metadata Value
Implementation Suite Humanoid
Domain Reinforcement_Learning, Control
Source Google_deepmind_Dm_control
Last Updated 2026-02-15 04:00 GMT

Overview

Concrete tool for controlling a 3D humanoid to stand, walk, or run provided by the dm_control Control Suite.

Description

The Humanoid domain models a full 3D humanoid body with 21 degrees of freedom and multiple actuated joints. The Physics subclass provides methods for querying the torso uprightness (z-axis projection), head height, center-of-mass position and velocity, torso vertical orientation, joint angles (excluding the 7 root DOFs of the free joint), and end effector (hands and feet) positions in the egocentric torso frame.

Four tasks are registered: three benchmarking tasks (stand, walk, run) and one extra task (run_pure_state). All use the Humanoid task class parameterized by move_speed and pure_state. The stand task sets move_speed=0 and rewards the agent for keeping the head above 1.4 m, maintaining torso uprightness, using small controls, and not moving horizontally. The walk and run tasks set move_speed to 1 and 10 m/s respectively, rewarding horizontal locomotion at or above the target speed. The run_pure_state variant provides raw MuJoCo positions and velocities as observations instead of egocentric features.

Episode initialization randomizes all limited and rotational joints to a collision-free configuration. All tasks use a default time limit of 25 seconds and a control timestep of 0.025 seconds.

Usage

Use this implementation for high-dimensional 3D locomotion benchmarks. Load via suite.load(domain_name='humanoid', task_name='stand') or the walk/run variants.

Code Reference

Source Location

Signature

# Task factory functions
def stand(time_limit=25, random=None, environment_kwargs=None)
def walk(time_limit=25, random=None, environment_kwargs=None)
def run(time_limit=25, random=None, environment_kwargs=None)
def run_pure_state(time_limit=25, random=None, environment_kwargs=None)

# Physics subclass
class Physics(mujoco.Physics):
    def torso_upright(self)                # z-axis projection of torso
    def head_height(self)                  # height of the head
    def center_of_mass_position(self)      # CoM position
    def center_of_mass_velocity(self)      # CoM velocity from sensor
    def torso_vertical_orientation(self)   # z-projection of torso orientation
    def joint_angles(self)                 # joint positions sans root DOFs
    def extremities(self)                  # hand/foot positions in torso frame

# Task class
class Humanoid(base.Task):
    def __init__(self, move_speed, pure_state, random=None)
    def initialize_episode(self, physics)
    def get_observation(self, physics)
    def get_reward(self, physics)

Import

from dm_control import suite

env = suite.load(domain_name='humanoid', task_name='walk')

I/O Contract

Inputs

Name Type Required Description
time_limit float No Maximum episode duration in seconds (default 25).
random int, numpy.random.RandomState, or None No Random seed or RNG instance for reproducibility.
environment_kwargs dict or None No Additional keyword arguments forwarded to the Environment constructor.

Outputs

Name Type Description
environment dm_control.rl.control.Environment A fully initialised environment conforming to the dm_env.Environment interface.

Observations (egocentric mode)

Key Type Description
joint_angles numpy array Joint positions excluding the 7 root free-joint DOFs.
head_height float Height of the head above the ground.
extremities numpy array (12,) Left/right hand and foot positions in torso frame.
torso_vertical numpy array (3,) Z-projection of torso orientation matrix.
com_velocity numpy array (3,) Center-of-mass velocity from sensor.
velocity numpy array Joint velocities.

Observations (pure state mode)

Key Type Description
position numpy array Full generalized positions.
velocity numpy array Full generalized velocities.

Usage Examples

from dm_control import suite

# Load the walk task
env = suite.load(domain_name='humanoid', task_name='walk')

# Run an episode
time_step = env.reset()
while not time_step.last():
    action = env.action_spec().generate_value()
    time_step = env.step(action)

# Load the run task
env_run = suite.load(domain_name='humanoid', task_name='run')

# Load with pure state observations
env_pure = suite.load(domain_name='humanoid', task_name='run_pure_state')

Related Pages

Page Connections

Double-click a node to navigate. Hold to expand connections.
Principle
Implementation
Heuristic
Environment