Implementation:Google deepmind Dm control Suite Humanoid
| Metadata | Value |
|---|---|
| Implementation | Suite Humanoid |
| Domain | Reinforcement_Learning, Control |
| Source | Google_deepmind_Dm_control |
| Last Updated | 2026-02-15 04:00 GMT |
Overview
Concrete tool for controlling a 3D humanoid to stand, walk, or run provided by the dm_control Control Suite.
Description
The Humanoid domain models a full 3D humanoid body with 21 degrees of freedom and multiple actuated joints. The Physics subclass provides methods for querying the torso uprightness (z-axis projection), head height, center-of-mass position and velocity, torso vertical orientation, joint angles (excluding the 7 root DOFs of the free joint), and end effector (hands and feet) positions in the egocentric torso frame.
Four tasks are registered: three benchmarking tasks (stand, walk, run) and one extra task (run_pure_state). All use the Humanoid task class parameterized by move_speed and pure_state. The stand task sets move_speed=0 and rewards the agent for keeping the head above 1.4 m, maintaining torso uprightness, using small controls, and not moving horizontally. The walk and run tasks set move_speed to 1 and 10 m/s respectively, rewarding horizontal locomotion at or above the target speed. The run_pure_state variant provides raw MuJoCo positions and velocities as observations instead of egocentric features.
Episode initialization randomizes all limited and rotational joints to a collision-free configuration. All tasks use a default time limit of 25 seconds and a control timestep of 0.025 seconds.
Usage
Use this implementation for high-dimensional 3D locomotion benchmarks. Load via suite.load(domain_name='humanoid', task_name='stand') or the walk/run variants.
Code Reference
Source Location
- Repository: Google_deepmind_Dm_control
- File: dm_control/suite/humanoid.py
- Lines: 1-207
Signature
# Task factory functions
def stand(time_limit=25, random=None, environment_kwargs=None)
def walk(time_limit=25, random=None, environment_kwargs=None)
def run(time_limit=25, random=None, environment_kwargs=None)
def run_pure_state(time_limit=25, random=None, environment_kwargs=None)
# Physics subclass
class Physics(mujoco.Physics):
def torso_upright(self) # z-axis projection of torso
def head_height(self) # height of the head
def center_of_mass_position(self) # CoM position
def center_of_mass_velocity(self) # CoM velocity from sensor
def torso_vertical_orientation(self) # z-projection of torso orientation
def joint_angles(self) # joint positions sans root DOFs
def extremities(self) # hand/foot positions in torso frame
# Task class
class Humanoid(base.Task):
def __init__(self, move_speed, pure_state, random=None)
def initialize_episode(self, physics)
def get_observation(self, physics)
def get_reward(self, physics)
Import
from dm_control import suite
env = suite.load(domain_name='humanoid', task_name='walk')
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
time_limit |
float | No | Maximum episode duration in seconds (default 25). |
random |
int, numpy.random.RandomState, or None | No | Random seed or RNG instance for reproducibility. |
environment_kwargs |
dict or None | No | Additional keyword arguments forwarded to the Environment constructor.
|
Outputs
| Name | Type | Description |
|---|---|---|
| environment | dm_control.rl.control.Environment |
A fully initialised environment conforming to the dm_env.Environment interface.
|
Observations (egocentric mode)
| Key | Type | Description |
|---|---|---|
joint_angles |
numpy array | Joint positions excluding the 7 root free-joint DOFs. |
head_height |
float | Height of the head above the ground. |
extremities |
numpy array (12,) | Left/right hand and foot positions in torso frame. |
torso_vertical |
numpy array (3,) | Z-projection of torso orientation matrix. |
com_velocity |
numpy array (3,) | Center-of-mass velocity from sensor. |
velocity |
numpy array | Joint velocities. |
Observations (pure state mode)
| Key | Type | Description |
|---|---|---|
position |
numpy array | Full generalized positions. |
velocity |
numpy array | Full generalized velocities. |
Usage Examples
from dm_control import suite
# Load the walk task
env = suite.load(domain_name='humanoid', task_name='walk')
# Run an episode
time_step = env.reset()
while not time_step.last():
action = env.action_spec().generate_value()
time_step = env.step(action)
# Load the run task
env_run = suite.load(domain_name='humanoid', task_name='run')
# Load with pure state observations
env_pure = suite.load(domain_name='humanoid', task_name='run_pure_state')