Implementation:Farama Foundation Gymnasium HumanoidEnv V5
| Knowledge Sources | |
|---|---|
| Domains | Reinforcement_Learning, MuJoCo_Environments |
| Last Updated | 2026-02-15 03:00 GMT |
Overview
Concrete implementation of the Humanoid v5 MuJoCo locomotion environment provided by Gymnasium.
Description
The Humanoid environment is based on Tassa, Erez, and Todorov's work in "Synthesis and stabilization of complex behaviors through online trajectory optimization". The 3D bipedal robot simulates a human with a torso (abdomen), a pair of legs and arms, and tendons connecting the hips to the knees. Each leg consists of three body parts (thigh, shin, foot) and each arm consists of two body parts (upper arm, forearm). The goal is to walk forward as fast as possible without falling.
The reward function combines a healthy reward, a forward reward based on the center-of-mass x-velocity, a control cost penalty, and a contact cost penalty. The observation concatenates qpos, qvel, cinert, cvel, qfrc_actuator, and cfrc_ext data. With the default terminate_when_unhealthy=True, the episode terminates when the torso's z-coordinate leaves healthy_z_range, which defaults to (1.0, 2.0).
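The reward and termination logic described above can be sketched with NumPy alone. This is a minimal illustration, not the environment's code: the function name `humanoid_reward_sketch` is hypothetical, the default weights are copied from the constructor signature, and it assumes quadratic control and contact costs, a contact cost clipped to contact_cost_range, and a healthy reward granted only while the torso height is inside healthy_z_range.

```python
import numpy as np


def humanoid_reward_sketch(
    x_velocity,
    action,
    cfrc_ext,
    z,
    *,
    forward_reward_weight=1.25,
    ctrl_cost_weight=0.1,
    contact_cost_weight=5e-7,
    contact_cost_range=(-np.inf, 10.0),
    healthy_reward=5.0,
    healthy_z_range=(1.0, 2.0),
):
    """Return (reward, is_healthy) for one step under the stated assumptions."""
    min_z, max_z = healthy_z_range
    # Assumption: "healthy" means the torso height lies strictly inside the range.
    is_healthy = bool(min_z < z < max_z)

    # Forward reward scales the center-of-mass x-velocity.
    forward_reward = forward_reward_weight * x_velocity

    # Assumption: the control cost is the weighted sum of squared torques.
    ctrl_cost = ctrl_cost_weight * float(np.sum(np.square(action)))

    # Assumption: the contact cost is the weighted sum of squared external
    # contact forces, clipped to contact_cost_range.
    raw_contact_cost = contact_cost_weight * float(np.sum(np.square(cfrc_ext)))
    contact_cost = float(np.clip(raw_contact_cost, *contact_cost_range))

    reward = healthy_reward * is_healthy + forward_reward - ctrl_cost - contact_cost
    return reward, is_healthy
```

With zero torques and zero contact forces, only the healthy and forward terms contribute, which makes the sketch easy to check by hand.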
Usage
Use this environment for benchmarking high-dimensional continuous control RL algorithms. It is one of the most challenging standard MuJoCo benchmarks because of its 17-dimensional action space and 348-dimensional observation space. Compared with earlier versions, v5 restores the contact cost term, excludes the worldbody's cinert, cvel, and cfrc_ext entries from observations, and adds tendon_length and tendon_velocity to info.
Code Reference
Source Location
- Repository: Farama_Foundation_Gymnasium
- File: gymnasium/envs/mujoco/humanoid_v5.py
Signature
class HumanoidEnv(MujocoEnv, utils.EzPickle):
    def __init__(
        self,
        xml_file: str = "humanoid.xml",
        frame_skip: int = 5,
        default_camera_config: dict[str, float | int] = DEFAULT_CAMERA_CONFIG,
        forward_reward_weight: float = 1.25,
        ctrl_cost_weight: float = 0.1,
        contact_cost_weight: float = 5e-7,
        contact_cost_range: tuple[float, float] = (-np.inf, 10.0),
        healthy_reward: float = 5.0,
        terminate_when_unhealthy: bool = True,
        healthy_z_range: tuple[float, float] = (1.0, 2.0),
        reset_noise_scale: float = 1e-2,
        exclude_current_positions_from_observation: bool = True,
        include_cinert_in_observation: bool = True,
        include_cvel_in_observation: bool = True,
        include_qfrc_actuator_in_observation: bool = True,
        include_cfrc_ext_in_observation: bool = True,
        **kwargs,
    )
Import
import gymnasium as gym
env = gym.make("Humanoid-v5")
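The constructor arguments listed in the signature can be forwarded through `gym.make` as keyword arguments. The sketch below shows one way to do that. The override values and the names `CUSTOM_KWARGS` and `make_humanoid` are hypothetical and chosen only for illustration; creating the environment requires Gymnasium with its MuJoCo dependencies installed.

```python
# Hypothetical overrides; the keys are parameter names from the signature,
# the values are illustrative and not recommended settings.
CUSTOM_KWARGS = {
    "forward_reward_weight": 2.0,
    "ctrl_cost_weight": 0.05,
    "healthy_z_range": (0.8, 2.1),
    "terminate_when_unhealthy": False,
}


def make_humanoid(**overrides):
    """Create Humanoid-v5, forwarding overrides to the HumanoidEnv constructor."""
    # Imported lazily so this module loads even without MuJoCo installed.
    import gymnasium as gym

    return gym.make("Humanoid-v5", **overrides)


# Example call (needs gymnasium with MuJoCo support):
# env = make_humanoid(**CUSTOM_KWARGS)
# env.close()
```

Keeping the overrides in a plain dictionary makes it easy to log the exact configuration alongside benchmark results.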
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| action | np.ndarray (17,) | Yes | Torques applied to the 17 hinge joints (abdomen, hips, knees, shoulders, elbows), range [-0.4, 0.4] |
Outputs
| Name | Type | Description |
|---|---|---|
| observation | np.ndarray (348,) | State vector: qpos (22), qvel (23), cinert (130), cvel (78), qfrc_actuator (17), cfrc_ext (78); the root x and y positions are excluded from qpos by default |
| reward | float | healthy_reward + forward_reward - ctrl_cost - contact_cost |
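| terminated | bool | True if the torso z-coordinate is outside healthy_z_range (default (1.0, 2.0)) and terminate_when_unhealthy is True |
| truncated | bool | True when the episode reaches its step limit (enforced by the TimeLimit wrapper, not by the environment itself) |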
| info | dict | Contains x_position, y_position, tendon_length, tendon_velocity, distance_from_origin, x_velocity, y_velocity, reward_survive, reward_forward, reward_ctrl, reward_contact |
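The component sizes in the observation row can be turned into index ranges for slicing a 348-dimensional observation. The sketch below assumes the components are concatenated in the order listed in the table and that the default observation flags are in effect; the names `OBS_COMPONENTS` and `observation_slices` are hypothetical.

```python
# Component sizes as listed in the outputs table (default v5 configuration).
# Assumption: components are concatenated in this order.
OBS_COMPONENTS = [
    ("qpos", 22),
    ("qvel", 23),
    ("cinert", 130),
    ("cvel", 78),
    ("qfrc_actuator", 17),
    ("cfrc_ext", 78),
]


def observation_slices(components=OBS_COMPONENTS):
    """Map each component name to its slice; also return the total length."""
    slices = {}
    start = 0
    for name, size in components:
        slices[name] = slice(start, start + size)
        start += size
    return slices, start


SLICES, OBS_DIM = observation_slices()
```

For example, `observation[SLICES["qvel"]]` would select the 23 velocity entries under these assumptions. If any of the include_*_in_observation flags is set to False, the table must be adjusted before reusing the slices.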
Usage Examples
import gymnasium as gym

env = gym.make("Humanoid-v5")
observation, info = env.reset(seed=42)
for _ in range(1000):
    # Sample a random action from the 17-dimensional action space.
    action = env.action_space.sample()
    observation, reward, terminated, truncated, info = env.step(action)
    # Start a new episode once the current one ends.
    if terminated or truncated:
        observation, info = env.reset()
env.close()