
Implementation:Farama Foundation Gymnasium HumanoidEnv V5

From Leeroopedia
Knowledge Sources
Domains: Reinforcement_Learning, MuJoCo_Environments
Last Updated: 2026-02-15 03:00 GMT

Overview

Concrete implementation of the Humanoid v5 MuJoCo locomotion environment provided by Gymnasium.

Description

The Humanoid environment is based on the work of Tassa, Erez, and Todorov in "Synthesis and stabilization of complex behaviors through online trajectory optimization". The 3D bipedal robot models a human with a torso (abdomen), a pair of legs, a pair of arms, and tendons connecting the hips to the knees. Each leg consists of three body parts (thigh, shin, foot) and each arm of two (upper arm, forearm). The goal is to walk forward as fast as possible without falling. The reward combines a healthy reward, a forward reward based on the center-of-mass x-velocity, a control cost penalty, and a contact cost penalty. The observation concatenates qpos, qvel, cinert, cvel, qfrc_actuator, and cfrc_ext data. With the default settings (terminate_when_unhealthy=True, healthy_z_range=(1.0, 2.0)), the episode terminates when the torso z-coordinate leaves [1.0, 2.0].
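The reward composition described above can be sketched in plain Python. This is a minimal illustration, not the Gymnasium source: the function name `humanoid_reward` is hypothetical, and it assumes quadratic control and contact penalties, clipping of the contact cost to `contact_cost_range`, and a healthy reward that is paid only while the robot is healthy. The default weights are taken from the constructor signature on this page.

```python
import math


def humanoid_reward(
    x_velocity,
    ctrl,
    contact_forces,
    is_healthy=True,
    forward_reward_weight=1.25,
    ctrl_cost_weight=0.1,
    contact_cost_weight=5e-7,
    contact_cost_range=(-math.inf, 10.0),
    healthy_reward=5.0,
):
    """Hypothetical helper: combine the four reward terms into one scalar."""
    # Forward term: scaled center-of-mass velocity along x.
    forward = forward_reward_weight * x_velocity
    # Survival bonus (assumption: granted only while healthy).
    survive = healthy_reward if is_healthy else 0.0
    # Assumption: quadratic penalty on the control signal.
    ctrl_cost = ctrl_cost_weight * sum(u * u for u in ctrl)
    # Assumption: quadratic penalty on external contact forces, then clipped.
    raw_contact = contact_cost_weight * sum(f * f for f in contact_forces)
    low, high = contact_cost_range
    contact_cost = min(max(raw_contact, low), high)
    return survive + forward - ctrl_cost - contact_cost


# Maximum torque on all 17 actuators, no contact forces, 1 m/s forward.
example = humanoid_reward(1.0, [0.4] * 17, [0.0] * 78)
```

With these inputs the control cost is 0.1 * 17 * 0.16 = 0.272, so the total is 5.0 + 1.25 - 0.272 = 5.978. For the exact computation, consult the Gymnasium source.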

Usage

Use this environment to benchmark continuous-control RL algorithms in a high-dimensional setting. With a 17-dimensional action space and a 348-dimensional observation space, it is one of the most challenging standard MuJoCo benchmarks. The v5 version restores the contact cost, excludes worldbody data from the observation, and adds tendon length and velocity to the info dictionary.

Code Reference

Source Location

Signature

class HumanoidEnv(MujocoEnv, utils.EzPickle):
    def __init__(
        self,
        xml_file: str = "humanoid.xml",
        frame_skip: int = 5,
        default_camera_config: dict[str, float | int] = DEFAULT_CAMERA_CONFIG,
        forward_reward_weight: float = 1.25,
        ctrl_cost_weight: float = 0.1,
        contact_cost_weight: float = 5e-7,
        contact_cost_range: tuple[float, float] = (-np.inf, 10.0),
        healthy_reward: float = 5.0,
        terminate_when_unhealthy: bool = True,
        healthy_z_range: tuple[float, float] = (1.0, 2.0),
        reset_noise_scale: float = 1e-2,
        exclude_current_positions_from_observation: bool = True,
        include_cinert_in_observation: bool = True,
        include_cvel_in_observation: bool = True,
        include_qfrc_actuator_in_observation: bool = True,
        include_cfrc_ext_in_observation: bool = True,
        **kwargs,
    )
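
Keyword arguments passed to `gym.make` are forwarded to the environment constructor, so the parameters in the signature above can be overridden at creation time. The sketch below is illustrative only: the override values and the helper name `make_custom_humanoid` are hypothetical, and the import is deferred so the configuration can be inspected on a machine without the MuJoCo extras installed.

```python
# Hypothetical override values, chosen only for illustration.
CUSTOM_KWARGS = {
    "forward_reward_weight": 2.0,              # reward speed more strongly
    "terminate_when_unhealthy": False,         # keep stepping after a fall
    "healthy_z_range": (0.8, 2.1),             # looser health band
    "include_cfrc_ext_in_observation": False,  # drop contact forces
}


def make_custom_humanoid(**extra):
    """Hypothetical helper: build Humanoid-v5 with the overrides above."""
    # Deferred import: requires gymnasium with the MuJoCo extras.
    import gymnasium as gym

    return gym.make("Humanoid-v5", **{**CUSTOM_KWARGS, **extra})
```

Calling `make_custom_humanoid()` should return an environment whose observation omits the cfrc_ext block, so its observation vector would be shorter than the default 348 entries.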

Import

import gymnasium as gym
env = gym.make("Humanoid-v5")

I/O Contract

Inputs

| Name | Type | Required | Description |
| --- | --- | --- | --- |
| action | np.ndarray (17,) | Yes | Torques applied to the 17 hinge joints (abdomen, hips, knees, shoulders, elbows), range [-0.4, 0.4] |

Outputs

| Name | Type | Description |
| --- | --- | --- |
| observation | np.ndarray (348,) | State vector: qpos (22), qvel (23), cinert (130), cvel (78), qfrc_actuator (17), cfrc_ext (78); the x and y positions are excluded by default |
| reward | float | healthy_reward + forward_reward - ctrl_cost - contact_cost |
| terminated | bool | True if the torso z-coordinate is outside healthy_z_range ([1.0, 2.0] by default) and terminate_when_unhealthy is True |
| truncated | bool | Episode truncation (handled by the TimeLimit wrapper) |
| info | dict | Contains x_position, y_position, tendon_length, tendon_velocity, distance_from_origin, x_velocity, y_velocity, reward_survive, reward_forward, reward_ctrl, reward_contact |
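
The component sizes listed for the observation can be turned into index ranges for slicing the flat vector. This is a sketch under the assumption that the components are concatenated in the order given above and that the default observation flags are in effect. `OBS_LAYOUT` and `observation_slices` are hypothetical names, not part of Gymnasium.

```python
# Component sizes as listed in the Outputs table (default configuration).
OBS_LAYOUT = [
    ("qpos", 22),
    ("qvel", 23),
    ("cinert", 130),
    ("cvel", 78),
    ("qfrc_actuator", 17),
    ("cfrc_ext", 78),
]


def observation_slices(layout=OBS_LAYOUT):
    """Hypothetical helper: map each component name to its slice."""
    slices = {}
    start = 0
    for name, size in layout:
        slices[name] = slice(start, start + size)
        start += size
    # `start` now equals the total observation length.
    return slices, start


slices, total = observation_slices()
```

The sizes sum to 348, matching the observation shape. Given an `observation` array from the environment, `observation[slices["qvel"]]` would then select the 23 velocity entries.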

Usage Examples

import gymnasium as gym

env = gym.make("Humanoid-v5")
observation, info = env.reset(seed=42)

for _ in range(1000):
    action = env.action_space.sample()
    observation, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:
        observation, info = env.reset()

env.close()
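
To see which reward term dominates during a rollout, the per-step `info` dictionaries can be aggregated. The helper below is a hypothetical sketch that relies only on the reward keys listed in the Outputs table. It treats a missing key as zero, so dictionaries without reward terms (for example one returned by `env.reset`) can be passed in without error.

```python
REWARD_KEYS = (
    "reward_survive",
    "reward_forward",
    "reward_ctrl",
    "reward_contact",
)


def sum_reward_terms(infos):
    """Hypothetical helper: total each reward component over a rollout."""
    totals = {key: 0.0 for key in REWARD_KEYS}
    for info in infos:
        for key in REWARD_KEYS:
            # Missing keys contribute nothing to the total.
            totals[key] += float(info.get(key, 0.0))
    return totals
```

In the loop above, one would append each step's `info` to a list and pass that list to `sum_reward_terms` after the rollout ends.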
