Implementation:Farama Foundation Gymnasium HumanoidStandupEnv V5

From Leeroopedia
Knowledge Sources
Domains: Reinforcement_Learning, MuJoCo_Environments
Last Updated: 2026-02-15 03:00 GMT

Overview

Concrete implementation of the HumanoidStandup v5 MuJoCo environment provided by Gymnasium.

Description

The HumanoidStandup environment is based on the work of Tassa, Erez, and Todorov, "Synthesis and stabilization of complex behaviors through online trajectory optimization". It uses the same 3D bipedal humanoid robot as the Humanoid environment, but each episode starts with the humanoid lying on the ground. The goal is to make the humanoid stand up and then keep it standing by applying torques to its hinge joints. The reward is uph_cost + 1 - quad_ctrl_cost - quad_impact_cost, where uph_cost rewards height (the absolute z-coordinate after the action divided by the timestep, not the change in height since the previous step), quad_ctrl_cost penalizes large actions, and quad_impact_cost penalizes large contact forces. Unlike most other locomotion environments, HumanoidStandup never terminates; episodes end only through truncation.
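The reward terms described above can be sketched in plain NumPy. This is an illustrative re-computation, not the environment's own code: the function name standup_reward and its arguments are hypothetical, and both the scaling of the height term by uph_cost_weight and the clipping of the impact term to impact_cost_range are assumptions inferred from the constructor parameters.

```python
import numpy as np


def standup_reward(
    z_after: float,
    timestep: float,
    action: np.ndarray,
    cfrc_ext: np.ndarray,
    uph_cost_weight: float = 1.0,
    ctrl_cost_weight: float = 0.1,
    impact_cost_weight: float = 0.5e-6,
    impact_cost_range: tuple[float, float] = (-np.inf, 10.0),
) -> float:
    """Hypothetical helper that recombines the documented reward terms."""
    # Height term: absolute z-coordinate divided by the timestep.
    # Multiplying by uph_cost_weight is an assumption of this sketch.
    uph_cost = uph_cost_weight * z_after / timestep

    # Control term: weighted sum of squared actions.
    quad_ctrl_cost = ctrl_cost_weight * float(np.sum(np.square(action)))

    # Impact term: weighted sum of squared contact forces,
    # clipped to impact_cost_range (assumed behavior).
    quad_impact_cost = impact_cost_weight * float(np.sum(np.square(cfrc_ext)))
    low, high = impact_cost_range
    quad_impact_cost = float(np.clip(quad_impact_cost, low, high))

    return uph_cost + 1.0 - quad_ctrl_cost - quad_impact_cost
```

Because the height term is divided by a small timestep, it typically dominates the other terms, which is worth keeping in mind when comparing returns across different weight settings.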

Usage

Use this environment to benchmark RL algorithms on the challenging task of learning to stand up from a lying position. It tests an agent's ability to discover complex motor skills. The v5 version adds configurable reward weights, excludes worldbody data from observations, and reports tendon length and velocity in info.
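As a minimal sketch of the configurable reward weights, constructor arguments can be forwarded through gym.make as keyword arguments. The parameter names come from the v5 constructor signature documented on this page; the values, the CUSTOM_KWARGS dictionary, and the make_standup_env helper are hypothetical illustrations, not recommended settings.

```python
# Constructor overrides. Names follow the v5 signature; the values are
# arbitrary illustrations, not tuned or recommended settings.
CUSTOM_KWARGS = {
    "uph_cost_weight": 1.0,
    "ctrl_cost_weight": 0.05,
    "impact_cost_weight": 1e-6,
    "impact_cost_range": (-float("inf"), 5.0),
}


def make_standup_env(**overrides):
    """Hypothetical helper: build HumanoidStandup-v5 with custom weights."""
    # Imported lazily so the configuration above can be inspected
    # on a machine without Gymnasium or MuJoCo installed.
    import gymnasium as gym

    # gym.make forwards extra keyword arguments to the environment constructor.
    return gym.make("HumanoidStandup-v5", **{**CUSTOM_KWARGS, **overrides})
```

Calling make_standup_env(ctrl_cost_weight=0.2) would then override a single weight while keeping the rest of the configuration.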

Code Reference

Source Location

Signature

class HumanoidStandupEnv(MujocoEnv, utils.EzPickle):
    def __init__(
        self,
        xml_file: str = "humanoidstandup.xml",
        frame_skip: int = 5,
        default_camera_config: dict[str, float | int] = DEFAULT_CAMERA_CONFIG,
        uph_cost_weight: float = 1,
        ctrl_cost_weight: float = 0.1,
        impact_cost_weight: float = 0.5e-6,
        impact_cost_range: tuple[float, float] = (-np.inf, 10.0),
        reset_noise_scale: float = 1e-2,
        exclude_current_positions_from_observation: bool = True,
        include_cinert_in_observation: bool = True,
        include_cvel_in_observation: bool = True,
        include_qfrc_actuator_in_observation: bool = True,
        include_cfrc_ext_in_observation: bool = True,
        **kwargs,
    )

Import

import gymnasium as gym
env = gym.make("HumanoidStandup-v5")

I/O Contract

Inputs

Name | Type | Required | Description
action | np.ndarray (17,) | Yes | Torques applied to the 17 hinge joints (abdomen, hips, knees, shoulders, elbows), range [-0.4, 0.4]
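Many policies emit values in [-1, 1] (for example after a tanh layer), so they need to be mapped onto the torque range listed above. The following is one possible sketch; the to_action helper and the [-1, 1] input convention are assumptions for illustration, not part of the environment.

```python
import numpy as np

# Torque bounds taken from the Inputs table.
ACTION_LOW, ACTION_HIGH = -0.4, 0.4


def to_action(policy_output: np.ndarray) -> np.ndarray:
    """Hypothetical helper: map values in [-1, 1] onto [-0.4, 0.4]."""
    # Guard against outputs that fall slightly outside [-1, 1].
    squashed = np.clip(policy_output, -1.0, 1.0)
    # Affine map from [-1, 1] to [ACTION_LOW, ACTION_HIGH].
    return ACTION_LOW + 0.5 * (squashed + 1.0) * (ACTION_HIGH - ACTION_LOW)
```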

Outputs

Name | Type | Description
observation | np.ndarray (348,) | State vector: qpos (22), qvel (23), cinert (130), cvel (78), qfrc_actuator (17), cfrc_ext (78); x, y excluded by default
reward | float | uph_cost - quad_ctrl_cost - quad_impact_cost + 1
terminated | bool | Always False (HumanoidStandup never terminates)
truncated | bool | Episode truncation (handled by TimeLimit wrapper, default 1000 timesteps)
info | dict | Contains x_position, y_position, z_distance_from_origin, tendon_length, tendon_velocity, reward_linup, reward_quadctrl, reward_impact
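The observation layout in the table above can be unpacked into named components. This sketch assumes the default v5 settings and that the components are concatenated in the order listed; OBS_LAYOUT and split_observation are hypothetical names introduced here for illustration.

```python
import numpy as np

# Component sizes as listed in the Outputs table (default v5 settings).
# The concatenation order is assumed to match the order in the table.
OBS_LAYOUT = [
    ("qpos", 22),
    ("qvel", 23),
    ("cinert", 130),
    ("cvel", 78),
    ("qfrc_actuator", 17),
    ("cfrc_ext", 78),
]


def split_observation(obs: np.ndarray) -> dict[str, np.ndarray]:
    """Hypothetical helper: slice a flat observation into named parts."""
    expected = sum(size for _, size in OBS_LAYOUT)
    if obs.shape != (expected,):
        raise ValueError(f"expected shape ({expected},), got {obs.shape}")

    parts, start = {}, 0
    for name, size in OBS_LAYOUT:
        # Each slice is a view into the original array, not a copy.
        parts[name] = obs[start:start + size]
        start += size
    return parts
```

If any of the include_*_in_observation flags is set to False, or exclude_current_positions_from_observation is disabled, the sizes above no longer apply and the layout would need to be adjusted accordingly.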

Usage Examples

import gymnasium as gym

env = gym.make("HumanoidStandup-v5")
observation, info = env.reset(seed=42)

for _ in range(1000):
    action = env.action_space.sample()
    observation, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:
        observation, info = env.reset()

env.close()
