
Implementation:Farama Foundation Gymnasium HumanoidStandupEnv V4

From Leeroopedia
Knowledge Sources
Domains: Reinforcement_Learning, MuJoCo_Environments
Last Updated: 2026-02-15 03:00 GMT

Overview

Concrete implementation of the HumanoidStandup v4 MuJoCo environment provided by Gymnasium.

Description

The HumanoidStandup v4 environment uses the same 3D bipedal humanoid robot as the Humanoid environment, but the robot starts lying on the ground. The goal is to make the humanoid stand up and remain standing. The per-step reward is uph_cost - quad_ctrl_cost - quad_impact_cost + 1, where uph_cost is the z-position divided by the simulation timestep, quad_ctrl_cost is 0.1 times the sum of squared control values, and quad_impact_cost is 0.5e-6 times the sum of squared contact forces, with the resulting cost capped at 10. The environment never terminates; episodes end only through truncation. The observation is a 376-dimensional vector made up of qpos, qvel, cinert, cvel, qfrc_actuator, and cfrc_ext. This is a minimal v4 implementation with hardcoded reward weights.
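The reward formula described above can be sketched in plain Python. This is an illustrative reconstruction from the description, not the Gymnasium source: the function name `standup_reward` and its parameter names are hypothetical, and ordinary Python sequences stand in for the simulator's arrays.

```python
def standup_reward(z_pos, ctrl, cfrc_ext, timestep):
    """Illustrative per-step reward (hypothetical helper, not library code).

    z_pos    : z-position after the step
    ctrl     : sequence of control values applied during the step
    cfrc_ext : nested sequence of contact forces, one row per body
    timestep : simulation timestep in seconds
    """
    # Height term: z-position divided by the timestep.
    uph_cost = z_pos / timestep

    # Control penalty: 0.1 times the sum of squared controls.
    quad_ctrl_cost = 0.1 * sum(c * c for c in ctrl)

    # Impact penalty: 0.5e-6 times the sum of squared contact forces,
    # with the resulting cost capped at 10.
    quad_impact_cost = 0.5e-6 * sum(f * f for row in cfrc_ext for f in row)
    quad_impact_cost = min(quad_impact_cost, 10.0)

    reward = uph_cost - quad_ctrl_cost - quad_impact_cost + 1
    return reward, uph_cost, quad_ctrl_cost, quad_impact_cost
```

Because the height term is divided by a small timestep, it will typically dominate the two penalties, which may be worth keeping in mind when comparing returns across environments.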

Usage

Use this environment to reproduce results from papers that used HumanoidStandup-v4. For new research, consider HumanoidStandup-v5, which provides configurable reward weights, excludes constant-zero observations, and returns more detailed info dictionaries.

Code Reference

Source Location

Signature

class HumanoidStandupEnv(MujocoEnv, utils.EzPickle):
    def __init__(self, **kwargs)

Import

import gymnasium as gym
env = gym.make("HumanoidStandup-v4")

I/O Contract

Inputs

| Name | Type | Required | Description |
|------|------|----------|-------------|
| action | np.ndarray (17,) | Yes | Torques applied to the 17 hinge joints, range [-0.4, 0.4] |
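A policy's raw output may fall outside the documented action range. The sketch below clamps a 17-element action into [-0.4, 0.4] before it is passed to the environment. The helper `clip_action` and the constant names are hypothetical; whether the environment itself clips out-of-range actions is not stated here, so treat this as a defensive precaution rather than a requirement.

```python
ACTION_DIM = 17                      # number of hinge joints, per the table above
ACTION_LOW, ACTION_HIGH = -0.4, 0.4  # documented torque range


def clip_action(raw_action):
    """Clamp each torque into the documented range (hypothetical helper)."""
    if len(raw_action) != ACTION_DIM:
        raise ValueError(
            f"expected {ACTION_DIM} torques, got {len(raw_action)}"
        )
    return [min(max(float(a), ACTION_LOW), ACTION_HIGH) for a in raw_action]
```

The result is a plain list; wrapping it with `np.asarray(..., dtype=np.float32)` would give the array type listed in the table.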

Outputs

| Name | Type | Description |
|------|------|-------------|
| observation | np.ndarray (376,) | State vector: qpos (22, x/y excluded), qvel (23), cinert, cvel, qfrc_actuator, cfrc_ext (includes worldbody) |
| reward | float | uph_cost - quad_ctrl_cost - quad_impact_cost + 1 |
| terminated | bool | Always False (HumanoidStandup never terminates) |
| truncated | bool | Episode truncation (handled by TimeLimit wrapper, default 1000 timesteps) |
| info | dict | Contains reward_linup, reward_quadctrl, reward_impact |
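To make the layout of the 376-dimensional observation concrete, the sketch below splits a flat vector into named segments in the order listed in the table. The sizes for qpos (22) and qvel (23) come from the table; the remaining sizes are assumptions based on 14 bodies including the worldbody (10 inertia values and 6 velocity or force components per body) and 23 degrees of freedom, which together sum to 376. The names `OBS_LAYOUT` and `split_observation` are hypothetical.

```python
# (name, length) pairs in the order given in the Outputs table.
# Sizes other than qpos and qvel are assumptions (see the text above).
OBS_LAYOUT = [
    ("qpos", 22),
    ("qvel", 23),
    ("cinert", 140),         # assumed: 14 bodies x 10 values
    ("cvel", 84),            # assumed: 14 bodies x 6 values
    ("qfrc_actuator", 23),   # assumed: one value per degree of freedom
    ("cfrc_ext", 84),        # assumed: 14 bodies x 6 values
]


def split_observation(obs):
    """Slice a flat observation into named segments (hypothetical helper)."""
    expected = sum(size for _, size in OBS_LAYOUT)
    if len(obs) != expected:
        raise ValueError(f"expected {expected} values, got {len(obs)}")
    parts = {}
    start = 0
    for name, size in OBS_LAYOUT:
        parts[name] = obs[start:start + size]
        start += size
    return parts
```

The helper only uses `len` and slicing, so it should work on both Python lists and one-dimensional NumPy arrays.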

Usage Examples

import gymnasium as gym

env = gym.make("HumanoidStandup-v4")
observation, info = env.reset(seed=42)

for _ in range(1000):
    action = env.action_space.sample()
    observation, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:
        observation, info = env.reset()

env.close()
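Since terminated is always False for this environment, an episode in the loop above ends only when truncated becomes True. The sketch below wraps that pattern in a helper that accumulates the return of a single episode. The names `rollout_return` and `policy` are hypothetical; the helper assumes only that `env` follows the `reset`/`step` interface used in the example above.

```python
def rollout_return(env, policy, max_steps=1000, seed=None):
    """Run one episode and return (total_reward, steps_taken).

    Hypothetical helper: `policy` is any callable mapping an observation
    to an action; `env` is any object with Gymnasium-style reset and step.
    """
    observation, info = env.reset(seed=seed)
    total_reward = 0.0
    steps = 0
    for _ in range(max_steps):
        action = policy(observation)
        observation, reward, terminated, truncated, info = env.step(action)
        total_reward += float(reward)
        steps += 1
        # For HumanoidStandup, only truncation is expected to trigger this.
        if terminated or truncated:
            break
    return total_reward, steps
```

With the environment from the example, a random-policy episode could be run as `rollout_return(env, lambda obs: env.action_space.sample(), seed=42)`.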
