Implementation:Farama Foundation Gymnasium HumanoidEnv V4
| Knowledge Sources | |
|---|---|
| Domains | Reinforcement_Learning, MuJoCo_Environments |
| Last Updated | 2026-02-15 03:00 GMT |
Overview
Concrete implementation of the Humanoid v4 MuJoCo locomotion environment provided by Gymnasium.
Description
The Humanoid v4 environment simulates a 3D bipedal humanoid robot using the MuJoCo bindings (mujoco >= 2.1.3). The humanoid has a torso (abdomen) with a pair of legs and a pair of arms, and the goal is to walk forward as fast as possible without falling over. The reward is the weighted forward velocity of the center of mass (computed from its displacement over one environment step), plus a constant healthy (survival) reward, minus a quadratic control cost. The 376-element observation concatenates qpos, qvel, cinert, cvel, qfrc_actuator, and cfrc_ext. Note that v4 has a known bug where contact_cost is always 0. This is the legacy version; v5 is recommended for new projects.
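As a quick check on the 376-element figure, the sketch below adds up the component sizes. It assumes the default humanoid.xml has nq = 24, nv = 23, and 14 bodies (worldbody included); `observation_layout` is a hypothetical helper, not part of Gymnasium.

```python
# Sketch: add up Humanoid-v4 observation component sizes.
# Assumption: default humanoid.xml sizes nq=24 (7 free-joint + 17 hinge),
# nv=23 (6 free-joint + 17 hinge), nbody=14 (worldbody included).
NQ, NV, NBODY = 24, 23, 14


def observation_layout(exclude_current_positions: bool = True) -> dict:
    """Length of each observation component, in concatenation order (hypothetical helper)."""
    return {
        "qpos": NQ - 2 if exclude_current_positions else NQ,  # drop torso x, y
        "qvel": NV,
        "cinert": NBODY * 10,  # 10 inertia/mass values per body
        "cvel": NBODY * 6,  # 6-D com-based velocity per body
        "qfrc_actuator": NV,  # one actuator force per degree of freedom
        "cfrc_ext": NBODY * 6,  # 6-D external force/torque per body
    }


layout = observation_layout()
print(layout)
print(sum(layout.values()))  # 376
print(sum(observation_layout(False).values()))  # 378
```

Under the same assumptions, passing exclude_current_positions_from_observation=False should yield a 378-element observation.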
Usage
Use this environment to reproduce results from papers that used Humanoid-v4. For new research, consider Humanoid-v5, which fixes the contact_cost bug, excludes observation entries that are always zero, and provides more detailed info dictionaries.
Code Reference
Source Location
- Repository: Farama_Foundation_Gymnasium
- File: gymnasium/envs/mujoco/humanoid_v4.py
Signature
```python
class HumanoidEnv(MujocoEnv, utils.EzPickle):
    def __init__(
        self,
        forward_reward_weight=1.25,
        ctrl_cost_weight=0.1,
        healthy_reward=5.0,
        terminate_when_unhealthy=True,
        healthy_z_range=(1.0, 2.0),
        reset_noise_scale=1e-2,
        exclude_current_positions_from_observation=True,
        **kwargs,
    ): ...
```
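The healthy_reward, terminate_when_unhealthy, and healthy_z_range parameters interact as in the following sketch, which assumes it mirrors the logic in humanoid_v4.py (health judged on the torso height qpos[2]); the function names are hypothetical. One detail worth noting: with termination enabled, v4 pays the healthy reward even on the step where the humanoid leaves the healthy range.

```python
# Sketch of Humanoid-v4 health/termination logic (function names are hypothetical).
# Assumption: mirrors humanoid_v4.py, where health is judged on qpos[2] (torso z).


def is_healthy(torso_z: float, healthy_z_range=(1.0, 2.0)) -> bool:
    min_z, max_z = healthy_z_range
    return min_z < torso_z < max_z  # strict bounds on both ends


def healthy_bonus(torso_z: float, healthy_reward: float = 5.0,
                  terminate_when_unhealthy: bool = True,
                  healthy_z_range=(1.0, 2.0)) -> float:
    # Paid while healthy, and always paid when termination is enabled
    # (so the terminal, unhealthy step still earns the bonus).
    healthy = is_healthy(torso_z, healthy_z_range)
    return float(healthy or terminate_when_unhealthy) * healthy_reward


def is_terminated(torso_z: float, terminate_when_unhealthy: bool = True,
                  healthy_z_range=(1.0, 2.0)) -> bool:
    if not terminate_when_unhealthy:
        return False  # episodes then end only via truncation
    return not is_healthy(torso_z, healthy_z_range)


print(is_healthy(1.4), is_terminated(0.8), healthy_bonus(0.8))  # True True 5.0
print(healthy_bonus(0.8, terminate_when_unhealthy=False))  # 0.0
```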
Import
```python
import gymnasium as gym

env = gym.make("Humanoid-v4")
```
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| action | np.ndarray (17,) | Yes | Torques applied to the 17 hinge joints, range [-0.4, 0.4] |
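Because the control cost is quadratic in the action, the action range bounds it. The sketch below assumes the penalty is ctrl_cost_weight times the sum of squared actions; it clips a raw policy output into [-0.4, 0.4] and evaluates the penalty. `control_cost` is an illustrative helper, not a Gymnasium function.

```python
import numpy as np

# Action box for Humanoid-v4, taken from the Inputs table.
ACTION_LOW, ACTION_HIGH, ACTION_DIM = -0.4, 0.4, 17


def control_cost(action: np.ndarray, ctrl_cost_weight: float = 0.1) -> float:
    """Quadratic penalty: ctrl_cost_weight * sum(action ** 2) (illustrative helper)."""
    return float(ctrl_cost_weight * np.sum(np.square(action)))


# A raw policy output that partly falls outside the box.
raw_output = np.linspace(-1.0, 1.0, ACTION_DIM)
action = np.clip(raw_output, ACTION_LOW, ACTION_HIGH)

print(action.min(), action.max())  # -0.4 0.4
# Largest possible penalty: 0.1 * 17 * 0.4**2
print(round(control_cost(np.full(ACTION_DIM, ACTION_HIGH)), 3))  # 0.272
```

With the default weight, the control cost can therefore never exceed about 0.272 per step for an in-range action, which is small next to the 5.0 healthy reward.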
Outputs
| Name | Type | Description |
|---|---|---|
| observation | np.ndarray (376,) | State vector: qpos (22; torso x/y excluded by default), qvel (23), cinert (140), cvel (84), qfrc_actuator (23), cfrc_ext (84); cinert, cvel, and cfrc_ext include the worldbody |
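| reward | float | forward_reward + healthy_reward - ctrl_cost, where forward_reward = forward_reward_weight * x_velocity and ctrl_cost = ctrl_cost_weight * sum(action**2) |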
| terminated | bool | True if terminate_when_unhealthy is set and the torso height qpos[2] is not strictly between the healthy_z_range bounds (default 1.0 and 2.0) |
| truncated | bool | True when the TimeLimit wrapper ends the episode (max_episode_steps=1000 by default) |
| info | dict | Contains reward_linvel, reward_quadctrl (the negated ctrl_cost), reward_alive, x_position, y_position, distance_from_origin, x_velocity, y_velocity, forward_reward |
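The reward and its info entries can be reproduced from the center-of-mass displacement. The following sketch assumes the v4 logic: the x/y center of mass is the mass-weighted mean of MuJoCo's per-body centers of mass (data.xipos weighted by model.body_mass), and velocity is the displacement divided by the environment timestep dt. The helper names are hypothetical, and dt = 0.015 s (frame_skip 5 times a 0.003 s timestep) is an assumption about the default model.

```python
import numpy as np


def mass_center_xy(body_mass, xipos) -> np.ndarray:
    """Mass-weighted mean of per-body CoM positions, keeping x and y only."""
    mass = np.asarray(body_mass, dtype=float)[:, None]
    return (np.sum(mass * np.asarray(xipos, dtype=float), axis=0) / mass.sum())[:2]


def step_reward(xy_before, xy_after, dt, action, healthy_bonus=5.0,
                forward_reward_weight=1.25, ctrl_cost_weight=0.1):
    """Return (reward, info) following forward_reward + healthy_reward - ctrl_cost."""
    x_velocity, y_velocity = (np.asarray(xy_after) - np.asarray(xy_before)) / dt
    forward_reward = forward_reward_weight * x_velocity
    ctrl_cost = ctrl_cost_weight * float(np.sum(np.square(action)))
    reward = forward_reward + healthy_bonus - ctrl_cost
    info = {  # subset of the v4 info keys
        "reward_linvel": forward_reward,
        "reward_quadctrl": -ctrl_cost,
        "reward_alive": healthy_bonus,
        "x_velocity": x_velocity,
        "y_velocity": y_velocity,
    }
    return float(reward), info


# Toy two-body "robot" (masses 1 and 3); the light body moves 6 cm forward.
before = mass_center_xy([1.0, 3.0], [[0.0, 0.0, 1.2], [0.0, 0.0, 1.0]])
after = mass_center_xy([1.0, 3.0], [[0.06, 0.0, 1.2], [0.0, 0.0, 1.0]])
reward, info = step_reward(before, after, dt=0.015, action=np.zeros(17))
# The CoM moves 0.015 in x over dt=0.015, so x_velocity is 1.0 and
# reward is 1.25 * 1.0 + 5.0 - 0.0
print(reward, info["x_velocity"])
```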
Usage Examples
```python
import gymnasium as gym

env = gym.make("Humanoid-v4")
observation, info = env.reset(seed=42)
for _ in range(1000):
    # Random policy: sample a torque vector from the action space
    action = env.action_space.sample()
    observation, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:
        # Start a new episode after a fall or the time limit
        observation, info = env.reset()
env.close()
```
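The loop above discards the per-step rewards. To report episode returns, one option is a small accumulator like the hypothetical `EpisodeTracker` below, fed with the reward, terminated, and truncated values from each env.step call; Gymnasium's RecordEpisodeStatistics wrapper is a built-in alternative that records episode returns and lengths. The demo uses made-up rewards rather than a live environment.

```python
from dataclasses import dataclass, field


@dataclass
class EpisodeTracker:
    """Accumulates per-episode return and length across env.step calls (hypothetical helper)."""
    returns: list = field(default_factory=list)
    lengths: list = field(default_factory=list)
    _ret: float = 0.0
    _len: int = 0

    def update(self, reward: float, terminated: bool, truncated: bool) -> None:
        self._ret += float(reward)
        self._len += 1
        if terminated or truncated:
            # Episode over: store its totals and start counting afresh
            self.returns.append(self._ret)
            self.lengths.append(self._len)
            self._ret, self._len = 0.0, 0


tracker = EpisodeTracker()
# Made-up (reward, terminated, truncated) triples standing in for env.step output
for r, term, trunc in [(6.0, False, False), (5.5, True, False), (7.0, False, True)]:
    tracker.update(r, term, trunc)
print(tracker.returns, tracker.lengths)  # [11.5, 7.0] [2, 1]
```

In the Humanoid loop, calling tracker.update(reward, terminated, truncated) right after env.step would collect one return per episode.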