Implementation:Farama Foundation Gymnasium HopperEnv V5
| Knowledge Sources | |
|---|---|
| Domains | Reinforcement_Learning, MuJoCo_Environments |
| Last Updated | 2026-02-15 03:00 GMT |
Overview
Concrete implementation of the Hopper v5 MuJoCo locomotion environment provided by Gymnasium.
Description
The Hopper environment is based on the work of Erez, Tassa, and Todorov in "Infinite Horizon Model Predictive Control for Nonlinear Periodic Tasks". The hopper is a two-dimensional, one-legged figure made of four main body parts: the torso at the top, the thigh in the middle, the leg at the bottom, and a single foot. The goal is to make hops that move in the forward (right) direction by applying torques to the three hinges that connect the body parts. The reward combines a healthy reward (paid while the hopper is healthy), a forward reward proportional to the x-velocity, and a control cost that penalizes large actions. When terminate_when_unhealthy is True (the default), the episode terminates as soon as the hopper becomes unhealthy: its height (z-coordinate) leaves healthy_z_range, its torso angle leaves healthy_angle_range, or any other state value leaves healthy_state_range.
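The health check, reward, and termination rules described above can be sketched in plain Python. This is a simplified re-implementation for illustration, not the Gymnasium source. `is_healthy` and `step_reward` are hypothetical helper names, `qpos` is assumed to be ordered `[x, z, torso angle, thigh, leg, foot]`, and `DT = 0.008` assumes the default `frame_skip=4` combined with a 0.002 s MuJoCo timestep in `hopper.xml`.

```python
import math

# Defaults copied from the HopperEnv constructor signature.
FORWARD_REWARD_WEIGHT = 1.0
CTRL_COST_WEIGHT = 1e-3
HEALTHY_REWARD = 1.0
HEALTHY_STATE_RANGE = (-100.0, 100.0)
HEALTHY_Z_RANGE = (0.7, math.inf)
HEALTHY_ANGLE_RANGE = (-0.2, 0.2)
DT = 4 * 0.002  # assumed: frame_skip (4) * MuJoCo timestep (0.002 s)


def is_healthy(qpos, qvel):
    """qpos = [x, z, angle, thigh, leg, foot]; qvel holds the 6 matching velocities."""
    z, angle = qpos[1], qpos[2]
    # The state-range check skips x and z (the first two entries of qpos + qvel).
    state = list(qpos[2:]) + list(qvel)
    low, high = HEALTHY_STATE_RANGE
    healthy_state = all(low < v < high for v in state)
    healthy_z = HEALTHY_Z_RANGE[0] < z < HEALTHY_Z_RANGE[1]
    healthy_angle = HEALTHY_ANGLE_RANGE[0] < angle < HEALTHY_ANGLE_RANGE[1]
    return healthy_state and healthy_z and healthy_angle


def step_reward(x_before, x_after, action, healthy, terminate_when_unhealthy=True):
    """Return (reward, terminated) for one environment step."""
    x_velocity = (x_after - x_before) / DT
    forward_reward = FORWARD_REWARD_WEIGHT * x_velocity
    ctrl_cost = CTRL_COST_WEIGHT * sum(a * a for a in action)
    # The survival bonus is paid only while the hopper is healthy.
    healthy_reward = HEALTHY_REWARD if healthy else 0.0
    reward = healthy_reward + forward_reward - ctrl_cost
    terminated = (not healthy) and terminate_when_unhealthy
    return reward, terminated


qpos = [0.0, 1.25, 0.0, 0.0, 0.0, 0.0]
print(is_healthy(qpos, [0.0] * 6))  # True
# healthy: +1.0, forward: +1.0 (moved 0.008 in one 0.008 s step), ctrl: -0.0005
print(step_reward(0.0, 0.008, [0.5, -0.5, 0.0], healthy=True))
```

Because the ranges are checked with strict inequalities, a value sitting exactly on a boundary (for example an angle of exactly 0.2) counts as unhealthy in this sketch.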
Usage
Use this environment to benchmark continuous-control RL algorithms on a locomotion task that is simpler than bipedal walking. It is a standard benchmark for policy-gradient and other model-free deep RL methods. Compared with v4, v5 fixes a bug where healthy_reward was given on every step, even when the hopper was unhealthy (it is now given only while the hopper is healthy), and reports each reward term separately in info.
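The effect of the v5 healthy_reward fix can be illustrated by comparing the survival term over a short episode whose final step is unhealthy. The pre-v5 rule used below, `float(is_healthy or terminate_when_unhealthy) * healthy_reward`, is an assumption based on a reading of the v4 source and should be checked against it; both function names are hypothetical.

```python
HEALTHY_REWARD = 1.0


def survive_reward_v4(is_healthy, terminate_when_unhealthy=True):
    # Assumed pre-v5 rule: with termination enabled the bonus is always paid,
    # so the final (unhealthy) step of an episode still earns it.
    return float(is_healthy or terminate_when_unhealthy) * HEALTHY_REWARD


def survive_reward_v5(is_healthy):
    # v5 rule: the bonus is paid only while the hopper is healthy.
    return float(is_healthy) * HEALTHY_REWARD


# Health flags for a four-step episode that ends when the hopper falls.
episode = [True, True, True, False]
print(sum(survive_reward_v4(h) for h in episode))  # 4.0
print(sum(survive_reward_v5(h) for h in episode))  # 3.0
```

Under these assumptions the two versions differ by one healthy_reward on the terminating step, which shifts episode returns slightly and is one reason v4 and v5 results are not directly comparable.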
Code Reference
Source Location
- Repository: Farama_Foundation_Gymnasium
- File: gymnasium/envs/mujoco/hopper_v5.py
Signature
```python
class HopperEnv(MujocoEnv, utils.EzPickle):
    def __init__(
        self,
        xml_file: str = "hopper.xml",
        frame_skip: int = 4,
        default_camera_config: dict[str, float | int] = DEFAULT_CAMERA_CONFIG,
        forward_reward_weight: float = 1.0,
        ctrl_cost_weight: float = 1e-3,
        healthy_reward: float = 1.0,
        terminate_when_unhealthy: bool = True,
        healthy_state_range: tuple[float, float] = (-100.0, 100.0),
        healthy_z_range: tuple[float, float] = (0.7, float("inf")),
        healthy_angle_range: tuple[float, float] = (-0.2, 0.2),
        reset_noise_scale: float = 5e-3,
        exclude_current_positions_from_observation: bool = True,
        **kwargs,
    ): ...
```
Import
```python
import gymnasium as gym

env = gym.make("Hopper-v5")
```
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| action | np.ndarray (3,) | Yes | Torques applied to the thigh, leg, and foot hinge joints, range [-1, 1] |
Outputs
| Name | Type | Description |
|---|---|---|
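| observation | np.ndarray (11,) | qpos[1:] (5 elements: z, torso angle, and the thigh, leg, and foot joint angles; x is excluded by default) followed by qvel (6 elements, clipped to [-10, 10]). The shape is (12,) when exclude_current_positions_from_observation=False |
| reward | float | `healthy_reward + forward_reward - ctrl_cost`, where `forward_reward = forward_reward_weight * x_velocity`, `ctrl_cost = ctrl_cost_weight * sum(action**2)`, and healthy_reward is paid only while the hopper is healthy |
| terminated | bool | True if the hopper is unhealthy (z, angle, or state outside its healthy range) and terminate_when_unhealthy is True |
| truncated | bool | True when the TimeLimit wrapper added by gym.make ends the episode (after 1000 steps by default) |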
| info | dict | Contains x_position, z_distance_from_origin, x_velocity, reward_forward, reward_ctrl, reward_survive |
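The observation and info layouts in the Outputs table can be sketched as follows. This is a plain-Python approximation, not the Gymnasium source: `build_observation`, `build_info`, and the `exclude_current_positions` flag (standing in for exclude_current_positions_from_observation) are hypothetical, `qpos` is assumed to be `[x, z, torso angle, thigh, leg, foot]`, and storing `reward_ctrl` as the negated control cost is an assumption.

```python
def build_observation(qpos, qvel, exclude_current_positions=True):
    """Assemble the observation from qpos = [x, z, angle, thigh, leg, foot] and qvel."""
    # Drop the x position (qpos[0]) unless it is explicitly requested.
    position = list(qpos[1:]) if exclude_current_positions else list(qpos)
    # Velocities are clipped to [-10, 10].
    velocity = [min(max(v, -10.0), 10.0) for v in qvel]
    return position + velocity


def build_info(x_position, z, init_z, x_velocity, forward_reward, ctrl_cost, survive_reward):
    """Assemble an info dict with the keys listed in the Outputs table."""
    return {
        "x_position": x_position,
        "z_distance_from_origin": z - init_z,
        "x_velocity": x_velocity,
        "reward_forward": forward_reward,
        "reward_ctrl": -ctrl_cost,  # assumed: stored as a negative value
        "reward_survive": survive_reward,
    }


qpos = [0.3, 1.2, 0.05, 0.0, -0.1, 0.2]
qvel = [1.0, 0.0, 0.0, 15.0, -12.0, 0.5]
obs = build_observation(qpos, qvel)
print(len(obs))  # 11
print(obs[5:])   # [1.0, 0.0, 0.0, 10.0, -10.0, 0.5]

info = build_info(0.3, 1.2, 1.25, 1.0, 1.0, 0.0005, 1.0)
# Under the sign assumption above, the three reward_* entries sum to the step reward.
reward = info["reward_forward"] + info["reward_ctrl"] + info["reward_survive"]
```

Summing the reward_* entries of info over an episode is a convenient way to see which term dominates the return.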
Usage Examples
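```python
import gymnasium as gym

# Create the environment (gym.make also wraps it in a 1000-step TimeLimit).
env = gym.make("Hopper-v5")
observation, info = env.reset(seed=42)

for _ in range(1000):
    # Sample a random torque vector in [-1, 1]^3; replace with a trained policy.
    action = env.action_space.sample()
    observation, reward, terminated, truncated, info = env.step(action)

    # Start a new episode when the hopper falls or the time limit is reached.
    if terminated or truncated:
        observation, info = env.reset()

env.close()
```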