Implementation:Farama Foundation Gymnasium HopperEnv V5

Knowledge Sources	Farama_Foundation_Gymnasium Gymnasium Docs
Domains	Reinforcement_Learning, MuJoCo_Environments
Last Updated	2026-02-15 03:00 GMT

Overview

Concrete implementation of the Hopper v5 MuJoCo locomotion environment provided by Gymnasium.

Description

The Hopper environment is based on Erez, Tassa, and Todorov's work in "Infinite Horizon Model Predictive Control for Nonlinear Periodic Tasks". The hopper is a two-dimensional one-legged figure consisting of four main body parts: the torso at the top, the thigh in the middle, the leg at the bottom, and a single foot. The goal is to hop forward (to the right) by applying torques to the three hinges that connect the body parts. The reward combines a healthy reward (healthy_reward), a forward reward proportional to the x-velocity (scaled by forward_reward_weight), and a control cost penalty (scaled by ctrl_cost_weight). When terminate_when_unhealthy is True (the default), the episode terminates as soon as the hopper becomes unhealthy, that is, when its z-coordinate leaves healthy_z_range, its angle leaves healthy_angle_range, or a state value leaves healthy_state_range.
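The reward and termination rules described above can be sketched in plain Python. This is a minimal illustration, not the Gymnasium source: the function names is_healthy and step_reward are hypothetical, the strict inequalities and the squared-action form of the control cost are assumptions, and the default values are taken from the constructor signature on this page.

```python
import math


def is_healthy(z, angle, state_values,
               healthy_state_range=(-100.0, 100.0),
               healthy_z_range=(0.7, math.inf),
               healthy_angle_range=(-0.2, 0.2)):
    """Return True when z, angle, and all state values lie inside their ranges.

    state_values is assumed to hold the state entries that are checked against
    healthy_state_range (hypothetical argument for this sketch).
    """
    min_state, max_state = healthy_state_range
    min_z, max_z = healthy_z_range
    min_angle, max_angle = healthy_angle_range

    healthy_state = all(min_state < s < max_state for s in state_values)
    healthy_z = min_z < z < max_z
    healthy_angle = min_angle < angle < max_angle
    return healthy_state and healthy_z and healthy_angle


def step_reward(x_velocity, action, healthy,
                forward_reward_weight=1.0,
                ctrl_cost_weight=1e-3,
                healthy_reward=1.0):
    """Combine the three reward terms: survive + forward - control cost."""
    forward = forward_reward_weight * x_velocity
    # Assumption: the control cost is the weighted sum of squared torques.
    ctrl_cost = ctrl_cost_weight * sum(a * a for a in action)
    # The healthy reward is only counted on steps where the hopper is healthy.
    survive = healthy_reward if healthy else 0.0
    reward = survive + forward - ctrl_cost
    terms = {
        "reward_forward": forward,
        "reward_ctrl": -ctrl_cost,  # assumption: reported as a negative value
        "reward_survive": survive,
    }
    return reward, terms
```

With the default weights, a healthy step at x-velocity 2.0 and torques (1.0, -1.0, 0.5) gives 1.0 + 2.0 - 0.00225 = 2.99775.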

Usage

Use this environment to benchmark continuous-control RL algorithms on a locomotion task that is simpler than bipedal walking. It is a standard benchmark for evaluating policy gradient methods and model-free deep RL. Version v5 fixes a bug in which healthy_reward was granted on every step, even when the hopper was unhealthy, and reports the individual reward terms (reward_forward, reward_ctrl, reward_survive) in info.

Code Reference

Source Location

Repository: Farama_Foundation_Gymnasium
File: gymnasium/envs/mujoco/hopper_v5.py

Signature

class HopperEnv(MujocoEnv, utils.EzPickle):
    def __init__(
        self,
        xml_file: str = "hopper.xml",
        frame_skip: int = 4,
        default_camera_config: dict[str, float | int] = DEFAULT_CAMERA_CONFIG,
        forward_reward_weight: float = 1.0,
        ctrl_cost_weight: float = 1e-3,
        healthy_reward: float = 1.0,
        terminate_when_unhealthy: bool = True,
        healthy_state_range: tuple[float, float] = (-100.0, 100.0),
        healthy_z_range: tuple[float, float] = (0.7, float("inf")),
        healthy_angle_range: tuple[float, float] = (-0.2, 0.2),
        reset_noise_scale: float = 5e-3,
        exclude_current_positions_from_observation: bool = True,
        **kwargs,
    )

Import

import gymnasium as gym
env = gym.make("Hopper-v5")

I/O Contract

Inputs

Name	Type	Required	Description
action	np.ndarray (3,)	Yes	Torques applied to the thigh, leg, and foot hinge joints, range [-1, 1]

Outputs

Name	Type	Description
observation	np.ndarray (11,)	State vector: qpos (5 elements, x excluded), qvel (6 elements, clipped to [-10, 10])
reward	float	healthy_reward + forward_reward - ctrl_cost
terminated	bool	True if the hopper is unhealthy (z, angle, or state outside the healthy ranges) and terminate_when_unhealthy is True
truncated	bool	True when the episode is cut off by the TimeLimit wrapper (1000 steps by default for Hopper-v5)
info	dict	Contains x_position, z_distance_from_origin, x_velocity, reward_forward, reward_ctrl, reward_survive
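The observation layout in the table above can be illustrated with a short sketch. The helper build_observation is hypothetical and uses plain lists instead of NumPy arrays; it assumes qpos has six entries with the x-coordinate first, which is consistent with the five position elements listed once x is excluded.

```python
def build_observation(qpos, qvel,
                      exclude_current_positions_from_observation=True):
    """Assemble an observation from positions and clipped velocities."""
    # Drop the leading x-coordinate when it is excluded (the default).
    if exclude_current_positions_from_observation:
        position = list(qpos[1:])
    else:
        position = list(qpos)
    # Velocities are clipped to [-10, 10], as stated in the Outputs table.
    velocity = [min(max(v, -10.0), 10.0) for v in qvel]
    return position + velocity


# Example: six positions and six velocities yield an 11-element observation.
example_qpos = [0.3, 1.25, 0.0, 0.0, 0.0, 0.0]
example_qvel = [12.0, -0.5, 0.0, -15.0, 0.2, 0.0]
example_obs = build_observation(example_qpos, example_qvel)
```

Setting exclude_current_positions_from_observation to False keeps the x-coordinate, which makes the sketch return 12 elements instead of 11.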

Usage Examples

import gymnasium as gym

env = gym.make("Hopper-v5")
observation, info = env.reset(seed=42)

for _ in range(1000):
    action = env.action_space.sample()
    observation, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:
        observation, info = env.reset()

env.close()
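Building on the loop above, the reward terms reported in info can be accumulated per episode. The helper below is a sketch under assumptions: rollout, REWARD_KEYS, and the policy callable are hypothetical names, and the function only relies on the reset/step interface used in the example, so it accepts any object that follows it.

```python
from collections import defaultdict

# Reward-term keys listed in the Outputs table for info.
REWARD_KEYS = ("reward_forward", "reward_ctrl", "reward_survive")


def rollout(env, policy, seed=None, max_steps=1000):
    """Run one episode; return (episode_return, per-term sums, steps)."""
    observation, info = env.reset(seed=seed)
    totals = defaultdict(float)
    episode_return = 0.0
    steps = 0
    for _ in range(max_steps):
        action = policy(observation)
        observation, reward, terminated, truncated, info = env.step(action)
        episode_return += float(reward)
        steps += 1
        # Sum each reward term; missing keys count as zero.
        for key in REWARD_KEYS:
            totals[key] += float(info.get(key, 0.0))
        if terminated or truncated:
            break
    return episode_return, dict(totals), steps


# Intended use with a random policy (requires gymnasium and mujoco):
#   env = gym.make("Hopper-v5")
#   ret, terms, steps = rollout(env, lambda obs: env.action_space.sample(), seed=42)
#   env.close()
```

Comparing the per-term sums shows whether a policy earns its return mostly from staying upright (reward_survive) or from actual forward progress (reward_forward).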
