Implementation:Farama Foundation Gymnasium HopperEnv V5

From Leeroopedia
Knowledge Sources
Domains Reinforcement_Learning, MuJoCo_Environments
Last Updated 2026-02-15 03:00 GMT

Overview

Concrete implementation of the Hopper v5 MuJoCo locomotion environment provided by Gymnasium.

Description

The Hopper environment is based on the work of Erez, Tassa, and Todorov in "Infinite Horizon Model Predictive Control for Nonlinear Periodic Tasks". The hopper is a two-dimensional, one-legged figure with four main body parts: the torso at the top, the thigh in the middle, the leg at the bottom, and a single foot. The goal is to hop forward (to the right) by applying torques to the three hinges that connect the body parts. The reward is the sum of a healthy reward and a forward reward proportional to x-velocity, minus a control cost on the action. With terminate_when_unhealthy=True (the default), the episode terminates when the hopper becomes unhealthy: its z-coordinate leaves healthy_z_range, its angle leaves healthy_angle_range, or any remaining state value leaves healthy_state_range.
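The reward and health check described above can be sketched in plain Python. This is an illustrative reconstruction, not the Gymnasium source: the function names hopper_reward and is_healthy are hypothetical, the squared-action form of the control cost and the strict inequalities are assumptions, and the default values are copied from the constructor signature on this page.

```python
import math


def is_healthy(z, angle, other_state,
               healthy_z_range=(0.7, math.inf),
               healthy_angle_range=(-0.2, 0.2),
               healthy_state_range=(-100.0, 100.0)):
    """Return True when height, angle, and remaining state values are in range.

    Hypothetical helper; strict inequalities are an assumption.
    """
    min_z, max_z = healthy_z_range
    min_angle, max_angle = healthy_angle_range
    min_state, max_state = healthy_state_range
    return (
        min_z < z < max_z
        and min_angle < angle < max_angle
        and all(min_state < s < max_state for s in other_state)
    )


def hopper_reward(x_velocity, action, healthy,
                  forward_reward_weight=1.0,
                  ctrl_cost_weight=1e-3,
                  healthy_reward=1.0):
    """Combine the three reward terms: survive + forward - control cost."""
    forward = forward_reward_weight * x_velocity
    # Assumed form of the control cost: weighted sum of squared torques.
    ctrl_cost = ctrl_cost_weight * sum(a * a for a in action)
    # The healthy reward is only granted while the hopper is healthy.
    survive = healthy_reward if healthy else 0.0
    return survive + forward - ctrl_cost
```

For example, hopper_reward(1.0, [1.0, 0.0, 0.0], True) combines a survive term of 1.0, a forward term of 1.0, and a control cost of 0.001.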

Usage

Use this environment to benchmark continuous-control RL algorithms on a locomotion task simpler than bipedal walking. It is a standard benchmark for policy gradient methods and model-free deep RL. Version v5 fixes a bug in which healthy_reward was granted on every step, even when the hopper was unhealthy, and reports the individual reward terms in info.

Code Reference

Source Location

Signature

class HopperEnv(MujocoEnv, utils.EzPickle):
    def __init__(
        self,
        xml_file: str = "hopper.xml",
        frame_skip: int = 4,
        default_camera_config: dict[str, float | int] = DEFAULT_CAMERA_CONFIG,
        forward_reward_weight: float = 1.0,
        ctrl_cost_weight: float = 1e-3,
        healthy_reward: float = 1.0,
        terminate_when_unhealthy: bool = True,
        healthy_state_range: tuple[float, float] = (-100.0, 100.0),
        healthy_z_range: tuple[float, float] = (0.7, float("inf")),
        healthy_angle_range: tuple[float, float] = (-0.2, 0.2),
        reset_noise_scale: float = 5e-3,
        exclude_current_positions_from_observation: bool = True,
        **kwargs,
    )
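Constructor arguments from the signature above can be overridden through gym.make, which forwards extra keyword arguments to the environment. The sketch below is a minimal illustration under stated assumptions: HOPPER_OVERRIDES and make_hopper are hypothetical names, the chosen values are arbitrary, and gymnasium is imported lazily so the helper can be defined without the MuJoCo dependencies installed.

```python
# Hypothetical override set; keys are taken from the HopperEnv signature,
# values are arbitrary illustrations.
HOPPER_OVERRIDES = {
    "forward_reward_weight": 2.0,            # weight forward progress more heavily
    "terminate_when_unhealthy": False,       # keep stepping after a fall
    "exclude_current_positions_from_observation": False,  # keep x in the observation
}


def make_hopper(**overrides):
    """Create Hopper-v5 with constructor overrides (hypothetical helper)."""
    # Imported here so that defining the helper does not require
    # gymnasium and MuJoCo to be installed.
    import gymnasium as gym

    return gym.make("Hopper-v5", **overrides)
```

Calling make_hopper(**HOPPER_OVERRIDES) would then build the environment. With exclude_current_positions_from_observation=False the x-position is kept, so the observation has 12 elements instead of 11.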

Import

import gymnasium as gym
env = gym.make("Hopper-v5")

I/O Contract

Inputs

| Name | Type | Required | Description |
|---|---|---|---|
| action | np.ndarray (3,) | Yes | Torques applied to the thigh, leg, and foot hinge joints, each in the range [-1, 1] |

Outputs

| Name | Type | Description |
|---|---|---|
| observation | np.ndarray (11,) | State vector: qpos (5 elements, x excluded by default) followed by qvel (6 elements, clipped to [-10, 10]) |
| reward | float | healthy_reward + forward_reward - ctrl_cost, where healthy_reward is granted only while the hopper is healthy |
| terminated | bool | True if the hopper is unhealthy (z, angle, or state outside the healthy ranges) and terminate_when_unhealthy is True |
| truncated | bool | Episode truncation (handled by the TimeLimit wrapper) |
| info | dict | Contains x_position, z_distance_from_origin, x_velocity, reward_forward, reward_ctrl, reward_survive |
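The observation layout in the table above can be sketched as follows. This is a plain-Python illustration rather than the Gymnasium implementation: hopper_observation is a hypothetical name, and the assumption is that qpos holds 6 values with the x-position first and that qvel holds 6 values.

```python
def hopper_observation(qpos, qvel,
                       exclude_current_positions_from_observation=True):
    """Assemble the flat observation from positions and velocities.

    Hypothetical helper: assumes qpos[0] is the x-position.
    """
    position = list(qpos)
    # Velocities are clipped element-wise to [-10, 10].
    velocity = [min(max(v, -10.0), 10.0) for v in qvel]
    if exclude_current_positions_from_observation:
        # Drop x so the policy cannot depend on absolute position.
        position = position[1:]
    return position + velocity


# Example: 6 positions and 6 velocities, two of them out of range.
obs = hopper_observation(
    qpos=[3.2, 1.25, 0.0, 0.0, 0.0, 0.0],
    qvel=[12.0, -0.5, 0.0, 0.0, 0.0, -15.0],
)
```

Here obs has 11 elements: the 5 remaining positions followed by the 6 clipped velocities.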

Usage Examples

import gymnasium as gym

env = gym.make("Hopper-v5")
observation, info = env.reset(seed=42)

for _ in range(1000):
    action = env.action_space.sample()
    observation, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:
        observation, info = env.reset()

env.close()
