
Implementation:Farama Foundation Gymnasium HopperEnv V4

From Leeroopedia
Knowledge Sources
Domains: Reinforcement_Learning, MuJoCo_Environments
Last Updated: 2026-02-15 03:00 GMT

Overview

Concrete implementation of the Hopper v4 MuJoCo locomotion environment provided by Gymnasium.

Description

The Hopper v4 environment implements the 2D one-legged hopper using the MuJoCo Python bindings (mujoco >= 2.1.3). The hopper has four body parts: torso, thigh, leg, and foot. The goal is to hop forward by applying torques to the three hinge joints that connect them. The reward is the sum of a forward-velocity term and a healthy reward, minus a control cost. By default (terminate_when_unhealthy=True), the episode terminates when the hopper becomes unhealthy. This is a legacy version; v5 is recommended for new projects. Note that v4 has a known issue: healthy_reward is given on every step, even when the hopper is unhealthy.
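The reward described above can be sketched in plain Python. This is an illustrative re-computation, not the library code: the function name hopper_v4_reward is hypothetical, the default weights are taken from the constructor signature, and the handling of the healthy reward (paid whenever the hopper is healthy or terminate_when_unhealthy is enabled) is an assumption about how the v4 issue arises.

```python
def hopper_v4_reward(
    x_velocity,
    action,
    is_healthy,
    forward_reward_weight=1.0,
    ctrl_cost_weight=1e-3,
    healthy_reward=1.0,
    terminate_when_unhealthy=True,
):
    """Illustrative sketch of the Hopper-v4 step reward (hypothetical helper)."""
    # Forward term: weighted velocity of the torso along the x axis.
    forward_reward = forward_reward_weight * x_velocity

    # Control cost: weighted sum of squared torques.
    ctrl_cost = ctrl_cost_weight * sum(a * a for a in action)

    # Assumed v4 behaviour: with terminate_when_unhealthy=True the healthy
    # reward is paid on every step, including the final unhealthy one.
    if is_healthy or terminate_when_unhealthy:
        survive = healthy_reward
    else:
        survive = 0.0

    return forward_reward + survive - ctrl_cost
```

With the defaults, an unhealthy step still collects the healthy reward; passing terminate_when_unhealthy=False makes the term depend on is_healthy alone.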

Usage

Use this environment for reproducing results from papers that used Hopper-v4 specifically. For new research, consider Hopper-v5 which fixes the healthy_reward bug and provides more detailed info dictionaries.

Code Reference

Source Location

Signature

class HopperEnv(MujocoEnv, utils.EzPickle):
    def __init__(
        self,
        forward_reward_weight=1.0,
        ctrl_cost_weight=1e-3,
        healthy_reward=1.0,
        terminate_when_unhealthy=True,
        healthy_state_range=(-100.0, 100.0),
        healthy_z_range=(0.7, float("inf")),
        healthy_angle_range=(-0.2, 0.2),
        reset_noise_scale=5e-3,
        exclude_current_positions_from_observation=True,
        **kwargs,
    )
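The three healthy_* ranges in the signature define what "healthy" means. A minimal sketch of such a check, assuming each range is an open interval, that z is the torso height and angle the torso pitch, and that state holds the remaining state entries; the function name and argument layout are hypothetical:

```python
import math


def is_healthy(
    z,
    angle,
    state,
    healthy_state_range=(-100.0, 100.0),
    healthy_z_range=(0.7, math.inf),
    healthy_angle_range=(-0.2, 0.2),
):
    """Illustrative health check built from the constructor defaults."""
    min_state, max_state = healthy_state_range
    min_z, max_z = healthy_z_range
    min_angle, max_angle = healthy_angle_range

    # Every remaining state entry must stay inside the state range.
    healthy_state = all(min_state < s < max_state for s in state)
    # The torso must stay above the minimum height ...
    healthy_z = min_z < z < max_z
    # ... and must not tilt too far forward or backward.
    healthy_angle = min_angle < angle < max_angle

    return healthy_state and healthy_z and healthy_angle
```

Under these assumptions, a hopper whose torso drops below 0.7 or tilts more than 0.2 rad in either direction is treated as unhealthy.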

Import

import gymnasium as gym
env = gym.make("Hopper-v4")

I/O Contract

Inputs

Name | Type | Required | Description
action | np.ndarray (3,) | Yes | Torques applied to the thigh, leg, and foot hinge joints, each in the range [-1, 1]

Outputs

Name | Type | Description
observation | np.ndarray (11,) | State vector: qpos (5 elements, x excluded) followed by qvel (6 elements, clipped to [-10, 10])
reward | float | forward_reward + healthy_reward - ctrl_cost
terminated | bool | True if the hopper is unhealthy and terminate_when_unhealthy is enabled
truncated | bool | True if the episode was cut short by the time limit (handled by the TimeLimit wrapper)
info | dict | Contains x_position and x_velocity
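The 11-element observation layout can be sketched as follows. This is an illustration using plain lists rather than NumPy arrays; the helper name build_observation and the clip argument are hypothetical, and it assumes qpos and qvel each have 6 entries with the x position stored first in qpos.

```python
def build_observation(
    qpos,
    qvel,
    exclude_current_positions_from_observation=True,
    clip=10.0,
):
    """Illustrative sketch of how the observation vector is assembled."""
    position = list(qpos)
    # Velocities are clipped element-wise to [-clip, clip].
    velocity = [max(-clip, min(clip, v)) for v in qvel]

    if exclude_current_positions_from_observation:
        # Drop the leading x coordinate so the policy cannot depend on
        # absolute position along the track.
        position = position[1:]

    return position + velocity
```

With six positions and six velocities, dropping x leaves 5 + 6 = 11 entries, matching the shape listed in the table.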

Usage Examples

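import gymnasium as gym

# Create the legacy v4 environment (requires the mujoco package).
env = gym.make("Hopper-v4")
# Seed the first reset for reproducibility.
observation, info = env.reset(seed=42)

for _ in range(1000):
    # Sample a random torque vector of shape (3,).
    action = env.action_space.sample()
    observation, reward, terminated, truncated, info = env.step(action)
    # Start a new episode once the hopper falls or the time limit is hit.
    if terminated or truncated:
        observation, info = env.reset()

env.close()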
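When reproducing published results, the per-episode return is usually the quantity of interest. The loop above can be wrapped in a small helper that accumulates it. This is a minimal sketch: run_episode and the policy callable are hypothetical names, and the helper assumes only the standard reset/step interface shown in the example.

```python
def run_episode(env, policy, seed=None, max_steps=1000):
    """Run one episode and return (episode_return, steps, terminated, truncated)."""
    observation, info = env.reset(seed=seed)
    episode_return = 0.0
    steps = 0
    terminated = truncated = False

    # Stop on termination (unhealthy hopper), truncation (time limit),
    # or after max_steps as a safety bound.
    while steps < max_steps and not (terminated or truncated):
        action = policy(observation)
        observation, reward, terminated, truncated, info = env.step(action)
        episode_return += float(reward)
        steps += 1

    return episode_return, steps, terminated, truncated
```

A random baseline could then be obtained with run_episode(env, lambda obs: env.action_space.sample()), where env comes from gym.make("Hopper-v4").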

Related Pages
