Implementation:Farama Foundation Gymnasium HalfCheetahEnv V4

From Leeroopedia
Knowledge Sources
Domains Reinforcement_Learning, MuJoCo_Environments
Last Updated 2026-02-15 03:00 GMT

Overview

Concrete implementation of the HalfCheetah v4 MuJoCo locomotion environment provided by Gymnasium.

Description

The HalfCheetah v4 environment models a 2-dimensional robot with 9 body parts connected by 8 joints. The goal is to apply torques to the joints so that the cheetah runs forward as fast as possible. The torso and head are fixed, so torque can only be applied to 6 joints: the front and back thighs, shins, and feet.

The reward at each step is forward_reward - ctrl_cost: forward_reward grows with the forward (x) velocity, and ctrl_cost penalizes large torques. The HalfCheetah never terminates; episodes end only through truncation. v4 is a legacy version built on the official MuJoCo Python bindings (mujoco >= 2.1.3).
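The reward described above can be sketched in plain Python. This is an illustrative reconstruction, not the environment's source: the function name half_cheetah_reward is hypothetical, and the formulas (forward velocity as displacement over dt, control cost as a weighted sum of squared torques, dt = 0.05) are assumptions based on the Gymnasium documentation for this environment.

```python
def half_cheetah_reward(
    x_before,
    x_after,
    action,
    dt=0.05,  # assumption: frame_skip (5) * simulation timestep (0.01)
    forward_reward_weight=1.0,
    ctrl_cost_weight=0.1,
):
    """Return (reward, forward_reward, ctrl_cost) for a single step.

    Hypothetical helper for illustration only.
    """
    # Forward velocity estimated from the change in x position over one step.
    x_velocity = (x_after - x_before) / dt
    forward_reward = forward_reward_weight * x_velocity
    # Control cost assumed to be the weighted sum of squared torques.
    ctrl_cost = ctrl_cost_weight * sum(a * a for a in action)
    return forward_reward - ctrl_cost, forward_reward, ctrl_cost


reward, forward_reward, ctrl_cost = half_cheetah_reward(
    x_before=0.0, x_after=0.1, action=[0.5] * 6
)
```

Under these assumptions, moving 0.1 units forward in one step yields a forward velocity of 2.0, while six torques of 0.5 each cost 0.1 * 1.5 = 0.15.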

Usage

Use this environment to reproduce results from papers that used HalfCheetah-v4. For new projects, consider HalfCheetah-v5, which adds more configuration options and consistent reward naming. HalfCheetah is one of the most commonly used MuJoCo benchmarks for evaluating RL algorithms.

Code Reference

Source Location

Signature

class HalfCheetahEnv(MujocoEnv, utils.EzPickle):
    def __init__(
        self,
        forward_reward_weight=1.0,
        ctrl_cost_weight=0.1,
        reset_noise_scale=0.1,
        exclude_current_positions_from_observation=True,
        **kwargs,
    ): ...
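The constructor parameters above can usually be overridden through gym.make, which forwards extra keyword arguments to the environment constructor. The sketch below is one way to do that; the names DEFAULT_KWARGS, merge_kwargs, and make_half_cheetah are hypothetical helpers, and gymnasium is imported lazily so the merging logic can be checked without MuJoCo installed.

```python
# Defaults copied from the constructor signature above.
DEFAULT_KWARGS = {
    "forward_reward_weight": 1.0,
    "ctrl_cost_weight": 0.1,
    "reset_noise_scale": 0.1,
    "exclude_current_positions_from_observation": True,
}


def merge_kwargs(**overrides):
    """Return the constructor kwargs with any overrides applied (hypothetical helper)."""
    return {**DEFAULT_KWARGS, **overrides}


def make_half_cheetah(**overrides):
    """Create HalfCheetah-v4 with overridden constructor arguments (hypothetical helper)."""
    # Imported here so merge_kwargs stays usable without gymnasium/mujoco installed.
    import gymnasium as gym

    return gym.make("HalfCheetah-v4", **merge_kwargs(**overrides))


# Example: halve the control penalty and keep every other default.
kwargs = merge_kwargs(ctrl_cost_weight=0.05)
```

With gymnasium and mujoco installed, make_half_cheetah(ctrl_cost_weight=0.05) would be expected to return the environment with the lighter torque penalty.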

Import

import gymnasium as gym
env = gym.make("HalfCheetah-v4")

I/O Contract

Inputs

Name | Type | Required | Description
action | np.ndarray (6,) | Yes | Torques applied to back thigh, back shin, back foot, front thigh, front shin, front foot, range [-1, 1]
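A policy's raw output may fall outside the documented action range, so it can help to validate and clip actions before stepping. The following is a minimal sketch; prepare_action, ACTION_DIM, and JOINT_ORDER are hypothetical names, and the float32 dtype is an assumption about what the action space expects.

```python
import numpy as np

ACTION_DIM = 6
# Joint order as listed in the Inputs table.
JOINT_ORDER = (
    "back thigh", "back shin", "back foot",
    "front thigh", "front shin", "front foot",
)


def prepare_action(raw_action):
    """Validate the shape and clip each torque into [-1, 1] (hypothetical helper)."""
    action = np.asarray(raw_action, dtype=np.float32)  # dtype is an assumption
    if action.shape != (ACTION_DIM,):
        raise ValueError(f"expected shape ({ACTION_DIM},), got {action.shape}")
    return np.clip(action, -1.0, 1.0)


clipped = prepare_action([2.0, -3.0, 0.5, 0.0, 0.0, 0.0])
```

Here the first two torques are out of range and are clipped to 1 and -1, while the remaining values pass through unchanged.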

Outputs

Name | Type | Description
observation | np.ndarray (17,) | State vector: qpos (8 elements, x excluded by default), qvel (9 elements)
reward | float | forward_reward - ctrl_cost
terminated | bool | Always False (HalfCheetah never terminates)
truncated | bool | Episode truncation (handled by TimeLimit wrapper)
info | dict | Contains x_position, x_velocity, reward_run, reward_ctrl
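The observation layout in the table can be unpacked with simple slicing. This sketch assumes the default exclude_current_positions_from_observation=True (17 elements) and that the position entries come before the velocity entries; split_observation is a hypothetical helper.

```python
import numpy as np

N_QPOS = 8  # qpos entries with the x position excluded (default setting)
N_QVEL = 9  # qvel entries


def split_observation(observation):
    """Split a 17-element observation into (qpos, qvel) parts (hypothetical helper)."""
    obs = np.asarray(observation)
    if obs.shape != (N_QPOS + N_QVEL,):
        raise ValueError(f"expected shape ({N_QPOS + N_QVEL},), got {obs.shape}")
    # Assumption: positions are stored first, velocities second.
    return obs[:N_QPOS], obs[N_QPOS:]


qpos, qvel = split_observation(np.arange(17))
```

If the x position is included in the observation (exclude_current_positions_from_observation=False), the vector has one more position entry and the slice boundary would need to shift accordingly.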

Usage Examples

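import gymnasium as gym

# Create the environment and seed the first reset for reproducibility.
env = gym.make("HalfCheetah-v4")
observation, info = env.reset(seed=42)

for _ in range(1000):
    # Sample a random action (6 torques in [-1, 1]).
    action = env.action_space.sample()
    observation, reward, terminated, truncated, info = env.step(action)
    # terminated is always False here; truncated marks the time limit.
    if terminated or truncated:
        observation, info = env.reset()

env.close()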
