Implementation:Farama Foundation Gymnasium HalfCheetahEnv V4
| Knowledge Sources | |
|---|---|
| Domains | Reinforcement_Learning, MuJoCo_Environments |
| Last Updated | 2026-02-15 03:00 GMT |
Overview
Concrete implementation of the HalfCheetah v4 MuJoCo locomotion environment provided by Gymnasium.
Description
The HalfCheetah v4 environment simulates a 2-dimensional robot consisting of 9 body parts and 8 joints. The goal is to apply torques to the joints so that the cheetah runs forward as fast as possible. The cheetah's torso and head are fixed, so torque can only be applied to the remaining 6 joints: the front and back thighs, shins, and feet. The reward at each step is forward_reward - ctrl_cost, which rewards forward progress and penalizes large actions. The environment never terminates; episodes end only through truncation. v4 is a legacy version that uses the mujoco Python bindings (mujoco >= 2.1.3).
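The reward decomposition can be illustrated with a small standalone sketch. It assumes that the forward reward is the weighted x velocity over one step and that the control cost is the weighted sum of squared torques. The function name `half_cheetah_reward` and the step duration `dt=0.05` are hypothetical and chosen only for illustration; the default weights match the constructor signature.

```python
import numpy as np


def half_cheetah_reward(x_before, x_after, action, dt,
                        forward_reward_weight=1.0, ctrl_cost_weight=0.1):
    """Illustrative reward: forward_reward - ctrl_cost (assumed formulas)."""
    # Assumption: velocity is the finite difference of the x position over one step.
    x_velocity = (x_after - x_before) / dt
    forward_reward = forward_reward_weight * x_velocity
    # Assumption: control cost is the weighted sum of squared torques.
    ctrl_cost = ctrl_cost_weight * float(np.sum(np.square(action)))
    return forward_reward - ctrl_cost, forward_reward, ctrl_cost


# Hypothetical step: the torso moves 0.05 m in 0.05 s with all six torques at 0.5.
reward, forward_reward, ctrl_cost = half_cheetah_reward(
    x_before=0.0, x_after=0.05, action=np.full(6, 0.5), dt=0.05
)
print(reward, forward_reward, ctrl_cost)
```

Under these assumptions, lowering ctrl_cost_weight makes aggressive torques cheaper, while raising forward_reward_weight makes speed dominate the reward.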
Usage
Use this environment to reproduce results from papers that used HalfCheetah-v4. For new projects, consider HalfCheetah-v5, which adds more configuration options and consistent reward naming. HalfCheetah is one of the most commonly used MuJoCo benchmarks for evaluating RL algorithms.
Code Reference
Source Location
- Repository: Farama_Foundation_Gymnasium
- File: gymnasium/envs/mujoco/half_cheetah_v4.py
Signature
class HalfCheetahEnv(MujocoEnv, utils.EzPickle):
    def __init__(
        self,
        forward_reward_weight=1.0,
        ctrl_cost_weight=0.1,
        reset_noise_scale=0.1,
        exclude_current_positions_from_observation=True,
        **kwargs,
    ):
        ...
Import
import gymnasium as gym
env = gym.make("HalfCheetah-v4")
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| action | np.ndarray (6,) | Yes | Torques applied to the back thigh, back shin, back foot, front thigh, front shin, and front foot joints, each in the range [-1, 1] |
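A policy's raw output may fall outside the [-1, 1] torque range. The following minimal sketch shows one way to validate the shape and clip the values before calling step; the helper `clip_action` and the constants are hypothetical and not part of Gymnasium.

```python
import numpy as np

# Bounds taken from the action description above.
ACTION_LOW, ACTION_HIGH = -1.0, 1.0


def clip_action(raw_action):
    """Return a (6,) float32 torque vector clipped to [-1, 1] (hypothetical helper)."""
    action = np.asarray(raw_action, dtype=np.float32)
    if action.shape != (6,):
        raise ValueError(f"expected shape (6,), got {action.shape}")
    return np.clip(action, ACTION_LOW, ACTION_HIGH)


# Example raw policy output with two out-of-range entries.
clipped = clip_action([1.7, -0.3, 0.0, -2.4, 0.9, 1.0])
print(clipped)
```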
Outputs
| Name | Type | Description |
|---|---|---|
| observation | np.ndarray (17,) | State vector: concatenation of qpos (8 elements; the x position is excluded by default) and qvel (9 elements) |
| reward | float | forward_reward - ctrl_cost |
| terminated | bool | Always False (HalfCheetah never terminates) |
| truncated | bool | True when the episode is cut off by the TimeLimit wrapper (1000 steps by default) |
| info | dict | Contains x_position, x_velocity, reward_run, reward_ctrl |
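The observation layout in the table above can be sketched without a simulator. This is a minimal illustration that assumes the full qpos vector has 9 entries with the x position first; `build_observation` is a hypothetical helper, not the environment's own method.

```python
import numpy as np


def build_observation(qpos, qvel, exclude_current_positions_from_observation=True):
    """Concatenate positions and velocities into one state vector (illustrative)."""
    position = np.asarray(qpos, dtype=np.float64)
    velocity = np.asarray(qvel, dtype=np.float64)
    if exclude_current_positions_from_observation:
        # Assumption: index 0 of qpos is the x position of the torso.
        position = position[1:]
    return np.concatenate((position, velocity))


# Placeholder state: 9 generalized positions and 9 generalized velocities.
qpos = np.arange(9, dtype=np.float64)
qvel = np.zeros(9)

obs_default = build_observation(qpos, qvel)      # x position dropped
obs_full = build_observation(qpos, qvel, False)  # x position kept
print(obs_default.shape, obs_full.shape)
```

Dropping the x position keeps the observation independent of how far the robot has travelled; the value remains available as x_position in info.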
Usage Examples
import gymnasium as gym

env = gym.make("HalfCheetah-v4")
observation, info = env.reset(seed=42)

for _ in range(1000):
    # Sample a random torque vector from the action space.
    action = env.action_space.sample()
    observation, reward, terminated, truncated, info = env.step(action)

    # terminated is always False here; truncation ends the episode.
    if terminated or truncated:
        observation, info = env.reset()

env.close()