Implementation:Farama Foundation Gymnasium Walker2dEnv V4

Knowledge Sources	Farama_Foundation_Gymnasium Gymnasium Docs
Domains	Reinforcement_Learning, MuJoCo_Environments
Last Updated	2026-02-15 03:00 GMT

Overview

Concrete implementation of the Walker2d v4 MuJoCo locomotion environment provided by Gymnasium.

Description

The Walker2d v4 environment implements a 2D bipedal walking robot using the MuJoCo bindings (mujoco >= 2.1.3). The walker has seven body parts (a torso, two thighs, two legs, and two feet) connected by six hinge joints. The goal is to walk forward by applying torques to the joints. The reward combines a forward-velocity term, a healthy (survival) reward, and a control cost penalty. The episode terminates when the walker becomes unhealthy, that is, when its z-coordinate or angle leaves the healthy range. v4 has two known issues: the feet have different friction values (0.9 for the right foot, 1.9 for the left), and healthy_reward is given on every step even when the walker is unhealthy.
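The reward and health check described above can be sketched in plain NumPy. This is an illustrative reconstruction from the description, not the library source: the function names `is_healthy` and `step_reward` are hypothetical, the default values mirror the constructor defaults listed under Signature, and the quadratic form of the control cost (weight times the sum of squared torques) is an assumption based on the usual Gymnasium MuJoCo convention.

```python
import numpy as np


def is_healthy(z, angle,
               healthy_z_range=(0.8, 2.0),
               healthy_angle_range=(-1.0, 1.0)):
    """Return True when torso height and angle are strictly inside the healthy ranges."""
    min_z, max_z = healthy_z_range
    min_angle, max_angle = healthy_angle_range
    return bool(min_z < z < max_z and min_angle < angle < max_angle)


def step_reward(x_velocity, action,
                forward_reward_weight=1.0,
                ctrl_cost_weight=1e-3,
                healthy_reward=1.0):
    """Sketch of the v4 reward: forward_reward + healthy_reward - ctrl_cost."""
    forward_reward = forward_reward_weight * x_velocity
    # Assumption: control cost is the weighted sum of squared torques.
    ctrl_cost = ctrl_cost_weight * float(np.sum(np.square(action)))
    # Following the description above, v4 adds healthy_reward on every step,
    # so the health check is deliberately not consulted here.
    return forward_reward + healthy_reward - ctrl_cost
```

With an x-velocity of 2.0 and all six torques at 1.0, `step_reward(2.0, np.ones(6))` evaluates to 2.0 + 1.0 - 0.006 = 2.994.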

Usage

Use this environment to reproduce results from papers that used Walker2d-v4. For new research, consider Walker2d-v5, which fixes the foot friction asymmetry and the healthy_reward bug and provides more detailed info dictionaries.

Code Reference

Source Location

Repository: Farama_Foundation_Gymnasium
File: gymnasium/envs/mujoco/walker2d_v4.py

Signature

class Walker2dEnv(MujocoEnv, utils.EzPickle):
    def __init__(
        self,
        forward_reward_weight=1.0,
        ctrl_cost_weight=1e-3,
        healthy_reward=1.0,
        terminate_when_unhealthy=True,
        healthy_z_range=(0.8, 2.0),
        healthy_angle_range=(-1.0, 1.0),
        reset_noise_scale=5e-3,
        exclude_current_positions_from_observation=True,
        **kwargs,
    )
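The constructor arguments above can be overridden through keyword arguments to `gym.make`, which forwards them to the environment constructor. The following is a minimal sketch, assuming `gymnasium` and `mujoco` are installed when the environment is actually created. The names `WALKER2D_V4_DEFAULTS`, `walker2d_v4_kwargs`, and `make_walker2d_v4` are hypothetical helpers, not part of Gymnasium, and the typo check is an illustrative design choice.

```python
# Defaults copied from the Walker2dEnv v4 signature above.
WALKER2D_V4_DEFAULTS = {
    "forward_reward_weight": 1.0,
    "ctrl_cost_weight": 1e-3,
    "healthy_reward": 1.0,
    "terminate_when_unhealthy": True,
    "healthy_z_range": (0.8, 2.0),
    "healthy_angle_range": (-1.0, 1.0),
    "reset_noise_scale": 5e-3,
    "exclude_current_positions_from_observation": True,
}


def walker2d_v4_kwargs(**overrides):
    """Merge overrides into the defaults, rejecting misspelled argument names."""
    unknown = set(overrides) - set(WALKER2D_V4_DEFAULTS)
    if unknown:
        raise TypeError(f"unknown Walker2d-v4 argument(s): {sorted(unknown)}")
    return {**WALKER2D_V4_DEFAULTS, **overrides}


def make_walker2d_v4(render_mode=None, **overrides):
    """Create the environment with explicit constructor arguments."""
    # Imported lazily so the helpers above can be used without MuJoCo installed.
    import gymnasium as gym

    return gym.make("Walker2d-v4", render_mode=render_mode,
                    **walker2d_v4_kwargs(**overrides))
```

For example, `make_walker2d_v4(terminate_when_unhealthy=False)` would create an environment whose episodes end only through truncation.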

Import

import gymnasium as gym
env = gym.make("Walker2d-v4")

I/O Contract

Inputs

Name	Type	Required	Description
action	np.ndarray (6,)	Yes	Torques applied to thigh, leg, foot, left thigh, left leg, and left foot joints, range [-1, 1]

Outputs

Name	Type	Description
observation	np.ndarray (17,)	State vector: qpos (8 elements, x excluded), qvel (9 elements, clipped to [-10, 10])
reward	float	forward_reward + healthy_reward - ctrl_cost
terminated	bool	True if terminate_when_unhealthy is set and the walker is unhealthy (z or angle outside healthy ranges)
truncated	bool	True if the episode was cut off by the time limit (handled by the TimeLimit wrapper)
info	dict	Contains x_position, x_velocity
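The observation layout in the table above can be unpacked with a small helper. This sketch assumes the default `exclude_current_positions_from_observation=True`, so the vector holds 8 position entries followed by 9 velocity entries. `split_observation` and the two constants are hypothetical names, not part of Gymnasium.

```python
import numpy as np

N_QPOS = 8  # position entries; the torso x-coordinate is excluded
N_QVEL = 9  # velocity entries, clipped to [-10, 10] by the environment


def split_observation(observation):
    """Split a 17-element Walker2d observation into (qpos_part, qvel_part)."""
    obs = np.asarray(observation, dtype=np.float64)
    if obs.shape != (N_QPOS + N_QVEL,):
        raise ValueError(f"expected shape ({N_QPOS + N_QVEL},), got {obs.shape}")
    return obs[:N_QPOS], obs[N_QPOS:]
```

Called on the `observation` returned by `env.reset()` or `env.step()`, it returns the position and velocity parts as separate arrays.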

Usage Examples

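import gymnasium as gym

# Requires the MuJoCo bindings (mujoco >= 2.1.3).
env = gym.make("Walker2d-v4")
observation, info = env.reset(seed=42)

for _ in range(1000):
    # Sample a random torque vector of shape (6,) from the action space.
    action = env.action_space.sample()
    observation, reward, terminated, truncated, info = env.step(action)
    # Start a new episode when the walker is unhealthy or the time limit is hit.
    if terminated or truncated:
        observation, info = env.reset()

env.close()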
