Implementation:Farama Foundation Gymnasium Walker2dEnv V5

From Leeroopedia
Knowledge Sources
Domains Reinforcement_Learning, MuJoCo_Environments
Last Updated 2026-02-15 03:00 GMT

Overview

Concrete implementation of the Walker2d v5 MuJoCo locomotion environment provided by Gymnasium.

Description

The Walker2d environment builds on the Hopper environment by adding a second leg, allowing the robot to walk forward instead of hop. The walker is a two-dimensional bipedal robot consisting of seven main body parts: a single torso at the top, two thighs below the torso, two legs below the thighs, and two feet. The goal is to walk in the forward (right) direction by applying torques to the six hinges connecting the body parts. The reward combines a healthy reward, a forward reward proportional to x-velocity, and a control cost penalty. The v5 version gives both feet the same friction (1.9; earlier versions used unequal values) and fixes a bug in healthy_reward, which was previously granted on every step even when the walker was unhealthy.
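
The reward described above can be sketched in a few lines. This is a minimal illustration, not the environment's source: the function name walker2d_reward is hypothetical, the default weights are taken from the constructor signature on this page, and the quadratic form of the control cost and the sign convention of reward_ctrl are assumptions.

```python
def walker2d_reward(
    x_velocity,
    action,
    is_healthy,
    forward_reward_weight=1.0,
    ctrl_cost_weight=1e-3,
    healthy_reward=1.0,
):
    """Hypothetical helper: combine the three reward terms for one step."""
    # Forward term: proportional to the torso's velocity along x.
    forward_reward = forward_reward_weight * x_velocity
    # Survival term: granted only while the walker is healthy (the v5 behaviour).
    survive_reward = healthy_reward if is_healthy else 0.0
    # Control cost: assumed here to be the weighted sum of squared torques.
    ctrl_cost = ctrl_cost_weight * sum(a * a for a in action)

    reward = forward_reward + survive_reward - ctrl_cost
    # Assumed sign convention: the control term is stored as a negative reward.
    terms = {
        "reward_forward": forward_reward,
        "reward_ctrl": -ctrl_cost,
        "reward_survive": survive_reward,
    }
    return reward, terms
```

With the default weights, a healthy walker moving at 2.0 units per second under full torque on all six joints would score 2.0 + 1.0 - 0.006 under these assumptions.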

Usage

Use this environment to benchmark RL algorithms on bipedal locomotion. Its difficulty sits between the simpler Hopper and the more complex Humanoid environments. The v5 version reports the individual reward terms in info and tracks z_distance_from_origin.

Code Reference

Source Location

Signature

class Walker2dEnv(MujocoEnv, utils.EzPickle):
    def __init__(
        self,
        xml_file: str = "walker2d_v5.xml",
        frame_skip: int = 4,
        default_camera_config: dict[str, float | int] = DEFAULT_CAMERA_CONFIG,
        forward_reward_weight: float = 1.0,
        ctrl_cost_weight: float = 1e-3,
        healthy_reward: float = 1.0,
        terminate_when_unhealthy: bool = True,
        healthy_z_range: tuple[float, float] = (0.8, 2.0),
        healthy_angle_range: tuple[float, float] = (-1.0, 1.0),
        reset_noise_scale: float = 5e-3,
        exclude_current_positions_from_observation: bool = True,
        **kwargs,
    )
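
The healthy_z_range, healthy_angle_range, and terminate_when_unhealthy parameters in the signature above work together to decide when an episode ends. The following is a rough sketch of that logic under stated assumptions: the function names are hypothetical, z is taken to be the torso height and angle the torso angle, and the range bounds are treated as exclusive.

```python
def is_healthy(z, angle, healthy_z_range=(0.8, 2.0), healthy_angle_range=(-1.0, 1.0)):
    """Hypothetical helper: True while height and angle are inside both ranges."""
    min_z, max_z = healthy_z_range
    min_angle, max_angle = healthy_angle_range
    # Assumption: bounds are exclusive, so a value exactly on a bound is unhealthy.
    return (min_z < z < max_z) and (min_angle < angle < max_angle)


def is_terminated(z, angle, terminate_when_unhealthy=True, **ranges):
    """Hypothetical helper: the episode ends only if termination is enabled."""
    return terminate_when_unhealthy and not is_healthy(z, angle, **ranges)
```

Setting terminate_when_unhealthy to False in this sketch keeps the episode running after a fall, leaving only the loss of the healthy reward as a penalty.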

Import

import gymnasium as gym
env = gym.make("Walker2d-v5")

I/O Contract

Inputs

Name | Type | Required | Description
action | np.ndarray (6,) | Yes | Torques applied to the thigh, leg, foot, left thigh, left leg, and left foot joints, each in the range [-1, 1]

Outputs

Name | Type | Description
observation | np.ndarray (17,) | State vector: qpos (8 elements, x excluded by default) followed by qvel (9 elements, clipped to [-10, 10])
reward | float | healthy_reward + forward_reward - ctrl_cost
terminated | bool | True if the walker is unhealthy and terminate_when_unhealthy is True; unhealthy means z outside healthy_z_range (default (0.8, 2.0)) or angle outside healthy_angle_range (default (-1.0, 1.0))
truncated | bool | True when the episode is cut off by the TimeLimit wrapper
info | dict | Contains x_position, z_distance_from_origin, x_velocity, reward_forward, reward_ctrl, reward_survive
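
The 17-element observation layout in the table above can be illustrated with a small sketch. The function name build_observation is hypothetical, and plain Python lists stand in for the NumPy arrays the environment returns; the assumption is that x is the first element of qpos and is the only entry dropped.

```python
def build_observation(qpos, qvel, exclude_x=True, velocity_limit=10.0):
    """Hypothetical helper: assemble positions followed by clipped velocities."""
    # Drop the first position entry (assumed to be x) unless told otherwise.
    position = list(qpos[1:]) if exclude_x else list(qpos)
    # Clip every velocity to [-velocity_limit, velocity_limit].
    velocity = [max(-velocity_limit, min(velocity_limit, v)) for v in qvel]
    return position + velocity
```

With 9 position entries and 9 velocity entries, this yields 17 values when x is excluded and 18 when it is kept.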

Usage Examples

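import gymnasium as gym

env = gym.make("Walker2d-v5")
# Seed the first reset for reproducible initial states.
observation, info = env.reset(seed=42)

for _ in range(1000):
    # Random policy; replace with a trained agent's action.
    action = env.action_space.sample()
    observation, reward, terminated, truncated, info = env.step(action)
    # Start a new episode once the walker falls or the time limit is reached.
    if terminated or truncated:
        observation, info = env.reset()

env.close()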
