Implementation:Farama Foundation Gymnasium BipedalWalker

Knowledge Sources	Farama_Foundation_Gymnasium Gymnasium Docs
Domains	Reinforcement_Learning, Box2D_Environments
Last Updated	2026-02-15 03:00 GMT

Overview

BipedalWalker is a continuous-control reinforcement learning environment that simulates a 4-joint bipedal robot walking across procedurally generated terrain using the Box2D physics engine.

Description

The BipedalWalker class implements a Gymnasium environment (gym.Env) where an agent controls a two-legged robot that must learn to walk forward across terrain. The robot consists of a hull (torso) connected to two legs, each with an upper and lower segment joined by hip and knee revolute joints. The environment uses the Box2D physics engine to simulate rigid body dynamics, joint constraints, and ground contact detection. A ContactDetector listener tracks when the hull or leg segments touch the ground, determining game-over conditions and leg ground-contact state.

The environment supports two difficulty modes: a normal mode with slightly uneven terrain, and a hardcore mode (enabled via the hardcore parameter) that generates terrain with ladders, stumps, pitfalls, and stairs. Terrain generation is procedural, creating a sequence of terrain segments with varying height profiles. The agent observes a 24-dimensional state vector comprising hull angle and velocity, joint angles and angular speeds, leg ground contact flags, and 10 lidar rangefinder measurements that sense distance to terrain ahead.

Rewards are shaped to encourage forward movement (totaling 300+ points at the far end) while penalizing hull-ground contact (-100 termination penalty) and motor torque usage (small per-step cost). The environment also includes a built-in heuristic controller (BipedalWalkerHeuristics) useful for generating demonstration trajectories.

Usage

Use BipedalWalker when you need a continuous-control locomotion benchmark for training reinforcement learning agents. It is commonly used with policy gradient methods (PPO, SAC, TD3) and serves as a standard testbed for evaluating walking policies. The normal variant tests basic locomotion, while the hardcore variant tests robustness to complex terrain obstacles.

Code Reference

Source Location

Repository: Farama_Foundation_Gymnasium
File: gymnasium/envs/box2d/bipedal_walker.py

Signature

class BipedalWalker(gym.Env, EzPickle):
    def __init__(
        self,
        render_mode: str | None = None,
        hardcore: bool = False,
    )

Import

import gymnasium as gym

# Via gym.make (recommended)
env = gym.make("BipedalWalker-v3")

# Direct import
from gymnasium.envs.box2d.bipedal_walker import BipedalWalker

I/O Contract

Inputs

Name	Type	Required	Description
render_mode	str or None	No	Rendering mode; one of "human" (opens a window), "rgb_array" (returns pixel array), or None (no rendering). Default is None.
hardcore	bool	No	If True, generates terrain with ladders, stumps, pitfalls, and stairs. Default is False.

Action Space (step input):

Name	Type	Required	Description
action	np.ndarray (shape (4,), float32)	Yes	Motor speed values in [-1, 1] for each of the 4 joints: [hip_1, knee_1, hip_2, knee_2]. Controls torque magnitude and direction at each joint.

Reset Parameters:

Name	Type	Required	Description
seed	int or None	No	Random seed for reproducibility.
options	dict or None	No	Additional reset options (currently unused).

Outputs

Observation Space:

Index	Name	Type	Description
0	hull_angle	float32	Angle of the hull (torso) body.
1	hull_angular_velocity	float32	Angular velocity of the hull, normalized by FPS.
2	horizontal_speed	float32	Horizontal velocity of the hull, normalized.
3	vertical_speed	float32	Vertical velocity of the hull, normalized.
4	hip_joint_1_angle	float32	Angle of the first hip joint.
5	hip_joint_1_speed	float32	Angular speed of the first hip joint, normalized by SPEED_HIP.
6	knee_joint_1_angle	float32	Angle of the first knee joint (offset by +1.0).
7	knee_joint_1_speed	float32	Angular speed of the first knee joint, normalized by SPEED_KNEE.
8	leg_1_ground_contact	float32	1.0 if the first lower leg is in contact with the ground, 0.0 otherwise.
9	hip_joint_2_angle	float32	Angle of the second hip joint.
10	hip_joint_2_speed	float32	Angular speed of the second hip joint, normalized by SPEED_HIP.
11	knee_joint_2_angle	float32	Angle of the second knee joint (offset by +1.0).
12	knee_joint_2_speed	float32	Angular speed of the second knee joint, normalized by SPEED_KNEE.
13	leg_2_ground_contact	float32	1.0 if the second lower leg is in contact with the ground, 0.0 otherwise.
14-23	lidar_0 through lidar_9	float32	10 lidar rangefinder measurements (fraction of max range, 0.0 to 1.0).

Step Return:

Name	Type	Description
observation	np.ndarray (shape (24,), float32)	The 24-dimensional state vector described above.
reward	float	Reward for the current step. Positive for forward movement, negative for motor torque usage and falling.
terminated	bool	True if the hull contacts the ground (game over), the walker moves off-screen left, or the walker reaches the far end of the terrain.
truncated	bool	Always False (time limit handled by TimeLimit wrapper).
info	dict	Empty dictionary.

Key Methods

Method	Description
reset(seed, options)	Destroys the current world, regenerates terrain and clouds, creates the hull and leg bodies with joints, initializes lidar, and returns the initial observation.
step(action)	Applies motor torques to joints, steps the physics simulation, casts lidar rays, computes the observation and shaped reward, and checks termination conditions.
render()	Renders the environment using pygame. Draws terrain, clouds, the walker body, lidar rays, and a flag marker. Supports "human" and "rgb_array" modes.
close()	Closes the pygame display window and releases resources.
_generate_terrain(hardcore)	Procedurally generates terrain with optional hardcore obstacles (pits, stumps, stairs).
_generate_clouds()	Creates decorative cloud polygons for rendering.
_destroy()	Removes all Box2D bodies (terrain, hull, legs) from the physics world.

Usage Examples

import gymnasium as gym
import numpy as np

# Create the standard BipedalWalker environment
env = gym.make("BipedalWalker-v3", render_mode="rgb_array")

# Reset the environment
observation, info = env.reset(seed=42)
print(f"Observation shape: {observation.shape}")  # (24,)

# Run a few steps with random actions
for step in range(100):
    action = env.action_space.sample()  # shape (4,), values in [-1, 1]
    observation, reward, terminated, truncated, info = env.step(action)

    if terminated or truncated:
        observation, info = env.reset()

env.close()

# Create the hardcore variant
env_hard = gym.make("BipedalWalker-v3", hardcore=True, render_mode="rgb_array")
observation, info = env_hard.reset(seed=42)
env_hard.close()

Related Pages

Page Connections

Double-click a node to navigate. Hold to expand connections.

Principle

Implementation

Heuristic

Environment