Implementation:Farama Foundation Gymnasium BipedalWalker
| Knowledge Sources | |
|---|---|
| Domains | Reinforcement_Learning, Box2D_Environments |
| Last Updated | 2026-02-15 03:00 GMT |
Overview
BipedalWalker is a continuous-control reinforcement learning environment that simulates a 4-joint bipedal robot walking across procedurally generated terrain using the Box2D physics engine.
Description
The BipedalWalker class implements a Gymnasium environment (gym.Env) where an agent controls a two-legged robot that must learn to walk forward across terrain. The robot consists of a hull (torso) connected to two legs, each with an upper and lower segment joined by hip and knee revolute joints. The environment uses the Box2D physics engine to simulate rigid body dynamics, joint constraints, and ground contact detection. A ContactDetector listener tracks when the hull or leg segments touch the ground, determining game-over conditions and leg ground-contact state.
The environment supports two difficulty modes: a normal mode with slightly uneven terrain, and a hardcore mode (enabled via the hardcore parameter) that generates terrain with ladders, stumps, pitfalls, and stairs. Terrain generation is procedural, creating a sequence of terrain segments with varying height profiles. The agent observes a 24-dimensional state vector comprising hull angle and velocity, joint angles and angular speeds, leg ground contact flags, and 10 lidar rangefinder measurements that sense distance to terrain ahead.
Rewards are shaped to encourage forward movement (totaling 300+ points at the far end) while penalizing hull-ground contact (-100 termination penalty) and motor torque usage (small per-step cost). The environment also includes a built-in heuristic controller (BipedalWalkerHeuristics) useful for generating demonstration trajectories.
Usage
Use BipedalWalker when you need a continuous-control locomotion benchmark for training reinforcement learning agents. It is commonly used with policy gradient methods (PPO, SAC, TD3) and serves as a standard testbed for evaluating walking policies. The normal variant tests basic locomotion, while the hardcore variant tests robustness to complex terrain obstacles.
Code Reference
Source Location
- Repository: Farama_Foundation_Gymnasium
- File: gymnasium/envs/box2d/bipedal_walker.py
Signature
class BipedalWalker(gym.Env, EzPickle):
def __init__(
self,
render_mode: str | None = None,
hardcore: bool = False,
)
Import
import gymnasium as gym
# Via gym.make (recommended)
env = gym.make("BipedalWalker-v3")
# Direct import
from gymnasium.envs.box2d.bipedal_walker import BipedalWalker
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| render_mode | str or None | No | Rendering mode; one of "human" (opens a window), "rgb_array" (returns pixel array), or None (no rendering). Default is None. |
| hardcore | bool | No | If True, generates terrain with ladders, stumps, pitfalls, and stairs. Default is False. |
Action Space (step input):
| Name | Type | Required | Description |
|---|---|---|---|
| action | np.ndarray (shape (4,), float32) | Yes | Motor speed values in [-1, 1] for each of the 4 joints: [hip_1, knee_1, hip_2, knee_2]. Controls torque magnitude and direction at each joint. |
Reset Parameters:
| Name | Type | Required | Description |
|---|---|---|---|
| seed | int or None | No | Random seed for reproducibility. |
| options | dict or None | No | Additional reset options (currently unused). |
Outputs
Observation Space:
| Index | Name | Type | Description |
|---|---|---|---|
| 0 | hull_angle | float32 | Angle of the hull (torso) body. |
| 1 | hull_angular_velocity | float32 | Angular velocity of the hull, normalized by FPS. |
| 2 | horizontal_speed | float32 | Horizontal velocity of the hull, normalized. |
| 3 | vertical_speed | float32 | Vertical velocity of the hull, normalized. |
| 4 | hip_joint_1_angle | float32 | Angle of the first hip joint. |
| 5 | hip_joint_1_speed | float32 | Angular speed of the first hip joint, normalized by SPEED_HIP. |
| 6 | knee_joint_1_angle | float32 | Angle of the first knee joint (offset by +1.0). |
| 7 | knee_joint_1_speed | float32 | Angular speed of the first knee joint, normalized by SPEED_KNEE. |
| 8 | leg_1_ground_contact | float32 | 1.0 if the first lower leg is in contact with the ground, 0.0 otherwise. |
| 9 | hip_joint_2_angle | float32 | Angle of the second hip joint. |
| 10 | hip_joint_2_speed | float32 | Angular speed of the second hip joint, normalized by SPEED_HIP. |
| 11 | knee_joint_2_angle | float32 | Angle of the second knee joint (offset by +1.0). |
| 12 | knee_joint_2_speed | float32 | Angular speed of the second knee joint, normalized by SPEED_KNEE. |
| 13 | leg_2_ground_contact | float32 | 1.0 if the second lower leg is in contact with the ground, 0.0 otherwise. |
| 14-23 | lidar_0 through lidar_9 | float32 | 10 lidar rangefinder measurements (fraction of max range, 0.0 to 1.0). |
Step Return:
| Name | Type | Description |
|---|---|---|
| observation | np.ndarray (shape (24,), float32) | The 24-dimensional state vector described above. |
| reward | float | Reward for the current step. Positive for forward movement, negative for motor torque usage and falling. |
| terminated | bool | True if the hull contacts the ground (game over), the walker moves off-screen left, or the walker reaches the far end of the terrain. |
| truncated | bool | Always False (time limit handled by TimeLimit wrapper). |
| info | dict | Empty dictionary. |
Key Methods
| Method | Description |
|---|---|
| reset(seed, options) | Destroys the current world, regenerates terrain and clouds, creates the hull and leg bodies with joints, initializes lidar, and returns the initial observation. |
| step(action) | Applies motor torques to joints, steps the physics simulation, casts lidar rays, computes the observation and shaped reward, and checks termination conditions. |
| render() | Renders the environment using pygame. Draws terrain, clouds, the walker body, lidar rays, and a flag marker. Supports "human" and "rgb_array" modes. |
| close() | Closes the pygame display window and releases resources. |
| _generate_terrain(hardcore) | Procedurally generates terrain with optional hardcore obstacles (pits, stumps, stairs). |
| _generate_clouds() | Creates decorative cloud polygons for rendering. |
| _destroy() | Removes all Box2D bodies (terrain, hull, legs) from the physics world. |
Usage Examples
import gymnasium as gym
import numpy as np
# Create the standard BipedalWalker environment
env = gym.make("BipedalWalker-v3", render_mode="rgb_array")
# Reset the environment
observation, info = env.reset(seed=42)
print(f"Observation shape: {observation.shape}") # (24,)
# Run a few steps with random actions
for step in range(100):
action = env.action_space.sample() # shape (4,), values in [-1, 1]
observation, reward, terminated, truncated, info = env.step(action)
if terminated or truncated:
observation, info = env.reset()
env.close()
# Create the hardcore variant
env_hard = gym.make("BipedalWalker-v3", hardcore=True, render_mode="rgb_array")
observation, info = env_hard.reset(seed=42)
env_hard.close()