Jump to content

Connect SuperML | Leeroopedia MCP: Equip your AI agents with best practices, code verification, and debugging knowledge. Powered by Leeroo — building Organizational Superintelligence. Contact us at founders@leeroo.com.

Implementation:Farama Foundation Gymnasium BipedalWalker

From Leeroopedia
Knowledge Sources
Domains Reinforcement_Learning, Box2D_Environments
Last Updated 2026-02-15 03:00 GMT

Overview

BipedalWalker is a continuous-control reinforcement learning environment that simulates a 4-joint bipedal robot walking across procedurally generated terrain using the Box2D physics engine.

Description

The BipedalWalker class implements a Gymnasium environment (gym.Env) where an agent controls a two-legged robot that must learn to walk forward across terrain. The robot consists of a hull (torso) connected to two legs, each with an upper and lower segment joined by hip and knee revolute joints. The environment uses the Box2D physics engine to simulate rigid body dynamics, joint constraints, and ground contact detection. A ContactDetector listener tracks when the hull or leg segments touch the ground, determining game-over conditions and leg ground-contact state.

The environment supports two difficulty modes: a normal mode with slightly uneven terrain, and a hardcore mode (enabled via the hardcore parameter) that generates terrain with ladders, stumps, pitfalls, and stairs. Terrain generation is procedural, creating a sequence of terrain segments with varying height profiles. The agent observes a 24-dimensional state vector comprising hull angle and velocity, joint angles and angular speeds, leg ground contact flags, and 10 lidar rangefinder measurements that sense distance to terrain ahead.

Rewards are shaped to encourage forward movement (totaling 300+ points at the far end) while penalizing hull-ground contact (-100 termination penalty) and motor torque usage (small per-step cost). The environment also includes a built-in heuristic controller (BipedalWalkerHeuristics) useful for generating demonstration trajectories.

Usage

Use BipedalWalker when you need a continuous-control locomotion benchmark for training reinforcement learning agents. It is commonly used with policy gradient methods (PPO, SAC, TD3) and serves as a standard testbed for evaluating walking policies. The normal variant tests basic locomotion, while the hardcore variant tests robustness to complex terrain obstacles.

Code Reference

Source Location

Signature

class BipedalWalker(gym.Env, EzPickle):
    def __init__(
        self,
        render_mode: str | None = None,
        hardcore: bool = False,
    )

Import

import gymnasium as gym

# Via gym.make (recommended)
env = gym.make("BipedalWalker-v3")

# Direct import
from gymnasium.envs.box2d.bipedal_walker import BipedalWalker

I/O Contract

Inputs

Name Type Required Description
render_mode str or None No Rendering mode; one of "human" (opens a window), "rgb_array" (returns pixel array), or None (no rendering). Default is None.
hardcore bool No If True, generates terrain with ladders, stumps, pitfalls, and stairs. Default is False.

Action Space (step input):

Name Type Required Description
action np.ndarray (shape (4,), float32) Yes Motor speed values in [-1, 1] for each of the 4 joints: [hip_1, knee_1, hip_2, knee_2]. Controls torque magnitude and direction at each joint.

Reset Parameters:

Name Type Required Description
seed int or None No Random seed for reproducibility.
options dict or None No Additional reset options (currently unused).

Outputs

Observation Space:

Index Name Type Description
0 hull_angle float32 Angle of the hull (torso) body.
1 hull_angular_velocity float32 Angular velocity of the hull, normalized by FPS.
2 horizontal_speed float32 Horizontal velocity of the hull, normalized.
3 vertical_speed float32 Vertical velocity of the hull, normalized.
4 hip_joint_1_angle float32 Angle of the first hip joint.
5 hip_joint_1_speed float32 Angular speed of the first hip joint, normalized by SPEED_HIP.
6 knee_joint_1_angle float32 Angle of the first knee joint (offset by +1.0).
7 knee_joint_1_speed float32 Angular speed of the first knee joint, normalized by SPEED_KNEE.
8 leg_1_ground_contact float32 1.0 if the first lower leg is in contact with the ground, 0.0 otherwise.
9 hip_joint_2_angle float32 Angle of the second hip joint.
10 hip_joint_2_speed float32 Angular speed of the second hip joint, normalized by SPEED_HIP.
11 knee_joint_2_angle float32 Angle of the second knee joint (offset by +1.0).
12 knee_joint_2_speed float32 Angular speed of the second knee joint, normalized by SPEED_KNEE.
13 leg_2_ground_contact float32 1.0 if the second lower leg is in contact with the ground, 0.0 otherwise.
14-23 lidar_0 through lidar_9 float32 10 lidar rangefinder measurements (fraction of max range, 0.0 to 1.0).

Step Return:

Name Type Description
observation np.ndarray (shape (24,), float32) The 24-dimensional state vector described above.
reward float Reward for the current step. Positive for forward movement, negative for motor torque usage and falling.
terminated bool True if the hull contacts the ground (game over), the walker moves off-screen left, or the walker reaches the far end of the terrain.
truncated bool Always False (time limit handled by TimeLimit wrapper).
info dict Empty dictionary.

Key Methods

Method Description
reset(seed, options) Destroys the current world, regenerates terrain and clouds, creates the hull and leg bodies with joints, initializes lidar, and returns the initial observation.
step(action) Applies motor torques to joints, steps the physics simulation, casts lidar rays, computes the observation and shaped reward, and checks termination conditions.
render() Renders the environment using pygame. Draws terrain, clouds, the walker body, lidar rays, and a flag marker. Supports "human" and "rgb_array" modes.
close() Closes the pygame display window and releases resources.
_generate_terrain(hardcore) Procedurally generates terrain with optional hardcore obstacles (pits, stumps, stairs).
_generate_clouds() Creates decorative cloud polygons for rendering.
_destroy() Removes all Box2D bodies (terrain, hull, legs) from the physics world.

Usage Examples

import gymnasium as gym
import numpy as np

# Create the standard BipedalWalker environment
env = gym.make("BipedalWalker-v3", render_mode="rgb_array")

# Reset the environment
observation, info = env.reset(seed=42)
print(f"Observation shape: {observation.shape}")  # (24,)

# Run a few steps with random actions
for step in range(100):
    action = env.action_space.sample()  # shape (4,), values in [-1, 1]
    observation, reward, terminated, truncated, info = env.step(action)

    if terminated or truncated:
        observation, info = env.reset()

env.close()

# Create the hardcore variant
env_hard = gym.make("BipedalWalker-v3", hardcore=True, render_mode="rgb_array")
observation, info = env_hard.reset(seed=42)
env_hard.close()

Related Pages

Page Connections

Double-click a node to navigate. Hold to expand connections.
Principle
Implementation
Heuristic
Environment