

Implementation:Farama Foundation Gymnasium CartPoleEnv

From Leeroopedia
Knowledge Sources
Domains Reinforcement_Learning, Classic_Control
Last Updated 2026-02-15 03:00 GMT

Overview

Concrete implementation of the CartPole classic control environment provided by Gymnasium.

Description

This environment corresponds to the version of the cart-pole problem described by Barto, Sutton, and Anderson in "Neuronlike Adaptive Elements That Can Solve Difficult Learning Control Problems". A pole is attached by an un-actuated joint to a cart, which moves along a frictionless track. The pendulum is placed upright on the cart, and the goal is to balance the pole by applying forces to the cart in the left and right directions.

The physics simulation uses explicit Euler integration by default (the kinematics_integrator attribute can be switched to semi-implicit Euler) with a timestep of 0.02 seconds. The cart has a mass of 1.0 kg, the pole has a mass of 0.1 kg and a half-length of 0.5 m, and gravity is set to 9.8 m/s^2. At each step, a force of fixed magnitude 10.0 N is applied to the cart in one of two directions. The cart and pole accelerations are computed from the full nonlinear equations of motion of the cart-pole system.
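As a reading aid, the update described above can be sketched in plain Python. This is a minimal sketch that assumes the frictionless cart-pole equations and the parameter values listed on this page. `cartpole_step` and its tuple-based state are hypothetical names, not part of Gymnasium's API, and the code is an illustrative reimplementation, not the library's source.

```python
import math

# Physical constants as listed on this page (SI units).
GRAVITY = 9.8
MASSCART = 1.0
MASSPOLE = 0.1
TOTAL_MASS = MASSCART + MASSPOLE
LENGTH = 0.5                      # half the pole's length
POLEMASS_LENGTH = MASSPOLE * LENGTH
FORCE_MAG = 10.0
TAU = 0.02                        # integration timestep in seconds


def cartpole_step(state, action, integrator="euler"):
    """Advance (x, x_dot, theta, theta_dot) by one timestep.

    Hypothetical helper: a sketch of frictionless cart-pole dynamics,
    not Gymnasium's own step() method.
    """
    x, x_dot, theta, theta_dot = state
    force = FORCE_MAG if action == 1 else -FORCE_MAG
    cos_t, sin_t = math.cos(theta), math.sin(theta)

    # Nonlinear equations of motion: pole angular acceleration first,
    # then the cart acceleration that depends on it.
    temp = (force + POLEMASS_LENGTH * theta_dot ** 2 * sin_t) / TOTAL_MASS
    theta_acc = (GRAVITY * sin_t - cos_t * temp) / (
        LENGTH * (4.0 / 3.0 - MASSPOLE * cos_t ** 2 / TOTAL_MASS)
    )
    x_acc = temp - POLEMASS_LENGTH * theta_acc * cos_t / TOTAL_MASS

    if integrator == "euler":
        # Explicit Euler: positions advance using the old velocities.
        x = x + TAU * x_dot
        x_dot = x_dot + TAU * x_acc
        theta = theta + TAU * theta_dot
        theta_dot = theta_dot + TAU * theta_acc
    else:
        # Semi-implicit Euler: update velocities first, then positions.
        x_dot = x_dot + TAU * x_acc
        x = x + TAU * x_dot
        theta_dot = theta_dot + TAU * theta_acc
        theta = theta + TAU * theta_dot
    return (x, x_dot, theta, theta_dot)


# Push right (action 1) from the upright rest state with each integrator.
rest = (0.0, 0.0, 0.0, 0.0)
after_euler = cartpole_step(rest, action=1)
after_semi = cartpole_step(rest, action=1, integrator="semi-implicit")
```

With the explicit Euler default, the first step from rest changes only the two velocities, because positions advance using the previous velocities. The semi-implicit variant already moves the positions within that same step.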

An episode terminates when the pole angle exceeds ±12 degrees from vertical or the cart position leaves the range ±2.4 units from center. For v1, the episode is also truncated after 500 steps. A vectorized implementation (CartPoleVectorEnv) is also provided for high-throughput batched simulation of many environment copies.
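The two end-of-episode conditions can be kept separate as follows. This is a minimal sketch: `is_terminated` and `is_truncated` are hypothetical helpers, and the thresholds come from this page. In Gymnasium, the 500-step limit is enforced by the TimeLimit wrapper, not by the environment itself.

```python
import math

# Thresholds as stated on this page.
X_THRESHOLD = 2.4
THETA_THRESHOLD_RADIANS = 12 * 2 * math.pi / 360  # 12 degrees
MAX_EPISODE_STEPS = 500                           # CartPole-v1 time limit


def is_terminated(state):
    """True once the cart or pole leaves the allowed region (a failure state)."""
    x, _, theta, _ = state
    return abs(x) > X_THRESHOLD or abs(theta) > THETA_THRESHOLD_RADIANS


def is_truncated(elapsed_steps):
    """True once the time limit is reached (cut off, not a failure)."""
    return elapsed_steps >= MAX_EPISODE_STEPS
```

The distinction matters for value-based methods. A terminated state has no future value, whereas a truncated one should usually still be bootstrapped.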

Usage

CartPole is one of the most widely used benchmark environments in reinforcement learning. It is commonly the first environment used to introduce RL concepts and to test new algorithm implementations. The environment is suitable for evaluating simple policy gradient methods, Q-learning, actor-critic methods, and other discrete-action RL algorithms. Its simplicity makes it ideal for educational purposes, debugging new algorithms, and rapid prototyping.

Code Reference

Source Location

Signature

class CartPoleEnv(gym.Env[np.ndarray, int | np.ndarray]):
    def __init__(self, sutton_barto_reward: bool = False, render_mode: str | None = None):

Import

import gymnasium as gym
env = gym.make("CartPole-v1")

I/O Contract

Inputs

| Name | Type | Required | Description |
|---|---|---|---|
| action | int | Yes | Discrete action in {0, 1}: push the cart to the left (0) or to the right (1) |

Outputs

| Name | Type | Description |
|---|---|---|
| observation | np.ndarray (shape (4,), float32) | [cart_position, cart_velocity, pole_angle, pole_angular_velocity] |
| reward | float | +1.0 per step by default; with sutton_barto_reward=True, 0.0 per non-terminating step and -1.0 on termination (see Reward Details) |
| terminated | bool | True when the pole angle exceeds ±12 degrees or the cart position exceeds ±2.4 units from center |
| truncated | bool | Always False from the environment itself; the TimeLimit wrapper applied by gym.make sets it to True after 500 steps for v1 |
| info | dict | Empty dictionary |

Observation Space Details

| Index | Observation | Min | Max |
|---|---|---|---|
| 0 | Cart Position | -4.8 | 4.8 |
| 1 | Cart Velocity | -Inf | Inf |
| 2 | Pole Angle | ≈ -0.419 rad (-24 degrees) | ≈ 0.419 rad (24 degrees) |
| 3 | Pole Angular Velocity | -Inf | Inf |
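The position and angle bounds in this table are twice the corresponding termination thresholds, so an observation can lie outside the region in which an episode is still running. The sketch below derives the bounds from the threshold values in the physics parameter table. Using `math.inf` for the velocity bounds is a simplification, since the library may store a large finite float32 value instead.

```python
import math

X_THRESHOLD = 2.4
THETA_THRESHOLD_RADIANS = 12 * 2 * math.pi / 360  # 12 degrees

# Bounds: twice the termination thresholds for position and angle,
# unbounded (here math.inf) for the two velocities.
obs_high = [X_THRESHOLD * 2, math.inf, THETA_THRESHOLD_RADIANS * 2, math.inf]
obs_low = [-h for h in obs_high]

angle_bound_degrees = math.degrees(obs_high[2])  # about 24 degrees
```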

Action Space Details

| Value | Action |
|---|---|
| 0 | Push cart to the left |
| 1 | Push cart to the right |
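Each action maps to a force of fixed magnitude, and only its sign depends on the action. This sketch assumes the force_mag value from the physics parameter table and a convention in which positive force points right. `action_to_force` is a hypothetical helper, not a Gymnasium function.

```python
FORCE_MAG = 10.0  # newtons, as in the physics parameter table


def action_to_force(action):
    """Map a Discrete(2) action to the signed horizontal force on the cart.

    Hypothetical helper: 0 pushes left (negative force), 1 pushes right
    (positive force); the magnitude never changes.
    """
    if action not in (0, 1):
        raise ValueError(f"invalid action {action!r}; expected 0 or 1")
    return FORCE_MAG if action == 1 else -FORCE_MAG
```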

Reward Details

| Mode | Non-terminated step | Termination step | Post-termination step |
|---|---|---|---|
| Default (sutton_barto_reward=False) | +1.0 | +1.0 | 0.0 |
| Sutton-Barto (sutton_barto_reward=True) | 0.0 | -1.0 | -1.0 |
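The reward table can be encoded directly, and summing it over an episode shows how the two schemes differ. In this minimal sketch, `cartpole_reward`, `episode_return`, and the phase labels "running", "terminal", and "post" are hypothetical names for the three columns of the table.

```python
# Rewards per phase as (default, sutton_barto), following the table above.
REWARDS = {
    "running": (1.0, 0.0),    # non-terminated step
    "terminal": (1.0, -1.0),  # step on which the episode terminates
    "post": (0.0, -1.0),      # any step() call after termination
}


def cartpole_reward(phase, sutton_barto_reward=False):
    """Reward for one step in the given (hypothetical) phase."""
    default, sutton_barto = REWARDS[phase]
    return sutton_barto if sutton_barto_reward else default


def episode_return(num_steps, terminated, sutton_barto_reward=False, gamma=1.0):
    """Discounted return of an episode with num_steps calls to step().

    If terminated is True, the last step is the terminating one; otherwise
    the episode was truncated and every step counts as "running".
    """
    total = 0.0
    for t in range(num_steps):
        is_last = t == num_steps - 1
        phase = "terminal" if (terminated and is_last) else "running"
        total += gamma ** t * cartpole_reward(phase, sutton_barto_reward)
    return total
```

For a terminated episode of T steps, the default scheme returns T. With gamma = 1, the Sutton-Barto scheme returns -1 regardless of T. Only with discounting (gamma < 1) does its return, -gamma^(T-1), favor balancing longer.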

Key Methods

| Method | Description |
|---|---|
| `__init__(sutton_barto_reward=False, render_mode=None)` | Initializes the environment with observation space Box(4,), action space Discrete(2), the physics parameters, and the optional reward variant |
| `reset(seed=None, options=None)` | Sets each of the four state variables to a value drawn uniformly from [-0.05, 0.05] (bounds customizable via options "low"/"high"); returns (observation, info) |
| `step(action)` | Applies the force to the cart, integrates the dynamics (explicit Euler by default, optionally semi-implicit Euler), checks the termination conditions, and returns (observation, reward, terminated, truncated, info) |
| `render()` | Renders the environment with pygame in "human" or "rgb_array" mode |
| `close()` | Closes the pygame display and releases its resources |
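The reset rule in the table amounts to uniform sampling of the four state variables. To keep the sketch dependency-free, it uses Python's `random.Random` as a stand-in for the environment's seeded NumPy generator. `sample_initial_state` is a hypothetical helper, and its `"low"`/`"high"` keys mirror the reset options described above.

```python
import random

DEFAULT_LOW, DEFAULT_HIGH = -0.05, 0.05


def sample_initial_state(rng, options=None):
    """Draw (x, x_dot, theta, theta_dot) uniformly from [low, high].

    Hypothetical helper: options mimics reset(options=...) with optional
    "low"/"high" keys; rng is a random.Random used here as a stand-in for
    the environment's seeded NumPy generator.
    """
    options = options or {}
    low = options.get("low", DEFAULT_LOW)
    high = options.get("high", DEFAULT_HIGH)
    if low > high:
        raise ValueError(f"low ({low}) must not exceed high ({high})")
    return [rng.uniform(low, high) for _ in range(4)]


rng = random.Random(42)                      # seeding makes resets reproducible
state = sample_initial_state(rng)            # default bounds [-0.05, 0.05]
wide = sample_initial_state(rng, {"low": -0.2, "high": 0.2})
```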

Physics Parameters

| Parameter | Value | Description |
|---|---|---|
| gravity | 9.8 m/s^2 | Gravitational acceleration |
| masscart | 1.0 kg | Mass of the cart |
| masspole | 0.1 kg | Mass of the pole |
| length | 0.5 m | Half the pole's length |
| force_mag | 10.0 N | Magnitude of the force applied to the cart |
| tau | 0.02 s | Time between state updates (integration timestep) |
| theta_threshold_radians | ≈0.2094 rad (12 degrees) | Absolute pole angle beyond which the episode terminates |
| x_threshold | 2.4 | Absolute cart position beyond which the episode terminates |
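A few derived quantities follow from the table by simple arithmetic. This sketch assumes the listed values. The names `total_mass`, `polemass_length`, and `episode_seconds` are illustrative.

```python
import math

gravity = 9.8
masscart = 1.0
masspole = 0.1
length = 0.5          # half the pole's length
force_mag = 10.0
tau = 0.02            # seconds per step

# Derived quantities that appear in the equations of motion.
total_mass = masspole + masscart                    # about 1.1 kg
polemass_length = masspole * length                 # about 0.05 kg*m
theta_threshold_radians = 12 * 2 * math.pi / 360    # about 0.2094 rad

# A full 500-step CartPole-v1 episode spans 500 * tau seconds of simulated time.
episode_seconds = 500 * tau
```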

Usage Examples

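import gymnasium as gym

# gym.make also wraps the environment in TimeLimit (500 steps for v1).
env = gym.make("CartPole-v1")
observation, info = env.reset(seed=42)  # seed once for reproducibility

for _ in range(1000):
    action = env.action_space.sample()  # random policy: 0 (left) or 1 (right)
    observation, reward, terminated, truncated, info = env.step(action)
    # Start a new episode when the pole falls, the cart leaves the track,
    # or the time limit is reached.
    if terminated or truncated:
        observation, info = env.reset()

env.close()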

Sutton-Barto Reward Variant

import gymnasium as gym

env = gym.make("CartPole-v1", sutton_barto_reward=True)  # 0.0 per step, -1.0 on termination
observation, info = env.reset(seed=42)

Vectorized Environment

import gymnasium as gym

# "vector_entry_point" uses the environment's own vectorized implementation (CartPoleVectorEnv).
envs = gym.make_vec("CartPole-v1", num_envs=3, vectorization_mode="vector_entry_point")
observations, infos = envs.reset(seed=42)
