Implementation:Farama Foundation Gymnasium CartPoleEnv

Knowledge Sources	Farama_Foundation_Gymnasium Gymnasium Docs
Domains	Reinforcement_Learning, Classic_Control
Last Updated	2026-02-15 03:00 GMT

Overview

Concrete tool for the CartPole classic control environment provided by Gymnasium.

Description

This environment corresponds to the version of the cart-pole problem described by Barto, Sutton, and Anderson in "Neuronlike Adaptive Elements That Can Solve Difficult Learning Control Problems". A pole is attached by an un-actuated joint to a cart, which moves along a frictionless track. The pole starts upright on the cart, and the goal is to keep it balanced by pushing the cart to the left or to the right.

The physics simulation uses Euler integration (configurable to semi-implicit Euler) with a timestep of 0.02 seconds. The cart has a mass of 1.0 kg, the pole has a mass of 0.1 kg and a half-length of 0.5 m, and gravity is set to 9.8 m/s^2. A fixed force of 10.0 N is applied to the cart in one of two directions at each step. The pole angle dynamics are computed using the full nonlinear equations of motion for the cart-pole system.
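The update described above can be sketched in plain Python. This is an illustrative reconstruction using the parameter values listed on this page and a commonly used formulation of the cart-pole equations. The function name `euler_step` and the `semi_implicit` flag are hypothetical and are not part of the Gymnasium API.

```python
import math

# Constants taken from the values stated in this page.
GRAVITY = 9.8
MASSCART = 1.0
MASSPOLE = 0.1
TOTAL_MASS = MASSCART + MASSPOLE
LENGTH = 0.5  # half the pole's length
POLEMASS_LENGTH = MASSPOLE * LENGTH
FORCE_MAG = 10.0
TAU = 0.02  # seconds between state updates


def euler_step(state, action, semi_implicit=False):
    """Advance (x, x_dot, theta, theta_dot) by one timestep (hypothetical helper)."""
    x, x_dot, theta, theta_dot = state
    # Action 1 pushes right, action 0 pushes left.
    force = FORCE_MAG if action == 1 else -FORCE_MAG
    cos_t, sin_t = math.cos(theta), math.sin(theta)

    # Nonlinear equations of motion for the cart-pole system.
    temp = (force + POLEMASS_LENGTH * theta_dot**2 * sin_t) / TOTAL_MASS
    theta_acc = (GRAVITY * sin_t - cos_t * temp) / (
        LENGTH * (4.0 / 3.0 - MASSPOLE * cos_t**2 / TOTAL_MASS)
    )
    x_acc = temp - POLEMASS_LENGTH * theta_acc * cos_t / TOTAL_MASS

    if semi_implicit:
        # Semi-implicit Euler: update velocities first, then positions.
        x_dot += TAU * x_acc
        x += TAU * x_dot
        theta_dot += TAU * theta_acc
        theta += TAU * theta_dot
    else:
        # Explicit Euler: positions use the velocities from before the update.
        x += TAU * x_dot
        x_dot += TAU * x_acc
        theta += TAU * theta_dot
        theta_dot += TAU * theta_acc
    return (x, x_dot, theta, theta_dot)


next_state = euler_step((0.0, 0.0, 0.0, 0.0), action=1)
```

Starting from rest with the pole upright, a push to the right accelerates the cart in the positive direction and tips the pole toward negative angular velocity. With explicit Euler, the position and angle themselves do not change until the following step.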

The episode terminates when the pole angle exceeds 12 degrees from vertical or when the cart position exceeds 2.4 units from center. It is truncated, rather than terminated, when the episode length reaches 500 steps (for v1). A vectorized implementation (CartPoleVectorEnv) is also provided for high-throughput parallel simulation.
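The state-based part of this rule is small enough to write out. The sketch below uses the threshold values given on this page; `is_terminated` is a hypothetical helper name, and the step limit is deliberately left out because it does not depend on the state.

```python
import math

X_THRESHOLD = 2.4
# 12 degrees expressed in radians, roughly 0.2095.
THETA_THRESHOLD_RADIANS = 12 * 2 * math.pi / 360


def is_terminated(x, theta):
    """Return True when the cart or pole leaves the allowed region (hypothetical helper)."""
    return abs(x) > X_THRESHOLD or abs(theta) > THETA_THRESHOLD_RADIANS


upright = is_terminated(x=0.0, theta=0.0)
off_track = is_terminated(x=2.5, theta=0.0)
fallen = is_terminated(x=0.0, theta=0.25)
```

Here `upright` is False, while `off_track` and `fallen` are both True.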

Usage

CartPole is one of the most widely used benchmark environments in reinforcement learning. It is commonly the first environment used to introduce RL concepts and to test new algorithm implementations. The environment is suitable for evaluating simple policy gradient methods, Q-learning, actor-critic methods, and other discrete-action RL algorithms. Its simplicity makes it ideal for educational purposes, debugging new algorithms, and rapid prototyping.

Code Reference

Source Location

Repository: Farama_Foundation_Gymnasium
File: gymnasium/envs/classic_control/cartpole.py

Signature

class CartPoleEnv(gym.Env[np.ndarray, int | np.ndarray]):
    def __init__(self, sutton_barto_reward: bool = False, render_mode: str | None = None):

Import

import gymnasium as gym
env = gym.make("CartPole-v1")

I/O Contract

Inputs

Name	Type	Required	Description
action	int	Yes	Discrete action in {0, 1}: push cart to the left (0) or right (1)

Outputs

Name	Type	Description
observation	np.ndarray (shape (4,), float32)	[cart_position, cart_velocity, pole_angle, pole_angular_velocity]
reward	float	+1.0 per step by default; 0.0 per non-terminated step and -1.0 on termination with sutton_barto_reward=True (see Reward Details)
terminated	bool	True when the pole angle is more than 12 degrees from vertical or the cart position is more than 2.4 units from center
truncated	bool	Always False from CartPoleEnv itself; the TimeLimit wrapper applied by gym.make sets it to True at the step limit (500 steps for v1)
info	dict	Empty dictionary

Observation Space Details

Index	Observation	Min	Max
0	Cart Position	-4.8	4.8
1	Cart Velocity	-Inf	Inf
2	Pole Angle	-0.418 rad (-24 degrees)	0.418 rad (24 degrees)
3	Pole Angular Velocity	-Inf	Inf
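The position and angle bounds in this table are twice the corresponding termination thresholds, so an episode ends well before an observation reaches the edge of the space. A minimal sketch of that arithmetic, using the threshold values from this page (the variable names are illustrative):

```python
import math

x_threshold = 2.4
theta_threshold_radians = 12 * 2 * math.pi / 360  # 12 degrees

# Observation-space bounds for cart position and pole angle.
position_bound = 2 * x_threshold
angle_bound = 2 * theta_threshold_radians

print(position_bound, round(angle_bound, 3), round(math.degrees(angle_bound)))
```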

Action Space Details

Value	Action
0	Push cart to the left
1	Push cart to the right

Reward Details

Mode	Non-terminated Step	Termination Step	Post-termination Step
Default (sutton_barto_reward=False)	+1.0	+1.0	0.0
Sutton-Barto (sutton_barto_reward=True)	0.0	-1.0	-1.0
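The table above maps directly onto a three-way branch. The sketch below restates it as a standalone function. `cartpole_reward` and its `already_terminated` argument are hypothetical names chosen for illustration, not the library's API.

```python
def cartpole_reward(terminated, already_terminated, sutton_barto_reward=False):
    """Reward for one step, following the Reward Details table (hypothetical helper).

    terminated: whether the current state violates a termination threshold.
    already_terminated: whether an earlier step had already ended the episode.
    """
    if not terminated:
        # Ordinary step while the pole is still balanced.
        return 0.0 if sutton_barto_reward else 1.0
    if not already_terminated:
        # The step on which the episode ends.
        return -1.0 if sutton_barto_reward else 1.0
    # Stepping again after termination without calling reset().
    return -1.0 if sutton_barto_reward else 0.0


default_rewards = [
    cartpole_reward(False, False),
    cartpole_reward(True, False),
    cartpole_reward(True, True),
]
sutton_barto_rewards = [
    cartpole_reward(False, False, sutton_barto_reward=True),
    cartpole_reward(True, False, sutton_barto_reward=True),
    cartpole_reward(True, True, sutton_barto_reward=True),
]
```

The two lists reproduce the rows of the table: `[1.0, 1.0, 0.0]` for the default mode and `[0.0, -1.0, -1.0]` for the Sutton-Barto mode.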

Key Methods

Method	Description
`__init__(sutton_barto_reward=False, render_mode=None)`	Initializes the environment with observation space Box(4,), action space Discrete(2), physics parameters, and optional reward variant
`reset(seed=None, options=None)`	Resets the state to random values in [-0.05, 0.05] (customizable via options "low"/"high"); returns (observation, info)
`step(action)`	Applies force to the cart, integrates dynamics via Euler method, checks termination conditions, and returns (observation, reward, terminated, truncated, info)
`render()`	Renders the environment using pygame in "human" or "rgb_array" mode
`close()`	Closes the pygame display and cleans up resources

Physics Parameters

Parameter	Value	Description
gravity	9.8 m/s^2	Gravitational acceleration
masscart	1.0 kg	Mass of the cart
masspole	0.1 kg	Mass of the pole
length	0.5 m	Half the pole's length
force_mag	10.0 N	Magnitude of force applied to the cart
tau	0.02 s	Seconds between state updates (integration timestep)
theta_threshold_radians	0.2095 rad (12 degrees)	Angle at which episode terminates
x_threshold	2.4	Cart position at which episode terminates

Usage Examples

import gymnasium as gym

env = gym.make("CartPole-v1")
observation, info = env.reset(seed=42)

for _ in range(1000):
    action = env.action_space.sample()
    observation, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:
        observation, info = env.reset()

env.close()

Sutton-Barto Reward Variant

import gymnasium as gym

env = gym.make("CartPole-v1", sutton_barto_reward=True)
observation, info = env.reset(seed=42)

Vectorized Environment

import gymnasium as gym

envs = gym.make_vec("CartPole-v1", num_envs=3, vectorization_mode="vector_entry_point")
observations, infos = envs.reset(seed=42)

Related Pages

Environment:Farama_Foundation_Gymnasium_Python_3_10_Runtime
