Overview
Concrete implementation of the CartPole classic control environment provided by Gymnasium.
Description
This environment corresponds to the version of the cart-pole problem described by Barto, Sutton, and Anderson in "Neuronlike Adaptive Elements That Can Solve Difficult Learning Control Problems". A pole is attached by an unactuated joint to a cart, which moves along a frictionless track. The pendulum is placed upright on the cart, and the goal is to balance the pole by applying forces in the left and right directions on the cart.
The physics simulation uses Euler integration (configurable to semi-implicit Euler) with a timestep of 0.02 seconds. The cart has a mass of 1.0 kg, the pole has a mass of 0.1 kg and a half-length of 0.5 m, and gravity is set to 9.8 m/s^2. A fixed force of 10.0 N is applied to the cart in one of two directions at each step. The pole angle dynamics are computed using the full nonlinear equations of motion for the cart-pole system.
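To make the dynamics above concrete, here is a minimal Python sketch of one simulation step using the values listed under Physics Parameters. The helper name cartpole_step and the plain-tuple state layout are illustrative assumptions. The update follows the widely used nonlinear cart-pole equations and, as far as we can tell, has the same structure as Gymnasium's step(), but it is a re-implementation for explanation, not the library's code.

```python
import math

# Constants from the Physics Parameters table, re-declared so the sketch is self-contained.
GRAVITY = 9.8
MASSCART = 1.0
MASSPOLE = 0.1
TOTAL_MASS = MASSCART + MASSPOLE
LENGTH = 0.5  # half the pole's length
POLEMASS_LENGTH = MASSPOLE * LENGTH
FORCE_MAG = 10.0
TAU = 0.02  # seconds per step


def cartpole_step(state, action, integrator="euler"):
    """Hypothetical helper: advance (x, x_dot, theta, theta_dot) by one TAU-second step."""
    x, x_dot, theta, theta_dot = state
    # Action 1 pushes right (+10 N), action 0 pushes left (-10 N).
    force = FORCE_MAG if action == 1 else -FORCE_MAG
    costheta = math.cos(theta)
    sintheta = math.sin(theta)

    # Nonlinear equations of motion for the pole's angular and the cart's linear acceleration.
    temp = (force + POLEMASS_LENGTH * theta_dot**2 * sintheta) / TOTAL_MASS
    thetaacc = (GRAVITY * sintheta - costheta * temp) / (
        LENGTH * (4.0 / 3.0 - MASSPOLE * costheta**2 / TOTAL_MASS)
    )
    xacc = temp - POLEMASS_LENGTH * thetaacc * costheta / TOTAL_MASS

    if integrator == "euler":
        # Explicit Euler: positions advance with the velocities from before this step.
        x = x + TAU * x_dot
        x_dot = x_dot + TAU * xacc
        theta = theta + TAU * theta_dot
        theta_dot = theta_dot + TAU * thetaacc
    else:
        # Semi-implicit Euler: update velocities first, then positions with the new velocities.
        x_dot = x_dot + TAU * xacc
        x = x + TAU * x_dot
        theta_dot = theta_dot + TAU * thetaacc
        theta = theta + TAU * theta_dot
    return (x, x_dot, theta, theta_dot)


print(cartpole_step((0.0, 0.0, 0.0, 0.0), action=1))
```

Starting from rest and pushing right, the cart velocity becomes positive while the pole's angular velocity becomes negative, which is the rotation a balancing policy has to counteract. With explicit Euler the position itself does not move on this first step; with the semi-implicit variant it already does.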
The episode terminates when the pole angle exceeds 12 degrees from vertical in either direction or when the cart position moves more than 2.4 units from center. Separately, CartPole-v1 episodes are truncated (not terminated) after 500 steps by the TimeLimit wrapper. A vectorized implementation (CartPoleVectorEnv) is also provided for high-throughput parallel simulation.
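The two ways an episode can end map onto two small predicates. In this sketch the helper names is_terminated and is_truncated are hypothetical. The thresholds are the ones stated above. In Gymnasium the step-limit check lives in the TimeLimit wrapper rather than in the environment itself.

```python
import math

X_THRESHOLD = 2.4                                 # cart position limit (units from center)
THETA_THRESHOLD_RADIANS = 12 * 2 * math.pi / 360  # 12 degrees
MAX_EPISODE_STEPS = 500                           # CartPole-v1 step limit


def is_terminated(x, theta):
    """Hypothetical helper: True once the cart or the pole leaves its allowed range."""
    return (
        x < -X_THRESHOLD
        or x > X_THRESHOLD
        or theta < -THETA_THRESHOLD_RADIANS
        or theta > THETA_THRESHOLD_RADIANS
    )


def is_truncated(elapsed_steps):
    """Hypothetical helper: True once the step limit is reached (a TimeLimit-style check)."""
    return elapsed_steps >= MAX_EPISODE_STEPS


print(is_terminated(0.0, math.radians(13)))  # True: pole past 12 degrees
print(is_terminated(2.0, 0.1))               # False: both within limits
print(is_truncated(500))                     # True
```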
Usage
CartPole is one of the most widely used benchmark environments in reinforcement learning. It is commonly the first environment used to introduce RL concepts and to test new algorithm implementations. The environment is suitable for evaluating simple policy gradient methods, Q-learning, actor-critic methods, and other discrete-action RL algorithms. Its simplicity makes it ideal for educational purposes, debugging new algorithms, and rapid prototyping.
Code Reference
Source Location
Signature
```python
class CartPoleEnv(gym.Env[np.ndarray, int | np.ndarray]):
    def __init__(self, sutton_barto_reward: bool = False, render_mode: str | None = None):
```
Import
```python
import gymnasium as gym

env = gym.make("CartPole-v1")
```
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| action | int | Yes | Discrete action in {0, 1}: push cart to the left (0) or right (1) |
Outputs
| Name | Type | Description |
|---|---|---|
| observation | np.ndarray (shape (4,), float32) | [cart_position, cart_velocity, pole_angle, pole_angular_velocity] |
| reward | float | +1.0 per step by default; with sutton_barto_reward=True, 0.0 per non-terminal step and -1.0 on and after termination (see Reward Details) |
| terminated | bool | True when the pole angle exceeds 12 degrees from vertical in either direction or the cart position is more than 2.4 units from center |
| truncated | bool | Always False from the environment itself; truncation after 500 steps (v1) is handled by the TimeLimit wrapper |
| info | dict | Empty dictionary |
Observation Space Details
| Index | Observation | Min | Max |
|---|---|---|---|
| 0 | Cart Position | -4.8 | 4.8 |
| 1 | Cart Velocity | -Inf | Inf |
| 2 | Pole Angle | -0.419 rad (-24 degrees) | 0.419 rad (24 degrees) |
| 3 | Pole Angular Velocity | -Inf | Inf |
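The position and angle bounds are exactly twice the termination thresholds (4.8 = 2 * 2.4 and 24 degrees = 2 * 12 degrees). According to a comment in Gymnasium's source, this keeps the observation returned on the terminating step inside the space. The short sketch below reproduces that arithmetic; the variable names are illustrative.

```python
import math

X_THRESHOLD = 2.4                                 # cart position termination threshold
THETA_THRESHOLD_RADIANS = 12 * 2 * math.pi / 360  # 12 degrees in radians

# Upper bounds for [cart_position, cart_velocity, pole_angle, pole_angular_velocity];
# the two velocities are unbounded.
obs_high = [X_THRESHOLD * 2, math.inf, THETA_THRESHOLD_RADIANS * 2, math.inf]
obs_low = [-bound for bound in obs_high]

print(obs_high[0])                          # 4.8
print(round(obs_high[2], 3))                # 0.419
print(round(math.degrees(obs_high[2]), 6))  # 24.0
```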
Action Space Details
| Value | Action |
|---|---|
| 0 | Push cart to the left |
| 1 | Push cart to the right |
Reward Details
| Mode | Non-terminated Step | Termination Step | Post-termination Step |
|---|---|---|---|
| Default (sutton_barto_reward=False) | +1.0 | +1.0 | 0.0 |
| Sutton-Barto (sutton_barto_reward=True) | 0.0 | -1.0 | -1.0 |
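The reward table can be reproduced by a small function that remembers whether termination has already happened. The name rewards_for and its list-of-flags input are assumptions made for illustration. The counter steps_beyond_terminated follows the name we understand Gymnasium uses for the same bookkeeping.

```python
def rewards_for(terminated_flags, sutton_barto_reward=False):
    """Hypothetical helper: per-step rewards for a sequence of `terminated` flags."""
    rewards = []
    steps_beyond_terminated = None  # None until the first terminal step
    for terminated in terminated_flags:
        if not terminated:
            # Non-terminated step
            reward = 0.0 if sutton_barto_reward else 1.0
        elif steps_beyond_terminated is None:
            # Termination step: the pole just fell or the cart left the track
            steps_beyond_terminated = 0
            reward = -1.0 if sutton_barto_reward else 1.0
        else:
            # Post-termination step: step() called again without reset()
            steps_beyond_terminated += 1
            reward = -1.0 if sutton_barto_reward else 0.0
        rewards.append(reward)
    return rewards


print(rewards_for([False, False, True, True]))                            # [1.0, 1.0, 1.0, 0.0]
print(rewards_for([False, False, True, True], sutton_barto_reward=True))  # [0.0, 0.0, -1.0, -1.0]
```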
Key Methods
| Method | Description |
|---|---|
| __init__(sutton_barto_reward=False, render_mode=None) | Initializes the environment with observation space Box(4,), action space Discrete(2), physics parameters, and the optional reward variant |
| reset(seed=None, options=None) | Resets the state to values drawn uniformly from [-0.05, 0.05] (bounds customizable via the options keys "low" and "high"); returns (observation, info) |
| step(action) | Applies a 10.0 N force to the left or right, integrates the dynamics (Euler by default, or semi-implicit Euler if configured), checks termination conditions, and returns (observation, reward, terminated, truncated, info) |
| render() | Renders the environment using pygame in "human" or "rgb_array" mode |
| close() | Closes the pygame display and cleans up resources |
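reset() draws each of the four state variables independently and uniformly from [low, high]. The sketch below uses Python's random module instead of the NumPy generator the environment actually seeds, so the same seed will not reproduce Gymnasium's exact values. The helper name sample_initial_state is hypothetical.

```python
import random


def sample_initial_state(seed=None, low=-0.05, high=0.05):
    """Hypothetical helper: draw (x, x_dot, theta, theta_dot) uniformly from [low, high].

    Uses Python's random module, not Gymnasium's NumPy generator, so values
    differ from env.reset(seed=...) for the same seed.
    """
    rng = random.Random(seed)
    return tuple(rng.uniform(low, high) for _ in range(4))


state = sample_initial_state(seed=42)
print(state)  # four values, each within [-0.05, 0.05]

# Wider bounds, analogous to env.reset(options={"low": -0.1, "high": 0.1})
wide_state = sample_initial_state(seed=42, low=-0.1, high=0.1)
```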
Physics Parameters
| Parameter | Value | Description |
|---|---|---|
| gravity | 9.8 m/s^2 | Gravitational acceleration |
| masscart | 1.0 kg | Mass of the cart |
| masspole | 0.1 kg | Mass of the pole |
| length | 0.5 m | Half the pole's length |
| force_mag | 10.0 N | Magnitude of the force applied to the cart |
| tau | 0.02 s | Seconds between state updates (integration timestep) |
| theta_threshold_radians | 0.2094 rad (12 degrees) | Pole angle (in either direction) beyond which the episode terminates |
| x_threshold | 2.4 | Cart distance from center beyond which the episode terminates |
|
Usage Examples
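```python
import gymnasium as gym

env = gym.make("CartPole-v1")
observation, info = env.reset(seed=42)
for _ in range(1000):
    # Random policy: sample 0 (push left) or 1 (push right)
    action = env.action_space.sample()
    observation, reward, terminated, truncated, info = env.step(action)
    # Start a new episode once the pole falls, the cart leaves the track, or the step limit is hit
    if terminated or truncated:
        observation, info = env.reset()
env.close()
```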
Sutton-Barto Reward Variant
```python
import gymnasium as gym

# Rewards become 0.0 per surviving step and -1.0 on termination
env = gym.make("CartPole-v1", sutton_barto_reward=True)
observation, info = env.reset(seed=42)
```
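One way to see how the two reward schemes differ is to compare discounted returns for the same failing episode. Under the default scheme every step earns +1.0, so longer episodes earn larger returns. Under the Sutton-Barto scheme the only nonzero reward is the -1.0 at termination, so an episode that ends on step T returns -gamma^(T-1). That value favors surviving longer only when gamma < 1. The episode length T = 20 and gamma = 0.99 below are arbitrary illustrative choices, and discounted_return is a hypothetical helper.

```python
def discounted_return(rewards, gamma):
    """Compute sum_t gamma**t * rewards[t] by accumulating from the last step backwards."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g


T = 20        # illustrative episode that terminates on its 20th step
GAMMA = 0.99  # illustrative discount factor

default_rewards = [1.0] * T                      # +1.0 on every step, including the terminal one
sutton_barto_rewards = [0.0] * (T - 1) + [-1.0]  # 0.0 until the -1.0 at termination

print(discounted_return(default_rewards, GAMMA))
print(discounted_return(sutton_barto_rewards, GAMMA))  # approximately -GAMMA ** (T - 1)
```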
Vectorized Environment
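```python
import gymnasium as gym

# Uses the native CartPoleVectorEnv registered as CartPole-v1's vector entry point
envs = gym.make_vec("CartPole-v1", num_envs=3, vectorization_mode="vector_entry_point")
observations, infos = envs.reset(seed=42)  # observations has shape (3, 4)
```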