Implementation:Farama Foundation Gymnasium PendulumEnv

Knowledge Sources	Farama_Foundation_Gymnasium Gymnasium Docs
Domains	Reinforcement_Learning, Classic_Control
Last Updated	2026-02-15 03:00 GMT

Overview

Concrete implementation reference for `PendulumEnv`, the Pendulum classic control environment provided by Gymnasium.

Description

The inverted pendulum swing-up problem is a classic problem in control theory. The system consists of a pendulum attached at one end to a fixed point, with the other end free. The pendulum starts at a random angle, and the goal is to apply torque on the free end to swing it into the upright position, with its center of gravity directly above the fixed point.

The dynamics are governed by the equation angular_acceleration = (3*g / (2*l)) * sin(theta) + (3 / (m*l^2)) * torque, where theta is the angle measured from the upright position (theta = 0 when upright), g is gravity (default 10.0 m/s^2), m is mass (1.0 kg), and l is length (1.0 m). The state is integrated with Euler's method using a timestep of 0.05 seconds. The torque action is clipped to [-2.0, 2.0] N*m, and the angular velocity is clipped to [-8.0, 8.0] rad/s.
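The update described above can be sketched in plain Python. This is a minimal illustration rather than the library's code: `pendulum_step` is a hypothetical helper, and the ordering used here (velocity updated and clipped first, then the angle advanced with the new velocity) is an assumption of the sketch.

```python
import math

# Constants taken from the description above.
G = 10.0          # gravity, m/s^2 (default)
M = 1.0           # mass, kg
L = 1.0           # length, m
DT = 0.05         # integration timestep, s
MAX_TORQUE = 2.0  # N*m
MAX_SPEED = 8.0   # rad/s


def pendulum_step(theta, theta_dot, torque, g=G):
    """Advance (theta, theta_dot) by one timestep (hypothetical helper)."""
    # Clip the requested torque to the allowed range.
    u = max(-MAX_TORQUE, min(MAX_TORQUE, torque))
    # Angular acceleration from the governing equation.
    theta_ddot = 3 * g / (2 * L) * math.sin(theta) + 3.0 / (M * L**2) * u
    # Assumed ordering: update and clip the velocity, then advance the angle.
    new_theta_dot = theta_dot + theta_ddot * DT
    new_theta_dot = max(-MAX_SPEED, min(MAX_SPEED, new_theta_dot))
    new_theta = theta + new_theta_dot * DT
    return new_theta, new_theta_dot
```

For example, starting upright and at rest with a requested torque of 5.0, the torque is clipped to 2.0, giving an angular acceleration of 6.0 rad/s^2 and a new angular velocity of 0.3 rad/s after one step.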

Unlike many other classic control environments, the Pendulum environment never terminates on its own; it relies entirely on the TimeLimit wrapper for episode truncation (default 200 steps). The reward is a continuous function of the angle, angular velocity, and applied torque: r = -(theta^2 + 0.1 * theta_dot^2 + 0.001 * torque^2), where theta is normalized to [-pi, pi]. The minimum possible reward per step is -(pi^2 + 0.1 * 8^2 + 0.001 * 2^2), approximately -16.27 (pendulum hanging straight down at maximum speed and torque), and the maximum is 0 (pendulum upright, no velocity, no torque).
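The reward formula and its bounds can be checked with a short sketch. The helper names `angle_normalize` and `pendulum_reward` are illustrative choices for this example, and the wrap-around formula is one common way to normalize an angle, assumed here rather than quoted from the source file.

```python
import math


def angle_normalize(theta):
    """Wrap an angle in radians into the interval [-pi, pi)."""
    return ((theta + math.pi) % (2 * math.pi)) - math.pi


def pendulum_reward(theta, theta_dot, torque):
    """Per-step reward from the formula above (hypothetical helper)."""
    th = angle_normalize(theta)
    return -(th**2 + 0.1 * theta_dot**2 + 0.001 * torque**2)


# Worst case: hanging straight down at maximum speed and maximum torque.
worst = pendulum_reward(math.pi, 8.0, 2.0)
# Best case: upright, at rest, no torque applied.
best = pendulum_reward(0.0, 0.0, 0.0)
```

Because the angle term dominates (up to pi^2, about 9.87) and the torque term is tiny (at most 0.004), the reward mostly penalizes being far from upright and moving fast, with only a mild penalty on control effort.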

Usage

This environment is commonly used for benchmarking continuous-action reinforcement learning algorithms. It is well-suited for testing policy gradient methods (REINFORCE, PPO, TRPO), actor-critic algorithms (A2C, SAC, TD3, DDPG), and model-based methods. The continuous reward signal makes it more amenable to gradient-based optimization than sparse-reward environments. It is also a standard testbed for demonstrating torque-limited swing-up control and for educational purposes in both RL and classical control theory courses.

Code Reference

Source Location

Repository: Farama_Foundation_Gymnasium
File: gymnasium/envs/classic_control/pendulum.py

Signature

class PendulumEnv(gym.Env):
    def __init__(self, render_mode: str | None = None, g=10.0):

Import

import gymnasium as gym
env = gym.make("Pendulum-v1")

I/O Contract

Inputs

Name	Type	Required	Description
action	np.ndarray (shape (1,), float32)	Yes	Torque applied to the free end of the pendulum, clipped to [-2.0, 2.0] N*m

Outputs

Name	Type	Description
observation	np.ndarray (shape (3,), float32)	[cos(theta), sin(theta), angular_velocity]
reward	float	-(theta^2 + 0.1 * theta_dot^2 + 0.001 * torque^2); ranges from approximately -16.27 to 0.0
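terminated	bool	Always False (the environment has no terminal state; episodes end only through the TimeLimit wrapper)
truncated	bool	False when returned by the environment itself; the TimeLimit wrapper sets it to True once the step limit is reached (default 200 steps)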
info	dict	Empty dictionary

Observation Space Details

Index	Observation	Min	Max
0	x = cos(theta)	-1.0	1.0
1	y = sin(theta)	-1.0	1.0
2	Angular Velocity	-8.0	8.0
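Since the observation encodes the angle as a (cos, sin) pair, code that needs theta itself has to reconstruct it. A minimal sketch, where `angle_from_observation` is a hypothetical helper and not part of Gymnasium:

```python
import math


def angle_from_observation(obs):
    """Recover theta in (-pi, pi] from [cos(theta), sin(theta), angular_velocity]."""
    cos_theta, sin_theta, _angular_velocity = obs
    # atan2 takes the sine first and resolves the correct quadrant.
    return math.atan2(sin_theta, cos_theta)
```

The (cos, sin) encoding avoids the discontinuity an angle has at +/-pi, which is why the raw angle is not exposed directly in the observation.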

Action Space Details

Dimension	Min	Max	Description
0	-2.0	2.0	Torque applied to the free end of the pendulum (N*m)
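Policies often produce an unbounded scalar, which then has to be mapped into the torque range. One common option, shown here as a sketch and not something the environment requires, is tanh squashing; `scale_action` is a hypothetical helper.

```python
import math

MAX_TORQUE = 2.0  # N*m, from the action space bounds above


def scale_action(raw):
    """Map an unbounded scalar to a one-element action in [-MAX_TORQUE, MAX_TORQUE]."""
    # tanh squashes any real number into [-1, 1]; scaling gives the torque range.
    return [MAX_TORQUE * math.tanh(raw)]
```

The result is a plain one-element list; before passing it to `env.step` it would typically be converted to a float32 array of shape (1,), for example with `np.asarray(..., dtype=np.float32)`. The environment also clips out-of-range torques itself, so squashing is a design choice rather than a necessity.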

Key Methods

Method	Description
`__init__(render_mode=None, g=10.0)`	Initializes the environment with observation space Box(3,), continuous action space Box(1,) bounded to [-2, 2], and configurable gravity
`reset(seed=None, options=None)`	Resets theta to a random value in [-pi, pi] and angular velocity to a random value in [-1, 1]; the bounds can be changed through the options keys "x_init" (angle) and "y_init" (angular velocity); returns (observation, info)
`step(u)`	Clips the torque action, computes reward, integrates dynamics via Euler method, and returns (observation, reward, terminated, truncated, info)
`render()`	Renders the environment using pygame in "human" or "rgb_array" mode, showing the pendulum rod, pivot, and torque direction indicator
`close()`	Closes the pygame display and cleans up resources

Physics Parameters

Parameter	Value	Description
g	10.0 m/s^2 (default)	Gravitational acceleration (configurable)
m	1.0 kg	Mass of the pendulum
l	1.0 m	Length of the pendulum
dt	0.05 s	Integration timestep
max_speed	8.0 rad/s	Maximum angular velocity
max_torque	2.0 N*m	Maximum applicable torque
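Plugging these parameters into the dynamics equation shows why a swing-up is needed: the torque limit is too small to lift the pendulum directly. The following sketch works through the arithmetic; the variable names are illustrative.

```python
import math

G, M, L, MAX_TORQUE = 10.0, 1.0, 1.0, 2.0

# Gravity's contribution to angular acceleration when the pendulum is horizontal.
gravity_accel_horizontal = 3 * G / (2 * L) * math.sin(math.pi / 2)  # 15 rad/s^2
# Largest angular acceleration the motor can contribute.
max_torque_accel = 3.0 / (M * L**2) * MAX_TORQUE  # 6 rad/s^2

# Full torque cannot overcome gravity at the horizontal position.
can_lift_directly = max_torque_accel > gravity_accel_horizontal

# Largest angle from upright at which full torque still balances gravity.
hold_angle = math.asin(max_torque_accel / gravity_accel_horizontal)
```

With the default values, full torque balances gravity only within about 0.41 rad (roughly 24 degrees) of upright, so a controller has to build up momentum by swinging back and forth before it can reach the top.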

Usage Examples

import gymnasium as gym

env = gym.make("Pendulum-v1")
observation, info = env.reset(seed=42)

for _ in range(1000):
    # Sample a random torque from the action space
    action = env.action_space.sample()
    observation, reward, terminated, truncated, info = env.step(action)
    # terminated is always False; truncated becomes True at the 200-step limit
    if terminated or truncated:
        observation, info = env.reset()

env.close()

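Custom Gravity and Initial State Bounds

import gymnasium as gym

# Override the default gravity of 10.0 m/s^2
env = gym.make("Pendulum-v1", render_mode="rgb_array", g=9.81)
# Sample theta from [-1.5, 1.5] rad and angular velocity from [-0.5, 0.5] rad/s
observation, info = env.reset(seed=123, options={"x_init": 1.5, "y_init": 0.5})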

Related Pages

Environment:Farama_Foundation_Gymnasium_Python_3_10_Runtime
