Overview
Concrete tool for the Pendulum classic control environment provided by Gymnasium.
Description
The inverted pendulum swing-up problem is a classic problem in control theory. The system consists of a pendulum attached at one end to a fixed point, with the other end free. The pendulum starts in a random position, and the goal is to apply torque to the free end to swing it into an upright position, with its center of gravity directly above the fixed point.
The dynamics are governed by the equation angular_acceleration = (3*g / (2*l)) * sin(theta) + (3 / (m*l^2)) * torque, where theta is the angle measured from the upright position, g is the gravitational acceleration (default 10.0 m/s^2), m is the mass (1.0 kg), and l is the length (1.0 m). The state is integrated with Euler's method using a timestep of 0.05 seconds. The torque action is clipped to [-2.0, 2.0] N*m before it is applied, and the angular velocity is clipped to [-8.0, 8.0] rad/s.
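The update described above can be sketched in plain Python. This is an illustrative sketch, not the library's source: the function and constant names are hypothetical, and the order of operations (velocity updated and clipped first, angle then advanced with the new velocity) is an assumption about how the Euler step is arranged.

```python
import math

# Constants as listed in this document.
GRAVITY, MASS, LENGTH, DT = 10.0, 1.0, 1.0, 0.05
MAX_TORQUE, MAX_SPEED = 2.0, 8.0


def clip(value, low, high):
    """Clamp value to the closed interval [low, high]."""
    return max(low, min(high, value))


def pendulum_step(theta, theta_dot, torque):
    """Advance the state by one timestep; returns (new_theta, new_theta_dot)."""
    torque = clip(torque, -MAX_TORQUE, MAX_TORQUE)
    accel = (3 * GRAVITY / (2 * LENGTH)) * math.sin(theta) \
        + (3 / (MASS * LENGTH**2)) * torque
    new_theta_dot = clip(theta_dot + accel * DT, -MAX_SPEED, MAX_SPEED)
    # Assumption: the angle is advanced with the already-updated velocity.
    new_theta = theta + new_theta_dot * DT
    return new_theta, new_theta_dot
```

Starting upright and at rest with zero torque, the state does not change, which is the (unstable) equilibrium the agent is trying to reach.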
Unlike many other classic control environments, the Pendulum environment never terminates on its own; it relies entirely on the TimeLimit wrapper for episode truncation (default 200 steps). The reward is a continuous function of the angle, angular velocity, and applied torque: r = -(theta^2 + 0.1 * theta_dot^2 + 0.001 * torque^2), where theta is normalized to [-pi, pi]. The minimum possible reward per step is -(pi^2 + 0.1 * 8^2 + 0.001 * 2^2), approximately -16.27, reached when the pendulum hangs straight down at maximum speed under maximum torque. The maximum is 0 (pendulum upright, no velocity, no torque).
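The reward formula and its bounds can be checked with a few lines of Python. The sketch below is illustrative only: the helper names are hypothetical, and it assumes the torque term uses the clipped torque.

```python
import math

MAX_TORQUE = 2.0


def angle_normalize(theta):
    """Wrap an angle in radians into [-pi, pi)."""
    return ((theta + math.pi) % (2 * math.pi)) - math.pi


def pendulum_reward(theta, theta_dot, torque):
    """Per-step reward; 0 is the best value, reached upright and at rest."""
    # Assumption: the reward is computed from the clipped torque.
    torque = max(-MAX_TORQUE, min(MAX_TORQUE, torque))
    theta = angle_normalize(theta)
    return -(theta**2 + 0.1 * theta_dot**2 + 0.001 * torque**2)


best = pendulum_reward(0.0, 0.0, 0.0)       # upright, at rest, no torque
worst = pendulum_reward(math.pi, 8.0, 2.0)  # hanging down, max speed, max torque
```

With the default 200-step limit, the worst per-step value puts a floor of roughly 200 * -16.27 on the return of a single episode.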
Usage
This environment is commonly used for benchmarking continuous-action reinforcement learning algorithms. It is well-suited for testing policy gradient methods (REINFORCE, PPO, TRPO), actor-critic algorithms (A2C, SAC, TD3, DDPG), and model-based methods. The continuous reward signal makes it more amenable to gradient-based optimization than sparse-reward environments. It is also a standard testbed for demonstrating torque-limited swing-up control and for educational purposes in both RL and classical control theory courses.
Code Reference
Source Location
Signature
class PendulumEnv(gym.Env):
    def __init__(self, render_mode: str | None = None, g=10.0):
Import
import gymnasium as gym
env = gym.make("Pendulum-v1")
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| action | np.ndarray (shape (1,), float32) | Yes | Torque applied to the free end of the pendulum, clipped to [-2.0, 2.0] N*m |
Outputs
| Name | Type | Description |
|---|---|---|
| observation | np.ndarray (shape (3,), float32) | [cos(theta), sin(theta), angular_velocity] |
| reward | float | -(theta^2 + 0.1 * theta_dot^2 + 0.001 * torque^2); ranges from approximately -16.27 to 0.0 |
| terminated | bool | Always False (the episode never terminates; relies on the TimeLimit wrapper) |
| truncated | bool | False from the environment itself; the TimeLimit wrapper sets it to True at the step limit (default 200 steps) |
| info | dict | Empty dictionary |
Observation Space Details
| Index | Observation | Min | Max |
|---|---|---|---|
| 0 | x = cos(theta) | -1.0 | 1.0 |
| 1 | y = sin(theta) | -1.0 | 1.0 |
| 2 | Angular Velocity | -8.0 | 8.0 |
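Because the observation encodes the angle as a (cos, sin) pair rather than as theta itself, controllers that need the angle have to reconstruct it. A minimal sketch using only the standard library follows; both function names are hypothetical helpers, not part of Gymnasium.

```python
import math


def observation_from_state(theta, theta_dot):
    """Build [cos(theta), sin(theta), angular_velocity] from the raw state."""
    return [math.cos(theta), math.sin(theta), theta_dot]


def angle_from_observation(observation):
    """Recover theta in [-pi, pi] from an observation."""
    cos_theta, sin_theta, _ = observation
    return math.atan2(sin_theta, cos_theta)
```

The (cos, sin) encoding avoids the discontinuity at +/-pi that a raw angle would have, which is one plausible reason for this choice of observation.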
Action Space Details
| Dimension | Min | Max | Description |
|---|---|---|---|
| 0 | -2.0 | 2.0 | Torque applied to the free end of the pendulum (N*m) |
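Policies with unbounded outputs are often squashed into the action bounds before being passed to the environment. The sketch below shows one common approach (tanh squashing followed by scaling); it is not something the environment requires, and `to_torque` is a hypothetical helper name.

```python
import math

MAX_TORQUE = 2.0  # action bound from the table above


def to_torque(raw_output):
    """Map an unbounded scalar to a torque in [-MAX_TORQUE, MAX_TORQUE]."""
    return MAX_TORQUE * math.tanh(raw_output)
```

When used with Gymnasium, the resulting value would still need to be wrapped to match the action type, for example as `np.array([to_torque(x)], dtype=np.float32)`.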
Key Methods
| Method | Description |
|---|---|
| __init__(render_mode=None, g=10.0) | Initializes the environment with observation space Box(3,), continuous action space Box(1,) bounded to [-2, 2], and configurable gravity |
| reset(seed=None, options=None) | Samples theta uniformly from [-pi, pi] and angular velocity from [-1, 1]; the bounds can be changed via the options "x_init" (angle) and "y_init" (angular velocity). Returns (observation, info) |
| step(u) | Clips the torque action, computes the reward, integrates the dynamics via Euler's method, and returns (observation, reward, terminated, truncated, info) |
| render() | Renders the environment using pygame in "human" or "rgb_array" mode, showing the pendulum rod, pivot, and torque direction indicator |
| close() | Closes the pygame display and cleans up resources |
Physics Parameters
| Parameter | Value | Description |
|---|---|---|
| g | 10.0 m/s^2 (default) | Gravitational acceleration (configurable) |
| m | 1.0 kg | Mass of the pendulum |
| l | 1.0 m | Length of the pendulum |
| dt | 0.05 s | Integration timestep |
| max_speed | 8.0 rad/s | Maximum angular velocity |
| max_torque | 2.0 N*m | Maximum applicable torque |
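These values explain why the task requires a swing-up rather than a direct lift. Setting the angular acceleration in the dynamics equation to zero gives the torque needed to hold the pendulum still at a given angle, and at the horizontal this exceeds max_torque. The sketch below works through that arithmetic; the function name is hypothetical and theta = 0 is taken to be upright.

```python
import math

GRAVITY, MASS, LENGTH, MAX_TORQUE = 10.0, 1.0, 1.0, 2.0


def holding_torque(theta):
    """Torque that makes the angular acceleration zero at angle theta."""
    # Solve 0 = (3g / (2l)) * sin(theta) + (3 / (m * l^2)) * u for u.
    return -MASS * GRAVITY * LENGTH * math.sin(theta) / 2


# Holding the pendulum horizontal needs 5.0 N*m, above the 2.0 N*m limit.
horizontal = abs(holding_torque(math.pi / 2))

# Largest tilt from upright that max_torque can hold statically (about 0.41 rad).
max_static_angle = math.asin(2 * MAX_TORQUE / (MASS * GRAVITY * LENGTH))
```

Since the available torque cannot lift the pendulum directly from below, an agent has to build up momentum by swinging back and forth.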
Usage Examples
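import gymnasium as gym

# Create the environment; gym.make applies the 200-step TimeLimit wrapper.
env = gym.make("Pendulum-v1")
observation, info = env.reset(seed=42)

for _ in range(1000):
    # Sample a random torque from the action space.
    action = env.action_space.sample()
    observation, reward, terminated, truncated, info = env.step(action)
    # terminated is always False here; truncated becomes True at the step limit.
    if terminated or truncated:
        observation, info = env.reset()

env.close()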
Custom Gravity
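import gymnasium as gym

# Override the default gravity and render to an RGB array instead of a window.
env = gym.make("Pendulum-v1", render_mode="rgb_array", g=9.81)
# x_init and y_init bound the sampled initial angle and angular velocity.
observation, info = env.reset(seed=123, options={"x_init": 1.5, "y_init": 0.5})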
Related Pages