Implementation:Farama Foundation Gymnasium PendulumEnv

From Leeroopedia
Knowledge Sources
Domains Reinforcement_Learning, Classic_Control
Last Updated 2026-02-15 03:00 GMT

Overview

Concrete tool for the Pendulum classic control environment provided by Gymnasium.

Description

The inverted pendulum swing-up problem is a classic problem in control theory. The system consists of a pendulum attached at one end to a fixed point, with the other end free. The pendulum starts in a random position, and the goal is to apply torque on the free end to swing it into an upright position, with its center of gravity directly above the fixed point.

The dynamics are governed by the equation: angular_acceleration = (3*g / (2*l)) * sin(theta) + (3 / (m*l^2)) * torque, where theta is the pendulum angle measured from the upright position (theta = 0 is upright), g is gravity (default 10.0 m/s^2), m is mass (1.0 kg), and l is length (1.0 m). The integration uses Euler's method with a timestep of 0.05 seconds. The torque action is clipped to [-2.0, 2.0] N*m, and the angular velocity is clipped to [-8.0, 8.0] rad/s.
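The update described above can be sketched in plain Python. This is a minimal illustration, not the Gymnasium source: the constant and function names are made up for the example, and the order of operations (clip the velocity, then advance the angle using the updated velocity) is an assumption about how the Euler step is arranged.

```python
import math

# Values taken from the description above; the names are illustrative.
G = 10.0          # gravitational acceleration, m/s^2
M = 1.0           # mass, kg
L = 1.0           # length, m
DT = 0.05         # integration timestep, s
MAX_TORQUE = 2.0  # N*m
MAX_SPEED = 8.0   # rad/s


def clip(value, low, high):
    """Clamp value to the closed interval [low, high]."""
    return max(low, min(high, value))


def pendulum_step(theta, theta_dot, torque):
    """Advance (theta, theta_dot) by one timestep; theta = 0 is upright."""
    u = clip(torque, -MAX_TORQUE, MAX_TORQUE)
    accel = (3 * G / (2 * L)) * math.sin(theta) + (3 / (M * L**2)) * u
    new_theta_dot = clip(theta_dot + accel * DT, -MAX_SPEED, MAX_SPEED)
    # Assumption: the angle is advanced with the already-updated velocity.
    new_theta = theta + new_theta_dot * DT
    return new_theta, new_theta_dot


# A torque request of 5.0 N*m is clipped to 2.0 N*m before it is applied.
theta, theta_dot = pendulum_step(0.0, 0.0, 5.0)
```

With these defaults, the maximum torque contributes an acceleration of 6 rad/s^2, while gravity contributes up to 15 rad/s^2 when the pendulum is horizontal, so the torque alone cannot lift the pendulum directly from the hanging position; it has to be swung up.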

Unlike many other classic control environments, the Pendulum environment never terminates on its own; it relies entirely on the TimeLimit wrapper for episode truncation (default 200 steps). The reward is a continuous function of the angle, angular velocity, and applied torque: r = -(theta^2 + 0.1 * theta_dot^2 + 0.001 * torque^2), where theta is normalized to [-pi, pi]. The minimum possible reward per step is -(pi^2 + 0.1 * 8^2 + 0.001 * 2^2), approximately -16.27, reached when the pendulum hangs straight down at maximum speed under maximum torque. The maximum is 0 (pendulum upright, no velocity, no torque).
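The reward formula and its bounds can be checked with a short sketch. The function names below are illustrative rather than part of the Gymnasium API, and the wrapping formula is one common way to normalize an angle, assumed here for the example.

```python
import math

MAX_TORQUE = 2.0  # N*m
MAX_SPEED = 8.0   # rad/s


def angle_normalize(theta):
    """Wrap an angle into the interval [-pi, pi)."""
    return ((theta + math.pi) % (2 * math.pi)) - math.pi


def pendulum_reward(theta, theta_dot, torque):
    """Per-step reward: 0 at the upright rest position, negative elsewhere."""
    u = max(-MAX_TORQUE, min(MAX_TORQUE, torque))  # clip the torque first
    th = angle_normalize(theta)
    return -(th**2 + 0.1 * theta_dot**2 + 0.001 * u**2)


# Worst case: hanging down (theta = pi), maximum speed, maximum torque.
worst = pendulum_reward(math.pi, MAX_SPEED, MAX_TORQUE)  # about -16.27
# Best case: upright, at rest, no torque.
best = pendulum_reward(0.0, 0.0, 0.0)
```

Because the torque term is weighted by only 0.001, the penalty is dominated by the angle and velocity terms; the torque cost mainly discourages needless effort once the pendulum is balanced.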

Usage

This environment is commonly used for benchmarking continuous-action reinforcement learning algorithms. It is well-suited for testing policy gradient methods (REINFORCE, PPO, TRPO), actor-critic algorithms (A2C, SAC, TD3, DDPG), and model-based methods. The continuous reward signal makes it more amenable to gradient-based optimization than sparse-reward environments. It is also a standard testbed for demonstrating torque-limited swing-up control and for educational purposes in both RL and classical control theory courses.

Code Reference

Source Location

Signature

class PendulumEnv(gym.Env):
    def __init__(self, render_mode: str | None = None, g=10.0):

Import

import gymnasium as gym
env = gym.make("Pendulum-v1")

I/O Contract

Inputs

Name | Type | Required | Description
action | np.ndarray (shape (1,), float32) | Yes | Torque applied to the free end of the pendulum, clipped to [-2.0, 2.0] N*m

Outputs

Name | Type | Description
observation | np.ndarray (shape (3,), float32) | [cos(theta), sin(theta), angular_velocity]
reward | float | -(theta^2 + 0.1 * theta_dot^2 + 0.001 * torque^2); ranges from approximately -16.27 to 0.0
terminated | bool | Always False (the environment never terminates on its own)
truncated | bool | False from the environment itself; set to True by the TimeLimit wrapper once the step limit (default 200) is reached
info | dict | Empty dictionary

Observation Space Details

Index | Observation | Min | Max
0 | x = cos(theta) | -1.0 | 1.0
1 | y = sin(theta) | -1.0 | 1.0
2 | Angular Velocity (rad/s) | -8.0 | 8.0
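Since the observation encodes the angle as a (cos, sin) pair rather than as theta itself, code that needs the angle has to reconstruct it. A minimal sketch using the standard library follows; the helper names are hypothetical and not part of Gymnasium.

```python
import math


def observation_from_state(theta, theta_dot):
    """Build the 3-element observation described in the table above."""
    return (math.cos(theta), math.sin(theta), theta_dot)


def angle_from_observation(obs):
    """Recover theta in [-pi, pi] from the (cos, sin) pair."""
    x, y, _ = obs
    return math.atan2(y, x)


# An angle outside [-pi, pi] comes back wrapped into that range.
obs = observation_from_state(4.0, -1.2)
theta = angle_from_observation(obs)
```

The (cos, sin) encoding avoids the discontinuity at +/-pi that a raw angle would have, which is why the recovered value is always the wrapped angle.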

Action Space Details

Dimension | Min | Max | Description
0 | -2.0 | 2.0 | Torque applied to the free end of the pendulum (N*m)

Key Methods

Method | Description
__init__(render_mode=None, g=10.0) | Initializes the environment with observation space Box(3,), continuous action space Box(1,) bounded to [-2, 2], and configurable gravity
reset(seed=None, options=None) | Resets theta to a random value in [-pi, pi] and angular velocity to a random value in [-1, 1]; the bounds of these ranges can be changed via the options keys "x_init" (angle) and "y_init" (angular velocity); returns (observation, info)
step(u) | Clips the torque action, computes the reward, integrates the dynamics via the Euler method, and returns (observation, reward, terminated, truncated, info)
render() | Renders the environment using pygame in "human" or "rgb_array" mode, showing the pendulum rod, pivot, and torque direction indicator
close() | Closes the pygame display and cleans up resources
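The effect of the reset options can be illustrated with a small stand-alone sketch. It assumes that "x_init" and "y_init" act as symmetric bounds on the sampled angle and angular velocity; the function name and the use of the standard-library random module are illustrative only (Gymnasium uses its own seeded generator).

```python
import math
import random

# Default bounds from the reset description above; names are illustrative.
DEFAULT_X = math.pi  # angle bound, rad
DEFAULT_Y = 1.0      # angular velocity bound, rad/s


def sample_initial_state(rng, options=None):
    """Sample (theta, theta_dot) uniformly from [-x, x] and [-y, y]."""
    options = options or {}
    x = options.get("x_init", DEFAULT_X)
    y = options.get("y_init", DEFAULT_Y)
    theta = rng.uniform(-x, x)
    theta_dot = rng.uniform(-y, y)
    return theta, theta_dot


rng = random.Random(123)  # fixed seed so the draw is repeatable
theta, theta_dot = sample_initial_state(rng, {"x_init": 1.5, "y_init": 0.5})
```

Under this assumption, narrowing "x_init" starts episodes closer to upright, which can be useful for testing a balancing controller separately from the swing-up phase.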

Physics Parameters

Parameter | Value | Description
g | 10.0 m/s^2 (default) | Gravitational acceleration (configurable)
m | 1.0 kg | Mass of the pendulum
l | 1.0 m | Length of the pendulum
dt | 0.05 s | Integration timestep
max_speed | 8.0 rad/s | Maximum angular velocity
max_torque | 2.0 N*m | Maximum applicable torque

Usage Examples

import gymnasium as gym

env = gym.make("Pendulum-v1")
observation, info = env.reset(seed=42)  # seed once for reproducibility

for _ in range(1000):
    action = env.action_space.sample()  # random torque in [-2.0, 2.0]
    observation, reward, terminated, truncated, info = env.step(action)
    # terminated is always False here; truncated becomes True at the step limit
    if terminated or truncated:
        observation, info = env.reset()

env.close()

Custom Gravity

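import gymnasium as gym

# Override the default gravity (10.0) and render frames as RGB arrays
env = gym.make("Pendulum-v1", render_mode="rgb_array", g=9.81)
# x_init and y_init change the bounds of the initial angle and angular velocity
observation, info = env.reset(seed=123, options={"x_init": 1.5, "y_init": 0.5})
env.close()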
