Implementation:Farama Foundation Gymnasium InvertedPendulumEnv V5

From Leeroopedia
Knowledge Sources
Domains: Reinforcement_Learning, MuJoCo_Environments
Last Updated: 2026-02-15 03:00 GMT

Overview

Concrete implementation of the InvertedPendulum v5 MuJoCo environment provided by Gymnasium.

Description

The InvertedPendulum v5 environment is the cart-pole task simulated with MuJoCo, based on Barto, Sutton, and Anderson's "Neuronlike adaptive elements that can solve difficult learning control problems". It consists of a cart that can be moved linearly, with a pole attached to the cart at one end and free at the other. The goal is to keep the pole balanced upright on the cart by applying forces to the cart. The observation space is 4-dimensional: cart position, pole angle, cart velocity, and pole angular velocity. The reward is +1 for every timestep on which the episode has not terminated, that is, while all observations are finite and the absolute pole angle is at most 0.2 radians. The episode terminates if any observation is non-finite or the absolute pole angle exceeds 0.2 radians.
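
The termination and reward rule described above can be sketched in a few lines of NumPy. This is an illustrative sketch only: the helper names `is_terminated` and `survival_reward` and the constant `ANGLE_LIMIT` are hypothetical and not part of Gymnasium, and the observation layout is assumed to follow the order given in the description.

```python
import numpy as np

ANGLE_LIMIT = 0.2  # radians; threshold quoted in the description


def is_terminated(observation) -> bool:
    """Return True if any entry is non-finite or |pole angle| exceeds the limit."""
    obs = np.asarray(observation, dtype=np.float64)
    # Assumed layout: [cart position, pole angle, cart velocity, pole angular velocity]
    pole_angle = obs[1]
    return bool(not np.isfinite(obs).all() or abs(pole_angle) > ANGLE_LIMIT)


def survival_reward(observation) -> float:
    """+1 while the episode is still running, 0 on the terminating step."""
    return 0.0 if is_terminated(observation) else 1.0
```

Note that the check uses the absolute value of the angle, so the pole may lean up to 0.2 radians in either direction before the episode ends.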

Usage

Use this environment as a simple MuJoCo control benchmark. It is a good starting point for verifying RL algorithm implementations before moving on to more complex environments. Compared with earlier versions, v5 adds a configurable reset_noise_scale, support for a custom xml_file, and a reward_survive entry in the info dict.
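
To give a feel for what reset_noise_scale controls, here is a minimal sketch of a noisy reset. It assumes the nominal initial state is all zeros and that the noise is drawn uniformly from [-reset_noise_scale, reset_noise_scale] for every position and velocity entry; the function `sample_initial_state` is a hypothetical stand-in, not the environment's own reset code.

```python
import numpy as np


def sample_initial_state(reset_noise_scale=0.01, seed=None):
    """Hypothetical helper: perturb an all-zero initial state with uniform noise."""
    rng = np.random.default_rng(seed)
    init_qpos = np.zeros(2)  # assumed nominal cart position and pole angle
    init_qvel = np.zeros(2)  # assumed nominal cart velocity and pole angular velocity
    qpos = init_qpos + rng.uniform(-reset_noise_scale, reset_noise_scale, size=2)
    qvel = init_qvel + rng.uniform(-reset_noise_scale, reset_noise_scale, size=2)
    # Same ordering as the observation: positions first, then velocities
    return np.concatenate([qpos, qvel])
```

Under these assumptions, a larger reset_noise_scale starts episodes further from the upright equilibrium, which makes the task slightly harder and the initial states more varied.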

Code Reference

Source Location

Signature

class InvertedPendulumEnv(MujocoEnv, utils.EzPickle):
    def __init__(
        self,
        xml_file: str = "inverted_pendulum.xml",
        frame_skip: int = 2,
        default_camera_config: dict[str, float | int] = DEFAULT_CAMERA_CONFIG,
        reset_noise_scale: float = 0.01,
        **kwargs,
    )
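
The constructor arguments in the signature above can be supplied through gym.make, which forwards extra keyword arguments to the environment class. The sketch below is one way to do that; the helpers `pendulum_kwargs` and `make_pendulum_env` are hypothetical, and the range checks are an assumption made for illustration rather than validation that Gymnasium is known to perform.

```python
def pendulum_kwargs(reset_noise_scale=0.01, frame_skip=2):
    """Hypothetical helper: collect and sanity-check constructor overrides."""
    # These checks are illustrative assumptions, not Gymnasium behavior
    if reset_noise_scale < 0:
        raise ValueError("reset_noise_scale must be non-negative")
    if frame_skip < 1:
        raise ValueError("frame_skip must be at least 1")
    return {"reset_noise_scale": reset_noise_scale, "frame_skip": frame_skip}


def make_pendulum_env(**overrides):
    """Create the environment with the given overrides."""
    # Imported lazily so the helper above works without MuJoCo installed
    import gymnasium as gym

    return gym.make("InvertedPendulum-v5", **pendulum_kwargs(**overrides))
```

For example, `make_pendulum_env(reset_noise_scale=0.05)` would request a wider spread of initial states while leaving frame_skip at its default of 2.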

Import

# MuJoCo environments need the MuJoCo extra: pip install "gymnasium[mujoco]"
import gymnasium as gym
env = gym.make("InvertedPendulum-v5")

I/O Contract

Inputs

| Name | Type | Required | Description |
|---|---|---|---|
| action | np.ndarray (1,) | Yes | Force applied to the cart, range [-3, 3] |
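
A controller usually produces a scalar force, so it helps to wrap it into the expected shape and range before calling step. The following is a minimal sketch; `to_action` is a hypothetical helper, and the float32 dtype is an assumption about the action space that is not stated in the table above.

```python
import numpy as np

ACTION_LOW, ACTION_HIGH = -3.0, 3.0  # range from the Inputs table


def to_action(force: float) -> np.ndarray:
    """Clip a scalar force to the valid range and return it with shape (1,)."""
    clipped = np.clip(force, ACTION_LOW, ACTION_HIGH)
    # float32 is assumed here; adjust if env.action_space.dtype differs
    return np.array([clipped], dtype=np.float32)
```

With a live environment, `env.step(to_action(force))` would then receive an array of the documented shape.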

Outputs

| Name | Type | Description |
|---|---|---|
| observation | np.ndarray (4,) | State vector: cart position, pole angle, cart velocity, pole angular velocity |
| reward | float | 1.0 if not terminated, 0.0 otherwise |
| terminated | bool | True if any observation is non-finite or the absolute pole angle exceeds 0.2 radians |
| truncated | bool | True when the episode is cut off by the TimeLimit wrapper (default 1000 timesteps) |
| info | dict | Contains reward_survive, the survival reward for the step |
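
Indexing the observation by position is easy to get wrong, so a named view can make downstream code more readable. This is a small sketch that assumes the field order listed for the observation above; `PendulumState` and `unpack_observation` are hypothetical names introduced for illustration.

```python
from typing import NamedTuple

import numpy as np


class PendulumState(NamedTuple):
    cart_position: float
    pole_angle: float
    cart_velocity: float
    pole_angular_velocity: float


def unpack_observation(observation) -> PendulumState:
    """Convert a length-4 observation into named fields."""
    obs = np.asarray(observation, dtype=np.float64)
    if obs.shape != (4,):
        raise ValueError(f"expected shape (4,), got {obs.shape}")
    return PendulumState(*(float(x) for x in obs))
```

A policy can then read `state.pole_angle` instead of `observation[1]`.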

Usage Examples

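import gymnasium as gym

env = gym.make("InvertedPendulum-v5")
observation, info = env.reset(seed=42)  # seed the first reset for reproducibility

for _ in range(1000):
    action = env.action_space.sample()  # random policy: sample a force from the action space
    observation, reward, terminated, truncated, info = env.step(action)
    # Start a new episode once the pole falls (terminated) or the time limit is hit (truncated)
    if terminated or truncated:
        observation, info = env.reset()

env.close()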
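
When verifying an algorithm it is often more useful to track per-episode returns than to step through a fixed number of timesteps. The sketch below shows one way to do that for any environment following the reset/step interface used above; `run_episode` and its `policy` callable are hypothetical names, not part of Gymnasium.

```python
def run_episode(env, policy, seed=None, max_steps=1000):
    """Run one episode and return (total_reward, steps_taken)."""
    observation, info = env.reset(seed=seed)
    total_reward, steps = 0.0, 0
    for _ in range(max_steps):
        action = policy(observation)  # policy maps an observation to an action
        observation, reward, terminated, truncated, info = env.step(action)
        total_reward += float(reward)
        steps += 1
        if terminated or truncated:
            break
    return total_reward, steps
```

With the environment created above, `run_episode(env, lambda obs: env.action_space.sample(), seed=42)` would give the return of a random policy, which is a useful baseline to compare a trained agent against.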
