Implementation:Farama Foundation Gymnasium InvertedDoublePendulumEnv V5

From Leeroopedia
Revision as of 12:37, 16 February 2026 by Admin (auto-imported from implementations/Farama_Foundation_Gymnasium_InvertedDoublePendulumEnv_V5.md)
Knowledge Sources
Domains: Reinforcement_Learning, MuJoCo_Environments
Last Updated: 2026-02-15 03:00 GMT

Overview

Concrete implementation of the InvertedDoublePendulum v5 MuJoCo environment provided by Gymnasium.

Description

The InvertedDoublePendulum v5 environment originates from control theory and builds on the cartpole environment, which is based on the work of Barto, Sutton, and Anderson. A cart moves linearly, with one pole attached to it and a second pole attached to the free end of the first. The goal is to balance the second pole on top of the first by applying a continuous force to the cart.

The observation has 9 elements: the cart position, the sine and cosine of the two pole angles, three velocities, and the first constraint force. This is down from 11 elements in v4, because v5 excludes the constraint forces that are always zero.

The reward is alive_bonus - distance_penalty - velocity_penalty, where alive_bonus is granted only while the episode has not terminated (a fix relative to v4). The episode terminates when the y-coordinate of the second pole tip is less than or equal to 1.
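The reward and termination rule described above can be sketched in plain Python. This page gives only the three term names, the default healthy_reward of 10.0, and the termination threshold. The penalty coefficients and the target tip height of 2 used below are assumptions recalled from the Gymnasium reference implementation and should be checked against the installed version. The function name sketch_reward is hypothetical.

```python
def sketch_reward(x, y, v1, v2, healthy_reward=10.0):
    """Return (reward, terminated) for one step of a v5-style reward.

    x, y   : horizontal and vertical coordinates of the second pole's tip
    v1, v2 : angular velocities of the two hinge joints
    """
    # Termination rule from the description: tip height at or below 1.
    terminated = y <= 1.0

    # ASSUMPTION: coefficients and the target height of 2 are not stated on
    # this page; verify them against the installed Gymnasium source.
    distance_penalty = 0.01 * x**2 + (y - 2.0) ** 2
    velocity_penalty = 1e-3 * v1**2 + 5e-3 * v2**2

    # v5 behaviour: the alive bonus is paid only when the step did not terminate.
    alive_bonus = healthy_reward if not terminated else 0.0

    reward = alive_bonus - distance_penalty - velocity_penalty
    return reward, terminated
```

Under these assumptions, a motionless pendulum with its tip at x = 0, y = 2 earns the full alive bonus with no penalty, while a terminating step receives only the (negative) penalty terms.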

Usage

Use this environment to benchmark RL algorithms on a balance control task with continuous actions. Prefer v5 over v4 for new research: it fixes the healthy_reward bug, reduces the observation space from 11 to 9 elements, and reports a per-term reward breakdown in info.

Code Reference

Source Location

Signature

class InvertedDoublePendulumEnv(MujocoEnv, utils.EzPickle):
    def __init__(
        self,
        xml_file: str = "inverted_double_pendulum.xml",
        frame_skip: int = 5,
        default_camera_config: dict[str, float | int] = None,
        healthy_reward: float = 10.0,
        reset_noise_scale: float = 0.1,
        **kwargs,
    )

Import

import gymnasium as gym
env = gym.make("InvertedDoublePendulum-v5")

I/O Contract

Inputs

Name | Type | Required | Description
action | np.ndarray (1,) | Yes | Force applied on the cart, range [-1, 1]

Outputs

Name | Type | Description
observation | np.ndarray (9,) | State vector: cart x pos (1), sin of angles (2), cos of angles (2), velocities (3), constraint force x (1)
reward | float | alive_bonus - dist_penalty - vel_penalty (alive_bonus only when not terminated)
terminated | bool | True if y-coordinate of second pole tip is less than or equal to 1
truncated | bool | Episode truncation (handled by TimeLimit wrapper)
info | dict | Contains reward_survive, distance_penalty, velocity_penalty
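The observation layout in the Outputs table can be made easier to work with by unpacking the vector into named fields. The sketch below is a minimal illustration, not part of Gymnasium: the function name unpack_observation and the dictionary keys are hypothetical, and the index order is assumed to follow the order in which the table lists the components.

```python
import math


def unpack_observation(obs):
    """Split a 9-element observation into named fields.

    ASSUMPTION: indices follow the order given in the Outputs table:
    cart x (0), sines (1-2), cosines (3-4), velocities (5-7), constraint force (8).
    """
    if len(obs) != 9:
        raise ValueError(f"expected 9 elements, got {len(obs)}")

    sin_1, sin_2 = obs[1], obs[2]
    cos_1, cos_2 = obs[3], obs[4]

    return {
        "cart_x": obs[0],
        # Recover each angle (radians) from its sine/cosine pair.
        "angle_1": math.atan2(sin_1, cos_1),
        "angle_2": math.atan2(sin_2, cos_2),
        "velocities": tuple(obs[5:8]),
        "constraint_force_x": obs[8],
    }
```

Encoding angles as sine/cosine pairs avoids the discontinuity at ±π, and atan2 recovers the angle when a single scalar is more convenient, for example when logging or plotting.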

Usage Examples

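import gymnasium as gym

env = gym.make("InvertedDoublePendulum-v5")
# Seed the first reset so the initial state is reproducible.
observation, info = env.reset(seed=42)

for _ in range(1000):
    # Sample a random action; replace this with a policy's output.
    action = env.action_space.sample()
    observation, reward, terminated, truncated, info = env.step(action)
    # Start a new episode once the current one terminates or is truncated.
    if terminated or truncated:
        observation, info = env.reset()

env.close()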
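The loop above discards rewards and the info dictionary. As a hedged extension, the sketch below accumulates the return and the three info terms listed in the I/O Contract for each episode. The helper name summarize_episodes is hypothetical. It relies only on the reset/step/action_space interface, so it accepts any Gymnasium-style environment, and it assumes the environment eventually terminates or truncates (as it does when created through gym.make with the TimeLimit wrapper).

```python
def summarize_episodes(env, episodes=3, seed=0):
    """Run random-action episodes and return one summary dict per episode."""
    info_keys = ("reward_survive", "distance_penalty", "velocity_penalty")
    summaries = []
    for episode in range(episodes):
        # Seed only the first reset; later resets continue the same RNG stream.
        obs, info = env.reset(seed=seed if episode == 0 else None)
        total_reward, steps, done = 0.0, 0, False
        breakdown = {}
        while not done:
            action = env.action_space.sample()
            obs, reward, terminated, truncated, info = env.step(action)
            total_reward += float(reward)
            steps += 1
            # Accumulate whichever reward terms the environment reports.
            for key in info_keys:
                if key in info:
                    breakdown[key] = breakdown.get(key, 0.0) + float(info[key])
            done = terminated or truncated
        summaries.append(
            {"return": total_reward, "steps": steps, "breakdown": breakdown}
        )
    return summaries


# Usage (assumes gymnasium with MuJoCo support is installed):
#   import gymnasium as gym
#   env = gym.make("InvertedDoublePendulum-v5")
#   for summary in summarize_episodes(env, episodes=3):
#       print(summary)
#   env.close()
```

The helper reports the info terms as stored and makes no assumption about their sign, so compare the accumulated breakdown with the return before relying on a particular convention.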
