Implementation:Farama Foundation Gymnasium InvertedDoublePendulumEnv V5
| Knowledge Sources | |
|---|---|
| Domains | Reinforcement_Learning, MuJoCo_Environments |
| Last Updated | 2026-02-15 03:00 GMT |
Overview
Concrete implementation of the InvertedDoublePendulum v5 MuJoCo environment provided by Gymnasium.
Description
The InvertedDoublePendulum v5 environment originates from control theory and builds on the cartpole environment from Barto, Sutton, and Anderson's work. A cart moves along a linear track; one pole is attached to the cart and a second pole is attached to the free end of the first. The goal is to keep the second pole balanced on top of the first by applying a continuous force to the cart.
The observation has 9 elements: the cart position, the sine and cosine of the two pole angles, the three velocities, and the first constraint force. v4 had 11 elements; v5 drops the two constraint forces that are always zero.
The reward is alive_bonus - distance_penalty - velocity_penalty. In v5 the alive_bonus is granted only on steps where the episode has not terminated, which fixes a bug in v4. The episode terminates when the y-coordinate of the tip of the second pole is less than or equal to 1.
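The reward structure described above can be sketched in plain Python. This is a minimal illustration, not the environment's source: the function name `sketch_reward` is hypothetical, and the penalty coefficients are not stated on this page. They are assumed here from the Gymnasium documentation and should be checked against inverted_double_pendulum_v5.py before being relied on.

```python
def sketch_reward(x, y, v1, v2, healthy_reward=10.0):
    """Illustrative reward for one step (hypothetical helper).

    x, y   : coordinates of the tip of the second pole
    v1, v2 : angular velocities of the two poles
    """
    # Termination rule from this page: tip height at or below 1.
    terminated = y <= 1.0

    # v5 behaviour: the alive bonus is only paid while not terminated.
    alive_bonus = healthy_reward if not terminated else 0.0

    # ASSUMPTION: coefficients below are recalled from the Gymnasium docs,
    # they do not appear in this page.
    distance_penalty = 0.01 * x**2 + (y - 2.0) ** 2
    velocity_penalty = 1e-3 * v1**2 + 5e-3 * v2**2

    reward = alive_bonus - distance_penalty - velocity_penalty
    return reward, terminated
```

Under these assumptions, a perfectly upright, motionless pendulum (tip at height 2) would earn the full alive bonus, while a step that triggers termination earns only the negative penalties.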
Usage
Use this environment to benchmark RL algorithms on a balance control task with continuous actions. v5 is recommended over v4 for new research: it fixes the healthy_reward bug, reduces the observation space from 11 to 9 elements, and reports a per-term reward breakdown in info.
Code Reference
Source Location
- Repository: Farama_Foundation_Gymnasium
- File: gymnasium/envs/mujoco/inverted_double_pendulum_v5.py
Signature
class InvertedDoublePendulumEnv(MujocoEnv, utils.EzPickle):
    def __init__(
        self,
        xml_file: str = "inverted_double_pendulum.xml",
        frame_skip: int = 5,
        default_camera_config: dict[str, float | int] = None,
        healthy_reward: float = 10.0,
        reset_noise_scale: float = 0.1,
        **kwargs,
    )
Import
import gymnasium as gym
env = gym.make("InvertedDoublePendulum-v5")
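Constructor arguments from the signature can be overridden through `gym.make`, which forwards extra keyword arguments to the environment. The sketch below shows one way to do that; `env_kwargs` and `make_env` are hypothetical helper names introduced for illustration, and the import is deferred so the module loads even where gymnasium and MuJoCo are not installed.

```python
def env_kwargs(healthy_reward: float = 10.0, reset_noise_scale: float = 0.1) -> dict:
    """Collect constructor overrides (hypothetical helper).

    The parameter names and defaults mirror the signature shown above.
    """
    return {
        "healthy_reward": healthy_reward,
        "reset_noise_scale": reset_noise_scale,
    }


def make_env(**overrides):
    """Create the v5 environment with the given overrides (hypothetical helper)."""
    # Imported lazily: requires gymnasium with the MuJoCo extras installed.
    import gymnasium as gym

    return gym.make("InvertedDoublePendulum-v5", **env_kwargs(**overrides))
```

For example, `make_env(healthy_reward=5.0, reset_noise_scale=0.05)` would request a smaller alive bonus and less reset noise than the defaults.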
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| action | np.ndarray (1,) | Yes | Force applied on the cart, range [-1, 1] |
Outputs
| Name | Type | Description |
|---|---|---|
| observation | np.ndarray (9,) | State vector: cart x pos (1), sin of angles (2), cos of angles (2), velocities (3), constraint force x (1) |
| reward | float | alive_bonus - dist_penalty - vel_penalty (alive_bonus only when not terminated) |
| terminated | bool | True if y-coordinate of second pole tip is less than or equal to 1 |
| truncated | bool | Episode truncation (handled by TimeLimit wrapper) |
| info | dict | Contains reward_survive, distance_penalty, velocity_penalty |
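The observation layout in the table can be unpacked into named parts. The sketch below assumes the elements appear in the order the table lists them (cart position, two sines, two cosines, three velocities, one constraint force); `split_observation` is a hypothetical helper, and the index order should be verified against the environment before use.

```python
import math


def split_observation(obs):
    """Split a 9-element observation into named parts (hypothetical helper).

    ASSUMPTION: indices follow the order given in the Outputs table.
    """
    if len(obs) != 9:
        raise ValueError(f"expected 9 elements, got {len(obs)}")

    sines = tuple(obs[1:3])
    cosines = tuple(obs[3:5])
    return {
        "cart_x": obs[0],
        "sin_angles": sines,
        "cos_angles": cosines,
        # Recover each angle in radians from its (sin, cos) pair.
        "angles": tuple(math.atan2(s, c) for s, c in zip(sines, cosines)),
        "velocities": tuple(obs[5:8]),
        "constraint_force_x": obs[8],
    }
```

Because the function only uses `len` and slicing, it accepts a plain list as well as the `np.ndarray` returned by `env.step`.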
Usage Examples
import gymnasium as gym

env = gym.make("InvertedDoublePendulum-v5")
observation, info = env.reset(seed=42)

for _ in range(1000):
    action = env.action_space.sample()
    observation, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:
        observation, info = env.reset()

env.close()