Implementation:Farama Foundation Gymnasium InvertedDoublePendulumEnv V4
| Knowledge Sources | |
|---|---|
| Domains | Reinforcement_Learning, MuJoCo_Environments |
| Last Updated | 2026-02-15 03:00 GMT |
Overview
Concrete implementation of the InvertedDoublePendulum v4 MuJoCo environment provided by Gymnasium.
Description
The InvertedDoublePendulum v4 environment is a cartpole variant simulated with MuJoCo. A cart moves along a linear track; one pole is hinged to the cart and a second pole is hinged to the free end of the first. The agent applies a continuous force to the cart, pushing it left or right, with the goal of keeping the second pole balanced on top of the first. The observation is an 11-element vector containing the cart position, the sines and cosines of the two pole angles, three velocities, and three constraint forces. The reward at each step is alive_bonus (10) - dist_penalty - vel_penalty. The episode terminates when the y-coordinate of the tip of the second pole is less than or equal to 1. This is the legacy version built on the mujoco Python bindings (mujoco >= 2.1.3).
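The reward and termination rule described above can be sketched as a standalone function. This is a minimal illustration rather than the environment's own code: the name `reward_and_termination` is hypothetical, and the penalty coefficients (0.01, 1e-3, 5e-3) and the target height of 2 are assumptions that this page does not state, so they should be checked against inverted_double_pendulum_v4.py before being relied on.

```python
def reward_and_termination(x, y, v1, v2):
    """Sketch of the per-step reward and termination test.

    x, y   : coordinates of the tip of the second pole
    v1, v2 : angular velocities of the two pole hinges
    """
    alive_bonus = 10.0
    # Assumed coefficients and target height; not stated in this page.
    dist_penalty = 0.01 * x**2 + (y - 2.0) ** 2
    vel_penalty = 1e-3 * v1**2 + 5e-3 * v2**2
    reward = alive_bonus - dist_penalty - vel_penalty
    # The episode ends once the tip is at or below height 1.
    terminated = y <= 1.0
    return reward, terminated
```

In this sketch the alive bonus is paid on every step, including the terminating one, which appears to be the healthy_reward behaviour that v5 changes.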
Usage
Use this environment to benchmark RL algorithms on a balance control task. For new research, consider InvertedDoublePendulum-v5, which fixes the healthy_reward bug, removes the constant-zero constraint forces from the observation, and returns more detailed reward information in info.
Code Reference
Source Location
- Repository: Farama_Foundation_Gymnasium
- File: gymnasium/envs/mujoco/inverted_double_pendulum_v4.py
Signature
class InvertedDoublePendulumEnv(MujocoEnv, utils.EzPickle):
    def __init__(self, **kwargs)
Import
import gymnasium as gym
env = gym.make("InvertedDoublePendulum-v4")
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| action | np.ndarray (1,) | Yes | Force applied on the cart, range [-1, 1] |
Outputs
| Name | Type | Description |
|---|---|---|
| observation | np.ndarray (11,) | State vector: cart x pos (1), sin of angles (2), cos of angles (2), velocities (3), constraint forces (3) |
| reward | float | alive_bonus (10) - dist_penalty - vel_penalty |
| terminated | bool | True if y-coordinate of second pole tip is less than or equal to 1 |
| truncated | bool | True when the episode is cut off by the TimeLimit wrapper rather than ended by the termination condition |
| info | dict | Empty dictionary |
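To make the observation layout concrete, the following sketch splits an 11-element observation into named parts. It assumes the components appear in the vector in the same order as they are listed in the table above; `ObsParts` and `split_observation` are hypothetical helper names and are not part of Gymnasium.

```python
from typing import NamedTuple, Sequence, Tuple


class ObsParts(NamedTuple):
    cart_x: float
    sin_angles: Tuple[float, ...]
    cos_angles: Tuple[float, ...]
    velocities: Tuple[float, ...]
    constraint_forces: Tuple[float, ...]


def split_observation(obs: Sequence[float]) -> ObsParts:
    """Split an 11-element observation into its five groups.

    Assumes the ordering: position (1), sines (2), cosines (2),
    velocities (3), constraint forces (3).
    """
    if len(obs) != 11:
        raise ValueError(f"expected 11 elements, got {len(obs)}")
    values = [float(v) for v in obs]
    return ObsParts(
        cart_x=values[0],
        sin_angles=tuple(values[1:3]),
        cos_angles=tuple(values[3:5]),
        velocities=tuple(values[5:8]),
        constraint_forces=tuple(values[8:11]),
    )
```

The helper accepts any sequence of length 11, so it should work both with the np.ndarray returned by the environment and with a plain list.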
Usage Examples
import gymnasium as gym

env = gym.make("InvertedDoublePendulum-v4")
observation, info = env.reset(seed=42)
for _ in range(1000):
    # Sample a random force in [-1, 1] to apply to the cart.
    action = env.action_space.sample()
    observation, reward, terminated, truncated, info = env.step(action)
    # Start a new episode once the pole falls or the time limit is reached.
    if terminated or truncated:
        observation, info = env.reset()
env.close()
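For benchmarking, a common summary is the mean undiscounted return over several complete episodes. The sketch below is one way to compute it; `average_return` and its parameters are hypothetical names, and it assumes only that `env` follows the reset/step interface used in the example above.

```python
def average_return(env, policy, episodes=5, seed=0):
    """Run `episodes` full episodes and return the mean undiscounted return.

    env    : object with Gymnasium-style reset(seed=...) and step(action)
    policy : callable mapping an observation to an action
    """
    returns = []
    for episode in range(episodes):
        # Use a different seed per episode so the runs are reproducible
        # but not identical.
        observation, info = env.reset(seed=seed + episode)
        total = 0.0
        done = False
        while not done:
            action = policy(observation)
            observation, reward, terminated, truncated, info = env.step(action)
            total += float(reward)
            done = terminated or truncated
        returns.append(total)
    return sum(returns) / len(returns)
```

With the real environment, `env` would be the object returned by gym.make("InvertedDoublePendulum-v4"), and a random baseline policy could be `lambda obs: env.action_space.sample()`.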