Overview
Concrete tool for the Acrobot classic control environment provided by Gymnasium.
Description
The Acrobot environment is based on Sutton's work in "Generalization in Reinforcement Learning: Successful Examples Using Sparse Coarse Coding" and Sutton and Barto's book "Reinforcement Learning: An Introduction". The system consists of two links connected linearly to form a chain, with one end of the chain fixed. The joint between the two links is actuated. The goal is to apply torques on the actuated joint to swing the free end of the linear chain above a given height while starting from the initial state of hanging downwards.
The dynamics are governed by rigid-body equations with two rotational degrees of freedom (theta1 and theta2), integrated using a 4th-order Runge-Kutta method with a timestep of 0.2 seconds. The system models two links of equal length (1.0 m) and equal mass (1.0 kg), with gravity set to 9.8 m/s^2. The agent can apply one of three discrete torques (-1, 0, or +1 N*m) to the actuated joint at each step.
By default, the dynamics follow those described in Sutton and Barto's book. An alternative dynamics formulation from the original NeurIPS paper can be activated by setting the book_or_nips attribute to "nips" on the unwrapped environment.
Usage
This environment is commonly used for benchmarking reinforcement learning algorithms on sparse reward, continuous state-space control problems. It serves as a standard testbed for evaluating discrete-action RL agents, particularly for algorithms that must learn to chain together sequences of actions to achieve a goal. The Acrobot is well-suited for educational purposes, demonstrating concepts such as underactuated control, swing-up tasks, and energy-based control strategies.
Code Reference
Source Location
Signature
class AcrobotEnv(Env):
def __init__(self, render_mode: str | None = None):
Import
import gymnasium as gym
env = gym.make("Acrobot-v1")
I/O Contract
Inputs
| Name |
Type |
Required |
Description
|
| action |
int |
Yes |
Discrete action in {0, 1, 2}: apply -1, 0, or +1 torque (N*m) to the actuated joint
|
Outputs
| Name |
Type |
Description
|
| observation |
np.ndarray (shape (6,), float32) |
[cos(theta1), sin(theta1), cos(theta2), sin(theta2), angular_velocity1, angular_velocity2]
|
| reward |
float |
-1.0 for each step that does not reach the goal; 0.0 upon termination at the goal
|
| terminated |
bool |
True when the free end reaches the target height: -cos(theta1) - cos(theta2 + theta1) > 1.0
|
| truncated |
bool |
False (truncation handled by TimeLimit wrapper; default 500 steps for v1)
|
| info |
dict |
Empty dictionary
|
Observation Space Details
| Index |
Observation |
Min |
Max
|
| 0 |
Cosine of theta1 |
-1.0 |
1.0
|
| 1 |
Sine of theta1 |
-1.0 |
1.0
|
| 2 |
Cosine of theta2 |
-1.0 |
1.0
|
| 3 |
Sine of theta2 |
-1.0 |
1.0
|
| 4 |
Angular velocity of theta1 |
-12.567 (-4*pi) |
12.567 (4*pi)
|
| 5 |
Angular velocity of theta2 |
-28.274 (-9*pi) |
28.274 (9*pi)
|
Action Space Details
| Value |
Action |
Unit
|
| 0 |
Apply -1 torque to the actuated joint |
N*m
|
| 1 |
Apply 0 torque to the actuated joint |
N*m
|
| 2 |
Apply +1 torque to the actuated joint |
N*m
|
Key Methods
| Method |
Description
|
__init__(render_mode=None) |
Initializes the environment with observation space Box(6,), action space Discrete(3), and optional rendering
|
reset(seed=None, options=None) |
Resets the state to random values in [-0.1, 0.1] (customizable via options "low"/"high"); returns (observation, info)
|
step(a) |
Applies the selected torque, integrates dynamics via RK4, and returns (observation, reward, terminated, truncated, info)
|
render() |
Renders the environment using pygame in "human" or "rgb_array" mode
|
close() |
Closes the pygame display and cleans up resources
|
Physics Parameters
| Parameter |
Value |
Description
|
| LINK_LENGTH_1 |
1.0 m |
Length of link 1
|
| LINK_LENGTH_2 |
1.0 m |
Length of link 2
|
| LINK_MASS_1 |
1.0 kg |
Mass of link 1
|
| LINK_MASS_2 |
1.0 kg |
Mass of link 2
|
| LINK_COM_POS_1 |
0.5 m |
Center of mass position of link 1
|
| LINK_COM_POS_2 |
0.5 m |
Center of mass position of link 2
|
| LINK_MOI |
1.0 |
Moments of inertia for both links
|
| dt |
0.2 s |
Integration timestep
|
| MAX_VEL_1 |
4*pi rad/s |
Maximum angular velocity for joint 1
|
| MAX_VEL_2 |
9*pi rad/s |
Maximum angular velocity for joint 2
|
Usage Examples
import gymnasium as gym
env = gym.make("Acrobot-v1")
observation, info = env.reset(seed=42)
for _ in range(1000):
action = env.action_space.sample()
observation, reward, terminated, truncated, info = env.step(action)
if terminated or truncated:
observation, info = env.reset()
env.close()
Custom Reset Bounds
import gymnasium as gym
env = gym.make("Acrobot-v1", render_mode="rgb_array")
observation, info = env.reset(seed=123, options={"low": -0.2, "high": 0.2})
Switching Dynamics Model
import gymnasium as gym
env = gym.make("Acrobot-v1")
env.unwrapped.book_or_nips = "nips" # Use NeurIPS paper dynamics instead of book dynamics
observation, info = env.reset()
Related Pages