Implementation:Google deepmind Dm control Suite Acrobot

Metadata	Value
Implementation	Suite Acrobot
Domain	Reinforcement_Learning, Control
Source	Google_deepmind_Dm_control
Last Updated	2026-02-15 04:00 GMT

Overview

Concrete tool for swinging up and balancing a two-link planar acrobot provided by the dm_control Control Suite.

Description

The Acrobot domain models a two-link pendulum (an underactuated system) where only the elbow joint is actuated and the goal is to swing the tip of the lower arm up to a target location. The domain defines a custom Physics subclass that exposes helper methods for reading the horizontal and vertical components of the upper and lower arm body frames, the distance from the tip to the target, and the full pole orientations (sines and cosines of joint angles).

Two benchmark tasks are registered in the domain's SUITE: swingup and swingup_sparse. Both use the Balance task class, which initializes the shoulder and elbow joints to random angles in [-pi, pi) and returns observations of orientations and velocities. The difference between the two tasks lies in the reward signal: swingup uses a smooth tolerance-based reward that falls off with a margin of 1 as the tip moves away from the target, while swingup_sparse uses a sparse (indicator) reward that is 1 only when the tip is within the target radius.

Both tasks share a default time limit of 10 seconds. The reward is computed via the rewards.tolerance utility function, which measures how close the tip-to-target distance is to zero.

Usage

Use this implementation when you need a classic underactuated swing-up control benchmark. Load it via suite.load(domain_name='acrobot', task_name='swingup') or suite.load(domain_name='acrobot', task_name='swingup_sparse').

Code Reference

Source Location

Repository: Google_deepmind_Dm_control
File: dm_control/suite/acrobot.py
Lines: 1-123

Signature

# Task factory functions
def swingup(time_limit=10, random=None, environment_kwargs=None)
def swingup_sparse(time_limit=10, random=None, environment_kwargs=None)

# Physics subclass
class Physics(mujoco.Physics):
    def horizontal(self)        # x component of body frame z-axes
    def vertical(self)          # z component of body frame z-axes
    def to_target(self)         # distance from tip to target
    def orientations(self)      # sines and cosines of pole angles

# Task class
class Balance(base.Task):
    def __init__(self, sparse, random=None)
    def initialize_episode(self, physics)
    def get_observation(self, physics)
    def get_reward(self, physics)

Import

from dm_control import suite

env = suite.load(domain_name='acrobot', task_name='swingup')

I/O Contract

Inputs

Name	Type	Required	Description
`time_limit`	float	No	Maximum episode duration in seconds (default 10).
`random`	int, numpy.random.RandomState, or None	No	Random seed or RNG instance for reproducibility.
`environment_kwargs`	dict or None	No	Additional keyword arguments forwarded to the `Environment` constructor.

Outputs

Name	Type	Description
environment	`dm_control.rl.control.Environment`	A fully initialised environment conforming to the `dm_env.Environment` interface.

Observations

Key	Type	Description
`orientations`	numpy array (4,)	Sines and cosines of the upper and lower arm angles.
`velocity`	numpy array	Angular velocities of the joints.

Usage Examples

from dm_control import suite

# Load the smooth-reward swingup task
env = suite.load(domain_name='acrobot', task_name='swingup')

# Run an episode
time_step = env.reset()
while not time_step.last():
    action = env.action_spec().generate_value()
    time_step = env.step(action)

# Load the sparse-reward variant
env_sparse = suite.load(domain_name='acrobot', task_name='swingup_sparse')

Related Pages

Principle:Google_deepmind_Dm_control_Control_Suite_Environment_Loading

Page Connections

Double-click a node to navigate. Hold to expand connections.

Principle

Implementation

Heuristic

Environment