Implementation:Google deepmind Dm control Suite Pendulum

From Leeroopedia
| Metadata | Value |
| --- | --- |
| Implementation | Suite Pendulum |
| Domain | Reinforcement_Learning, Control |
| Source | Google_deepmind_Dm_control |
| Last Updated | 2026-02-15 04:00 GMT |

Overview

Concrete implementation of the single inverted pendulum swing-up and balance task, provided by the dm_control Control Suite.

Description

The Pendulum domain implements the classic single-link pendulum swing-up problem. A pole is attached to a fixed pivot via a hinge joint and is actuated by a torque applied at the hinge. The Physics subclass provides three accessors: pole_vertical returns the vertical component of the pole frame, angular_velocity returns the angular velocity of the pole, and pole_orientation returns the pole orientation as a pair of vertical and horizontal components.

A single benchmarking task, swingup, is registered. The SwingUp task class initializes each episode by setting the hinge angle to a value drawn uniformly from [-pi, pi). Observations consist of the pole orientation (vertical and horizontal components) and the angular velocity. The reward is computed with a tolerance function applied to the pole's vertical component, which is the cosine of the pole's angle from vertical. It returns 1 when that component lies between cos(8 degrees) and 1, that is, when the pole is within 8 degrees of upright.
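The reward rule described above can be illustrated with a small standalone sketch. It does not call dm_control. The names `pole_orientation_from_angle` and `sketch_reward` are hypothetical, the angle is assumed to be measured from the upright position, the sign of the horizontal component is an assumed convention, and returning 0 outside the bound assumes the tolerance function is used without a smoothing margin.

```python
import math

# 8-degree threshold stated in the description, expressed as a cosine bound.
ANGLE_BOUND_DEG = 8.0
COSINE_BOUND = math.cos(math.radians(ANGLE_BOUND_DEG))


def pole_orientation_from_angle(angle):
    """Return (vertical, horizontal) components for a pole `angle` radians
    away from upright. The sign convention here is an assumption."""
    return math.cos(angle), math.sin(angle)


def sketch_reward(angle):
    """Hypothetical stand-in for the swingup reward: 1 inside the bound.

    Assumption: the reward drops straight to 0 outside the bound
    (no smooth margin)."""
    vertical, _ = pole_orientation_from_angle(angle)
    return 1.0 if COSINE_BOUND <= vertical <= 1.0 else 0.0


if __name__ == "__main__":
    for degrees in (0.0, 7.9, 8.1, 180.0):
        print(degrees, sketch_reward(math.radians(degrees)))
```

Because the bound is expressed on the cosine, the rule is symmetric: a pole tilted 5 degrees to either side of upright receives the same reward.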

The default time limit is 20 seconds.

Usage

Use this implementation for the fundamental single-pendulum swing-up benchmark. Load via suite.load(domain_name='pendulum', task_name='swingup').

Code Reference

Source Location

Signature

# Task factory function
def swingup(time_limit=20, random=None, environment_kwargs=None)

# Physics subclass
class Physics(mujoco.Physics):
    def pole_vertical(self)       # vertical (z) component of pole frame
    def angular_velocity(self)    # angular velocity of the hinge
    def pole_orientation(self)    # (zz, xz) components of pole frame

# Task class
class SwingUp(base.Task):
    def __init__(self, random=None)
    def initialize_episode(self, physics)
    def get_observation(self, physics)
    def get_reward(self, physics)

Import

from dm_control import suite

env = suite.load(domain_name='pendulum', task_name='swingup')

I/O Contract

Inputs

| Name | Type | Required | Description |
| --- | --- | --- | --- |
| time_limit | float | No | Maximum episode duration in seconds (default 20). |
| random | int, numpy.random.RandomState, or None | No | Random seed or RNG instance for reproducibility. |
| environment_kwargs | dict or None | No | Additional keyword arguments forwarded to the Environment constructor. |
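As a hedged sketch of how these inputs might be supplied, the snippet below builds the arguments for `suite.load`, passing `time_limit` and `random` through its `task_kwargs` parameter so they reach the `swingup` factory. The helper names `pendulum_load_kwargs` and `load_pendulum` are hypothetical and exist only for illustration; dm_control is imported lazily so the argument-building part runs without it installed.

```python
def pendulum_load_kwargs(seed=None, time_limit=20.0):
    """Build keyword arguments for suite.load (hypothetical helper).

    `seed` maps to the factory's `random` input and `time_limit` to its
    `time_limit` input; both travel inside `task_kwargs`.
    """
    return {
        "domain_name": "pendulum",
        "task_name": "swingup",
        "task_kwargs": {"random": seed, "time_limit": time_limit},
    }


def load_pendulum(seed=None, time_limit=20.0):
    """Load the swingup environment with the given seed and time limit."""
    # Imported here so the helper above stays usable without dm_control.
    from dm_control import suite

    return suite.load(**pendulum_load_kwargs(seed, time_limit))
```

Calling `load_pendulum(seed=0, time_limit=10)` would then request a seeded environment whose episodes end after 10 seconds instead of the default 20.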

Outputs

| Name | Type | Description |
| --- | --- | --- |
| environment | dm_control.rl.control.Environment | A fully initialised environment conforming to the dm_env.Environment interface. |

Observations

| Key | Type | Description |
| --- | --- | --- |
| orientation | numpy array (2,) | Vertical and horizontal components of the pole frame (zz, xz). |
| velocity | numpy array (1,) | Angular velocity of the hinge joint. |
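Many agents expect a single flat vector rather than a dictionary of arrays. The sketch below shows one way to concatenate the two observation entries listed above into a three-element list. The helper name `flatten_pendulum_observation` is hypothetical, and the example values are made up for illustration rather than taken from a real episode.

```python
from collections import OrderedDict
from itertools import chain


def flatten_pendulum_observation(observation):
    """Concatenate observation entries, in key order, into a flat list.

    Works with any mapping whose values are 1-D sequences, which includes
    the shape (2,) and shape (1,) arrays described in the table.
    """
    return [float(x) for x in chain.from_iterable(observation.values())]


# Illustrative values only: an upright pole at rest.
example = OrderedDict(orientation=[1.0, 0.0], velocity=[0.0])
flat = flatten_pendulum_observation(example)
```

The resulting order is vertical component, horizontal component, angular velocity, matching the order of the keys in the observation.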

Usage Examples

from dm_control import suite

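# Load the pendulum swingup task
env = suite.load(domain_name='pendulum', task_name='swingup')

# Query the action spec once instead of on every step
action_spec = env.action_spec()

# Run one episode with a fixed placeholder action (not a trained policy)
time_step = env.reset()
while not time_step.last():
    # generate_value() returns an action that conforms to the spec
    action = action_spec.generate_value()
    time_step = env.step(action)
    print(f"Reward: {time_step.reward:.3f}")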
