Implementation:Google deepmind Dm control Suite Pendulum

Metadata	Value
Implementation	Suite Pendulum
Domain	Reinforcement_Learning, Control
Source	Google_deepmind_Dm_control
Last Updated	2026-02-15 04:00 GMT

Overview

Concrete tool for swinging up and balancing a single inverted pendulum provided by the dm_control Control Suite.

Description

The Pendulum domain implements the classic single-link pendulum swing-up problem. A pole is attached to a fixed pivot via a hinge joint and is actuated by a torque applied at the hinge. The Physics subclass provides three accessors. pole_vertical returns the vertical component of the pole frame, which is the zz entry of the pole's rotation matrix and equals the cosine of the angle from upright. angular_velocity returns the angular velocity of the hinge joint. pole_orientation returns the full orientation as the pair (zz, xz): the vertical component first, then the horizontal one.
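The (zz, xz) pair is the world-frame vertical and horizontal components of the pole's local z-axis. It therefore encodes the pole angle as a (cosine, sine) pair, which avoids the wrap-around discontinuity of a raw angle at plus or minus pi. The standalone sketch below illustrates this in plain Python. It is not dm_control code. It assumes that the hinge rotates about the world y-axis and that an angle of 0 means upright, which is consistent with the reward treating zz close to 1 as upright. The helper names are hypothetical.

```python
import math


def rotation_about_y(theta):
    """Row-major 3x3 rotation matrix for a rotation of theta radians about y.

    Assumption: stands in for the pole body's frame matrix (MuJoCo's xmat).
    """
    c, s = math.cos(theta), math.sin(theta)
    return [[c, 0.0, s],
            [0.0, 1.0, 0.0],
            [-s, 0.0, c]]


def orientation_from_xmat(xmat):
    """Hypothetical helper mirroring Physics.pole_orientation: (zz, xz) entries."""
    return xmat[2][2], xmat[0][2]


def angle_from_orientation(zz, xz):
    """Signed angle from upright in [-pi, pi], recovered from (cos, sin)."""
    return math.atan2(xz, zz)


theta = 2.5  # radians away from upright
zz, xz = orientation_from_xmat(rotation_about_y(theta))
print(zz, xz)                                     # cos(2.5), sin(2.5)
print(round(angle_from_orientation(zz, xz), 4))   # 2.5
```

The agent sees only the observation, not the angle. The same atan2 recovery therefore works directly on the `orientation` observation when an explicit angle is needed, for example for plotting.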

A single benchmarking task, swingup, is registered. The SwingUp task class initializes each episode by setting the hinge angle to a value drawn uniformly from [-pi, pi). Observations consist of the pole orientation (vertical and horizontal components) and the angular velocity. The reward applies a tolerance function to the pole's vertical component (the cosine of the angle from upright). The reward is 1 when that value lies in [cos 8°, 1], that is, when the pole is within 8 degrees of upright, and 0 otherwise. Because the tolerance margin is zero, the reward is sparse and gives no partial credit near the boundary.
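The reward can be reproduced outside the simulator as a quick sanity check. With a zero margin, the tolerance function reduces to an indicator on the interval [cos 8°, 1]. The plain-Python sketch below is an illustrative re-implementation, not the library's rewards.tolerance call, and its function names are hypothetical. It also computes the fraction of uniformly sampled initial angles that already start inside the rewarded region.

```python
import math

ANGLE_BOUND_DEG = 8.0
COSINE_BOUND = math.cos(math.radians(ANGLE_BOUND_DEG))  # about 0.990


def swingup_reward(pole_vertical):
    """Sparse reward: 1.0 if cos(angle from upright) lies in [COSINE_BOUND, 1]."""
    return 1.0 if COSINE_BOUND <= pole_vertical <= 1.0 else 0.0


def reward_at_angle(theta):
    """Reward for a pole at theta radians from upright (pole_vertical = cos(theta))."""
    return swingup_reward(math.cos(theta))


# Fraction of initial angles drawn uniformly from [-pi, pi) that start rewarded:
# the rewarded arc spans 2 * 8 = 16 degrees out of 360.
start_fraction = 2 * ANGLE_BOUND_DEG / 360.0

print(reward_at_angle(math.radians(5)))    # 1.0
print(reward_at_angle(math.radians(10)))   # 0.0
print(round(start_fraction, 4))            # 0.0444
```

Only about 4.4% of episodes start inside the rewarded cone, so an agent must usually swing the pole up before it receives any reward at all.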

The default time limit is 20 seconds.
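The episode length in agent steps follows from dividing the time limit by the control timestep. The sketch below assumes a control timestep of 0.02 s. That value is an assumption here. On a loaded environment, the actual interval between agent actions can be read with env.control_timestep().

```python
def episode_length_steps(time_limit, control_timestep):
    """Approximate number of agent steps before the time limit ends an episode."""
    # Round to absorb floating-point error in the division (e.g. 20 / 0.02).
    return int(round(time_limit / control_timestep))


ASSUMED_CONTROL_TIMESTEP = 0.02  # seconds; assumption, verify with env.control_timestep()
print(episode_length_steps(20, ASSUMED_CONTROL_TIMESTEP))  # 1000
```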

Usage

Use this implementation for the fundamental single-pendulum swing-up benchmark. Load via suite.load(domain_name='pendulum', task_name='swingup').

Code Reference

Source Location

Repository: Google_deepmind_Dm_control
File: dm_control/suite/pendulum.py
Lines: 1-110

Signature

# Task factory function
def swingup(time_limit=20, random=None, environment_kwargs=None): ...

# Physics subclass
class Physics(mujoco.Physics):
    def pole_vertical(self): ...      # vertical (z) component of pole frame
    def angular_velocity(self): ...   # angular velocity of the hinge
    def pole_orientation(self): ...   # (zz, xz) components of pole frame

# Task class
class SwingUp(base.Task):
    def __init__(self, random=None): ...
    def initialize_episode(self, physics): ...   # hinge angle ~ U[-pi, pi)
    def get_observation(self, physics): ...      # 'orientation', 'velocity'
    def get_reward(self, physics): ...           # sparse, 8-degree bound

Import

from dm_control import suite

env = suite.load(domain_name='pendulum', task_name='swingup')

I/O Contract

Inputs

Name	Type	Required	Description
`time_limit`	float	No	Maximum episode duration in seconds (default 20).
`random`	int, numpy.random.RandomState, or None	No	Random seed or RNG instance for reproducibility.
`environment_kwargs`	dict or None	No	Additional keyword arguments forwarded to the `Environment` constructor.

Outputs

Name	Type	Description
environment	`dm_control.rl.control.Environment`	A fully initialised environment conforming to the `dm_env.Environment` interface.

Observations

Key	Type	Description
`orientation`	numpy array (2,)	Vertical and horizontal components of the pole frame (zz, xz).
`velocity`	numpy array (1,)	Angular velocity of the hinge joint.
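Many agents expect a single flat vector rather than an observation dictionary. The minimal NumPy sketch below flattens the observation in the key order shown in the table, orientation first and then velocity. The helper name and the example values are hypothetical. dm_control's Environment also accepts a flat_observation flag, which can be forwarded through environment_kwargs when the built-in flattening is preferred.

```python
import collections

import numpy as np


def flatten_observation(obs):
    """Concatenates observation arrays in the dict's key order into one 1-D vector."""
    return np.concatenate([np.asarray(v, dtype=np.float64).ravel() for v in obs.values()])


# Example values shaped like the table above: orientation (2,), velocity (1,).
obs = collections.OrderedDict(
    orientation=np.array([1.0, 0.0]),   # upright: zz = 1, xz = 0
    velocity=np.array([0.5]),
)
flat = flatten_observation(obs)
print(flat.shape)  # (3,)
```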

Usage Examples

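import numpy as np
from dm_control import suite

# Load the pendulum swingup task; task_kwargs seeds the initial-state RNG
env = suite.load(domain_name='pendulum', task_name='swingup',
                 task_kwargs={'random': 0})
action_spec = env.action_spec()
rng = np.random.RandomState(1)  # separate RNG for the random policy

# Run one episode with uniformly random torques within the action bounds
time_step = env.reset()
episode_return = 0.0
while not time_step.last():
    action = rng.uniform(action_spec.minimum, action_spec.maximum,
                         size=action_spec.shape)
    time_step = env.step(action)
    episode_return += time_step.reward
print(f"Episode return: {episode_return:.3f}")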

Related Pages

Principle:Google_deepmind_Dm_control_Control_Suite_Environment_Loading
