Implementation:Google deepmind Dm control Suite Pendulum
| Metadata | Value |
|---|---|
| Implementation | Suite Pendulum |
| Domain | Reinforcement_Learning, Control |
| Source | Google_deepmind_Dm_control |
| Last Updated | 2026-02-15 04:00 GMT |
Overview
Concrete tool for swinging up and balancing a single inverted pendulum provided by the dm_control Control Suite.
Description
The Pendulum domain implements the classic single-link pendulum swing-up problem. A pole is attached to a fixed pivot via a hinge joint and is actuated by a torque applied at the hinge. The Physics subclass provides methods for reading the vertical component of the pole frame (pole_vertical), the angular velocity of the pole (angular_velocity), and the full pole orientation as both horizontal and vertical components (pole_orientation).
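As a rough illustration of what these accessors report, the sketch below computes the same two components from a hinge angle using only the standard library. It assumes the angle is measured from the upright position and that the pole rotates about the y axis, so that zz is the cosine and xz the sine of the angle. The function name `pole_orientation_from_angle` is hypothetical and not part of dm_control.

```python
import math


def pole_orientation_from_angle(theta):
    """Return (zz, xz) for a pole rotated by theta radians about the y axis.

    Assumption: theta = 0 is upright, so zz = cos(theta) and xz = sin(theta).
    """
    return math.cos(theta), math.sin(theta)


# A pole hanging straight down has a vertical component close to -1.
vertical, horizontal = pole_orientation_from_angle(math.pi)
```

Under this convention the vertical component alone is enough to tell how close the pole is to upright, which is why the reward only needs pole_vertical.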
A single benchmarking task, swingup, is registered. The SwingUp task class initializes each episode by setting the hinge angle to a random value in [-pi, pi). Observations consist of the pole orientation (vertical and horizontal components) and the angular velocity. The reward is computed with a tolerance function over the pole's vertical component (the cosine of the angle from vertical): it returns 1 when that component lies between cos(8 degrees) and 1, and 0 otherwise. In other words, the pole must be within 8 degrees of the upright position to earn the reward.
The default time limit is 20 seconds.
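The reward rule described above can be sketched without dm_control. This is a minimal stand-in, not the library's code: the names `ANGLE_BOUND_DEG`, `COSINE_BOUND`, and `upright_reward` are chosen here for illustration, and the sketch assumes no margin is applied outside the bounds, so the result is binary.

```python
import math

ANGLE_BOUND_DEG = 8.0  # threshold stated in the description
COSINE_BOUND = math.cos(math.radians(ANGLE_BOUND_DEG))


def upright_reward(pole_vertical):
    """Return 1.0 if the vertical component lies in [COSINE_BOUND, 1], else 0.0.

    Assumption: no margin outside the bounds, so the reward is binary.
    """
    return 1.0 if COSINE_BOUND <= pole_vertical <= 1.0 else 0.0


print(upright_reward(math.cos(math.radians(5.0))))   # pole 5 degrees from upright
print(upright_reward(math.cos(math.radians(30.0))))  # pole 30 degrees from upright
```

Because the bound is expressed as a cosine, the check is symmetric: tilting 5 degrees to either side of upright gives the same vertical component and the same reward.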
Usage
Use this implementation for the fundamental single-pendulum swing-up benchmark. Load via suite.load(domain_name='pendulum', task_name='swingup').
Code Reference
Source Location
- Repository: Google_deepmind_Dm_control
- File: dm_control/suite/pendulum.py
- Lines: 1-110
Signature
# Task factory function
def swingup(time_limit=20, random=None, environment_kwargs=None)
# Physics subclass
class Physics(mujoco.Physics):
    def pole_vertical(self)      # vertical (z) component of pole frame
    def angular_velocity(self)   # angular velocity of the hinge
    def pole_orientation(self)   # (zz, xz) components of pole frame
# Task class
class SwingUp(base.Task):
    def __init__(self, random=None)
    def initialize_episode(self, physics)
    def get_observation(self, physics)
    def get_reward(self, physics)
Import
from dm_control import suite
env = suite.load(domain_name='pendulum', task_name='swingup')
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| time_limit | float | No | Maximum episode duration in seconds (default 20). |
| random | int, numpy.random.RandomState, or None | No | Random seed or RNG instance for reproducibility. |
| environment_kwargs | dict or None | No | Additional keyword arguments forwarded to the Environment constructor. |
Outputs
| Name | Type | Description |
|---|---|---|
| environment | dm_control.rl.control.Environment | A fully initialised environment conforming to the dm_env.Environment interface. |
Observations
| Key | Type | Description |
|---|---|---|
| orientation | numpy array (2,) | Vertical and horizontal components of the pole frame (zz, xz). |
| velocity | numpy array (1,) | Angular velocity of the hinge joint. |
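Many learning algorithms expect a single flat vector rather than a dictionary of arrays. The sketch below shows one way to flatten an observation with the keys and shapes listed above. The helper `flatten_observation` and the example values are hypothetical and use plain Python lists in place of NumPy arrays, so the snippet runs without dm_control installed.

```python
from collections import OrderedDict


def flatten_observation(observation):
    """Concatenate every entry of an observation mapping into one flat list of floats."""
    flat = []
    for value in observation.values():
        flat.extend(float(x) for x in value)
    return flat


# Hypothetical observation with the shapes listed in the table above.
example = OrderedDict(orientation=[1.0, 0.0], velocity=[0.0])
vector = flatten_observation(example)  # three numbers: zz, xz, angular velocity
```

The same loop also accepts one-dimensional NumPy arrays, since it only iterates over each entry.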
Usage Examples
from dm_control import suite
# Load the pendulum swingup task
env = suite.load(domain_name='pendulum', task_name='swingup')
# Run an episode
time_step = env.reset()
while not time_step.last():
    action = env.action_spec().generate_value()
    time_step = env.step(action)
    print(f"Reward: {time_step.reward:.3f}")