Implementation:Google deepmind Dm control Suite Cartpole

Metadata	Value
Implementation	Suite Cartpole
Domain	Reinforcement_Learning, Control
Source	Google_deepmind_Dm_control
Last Updated	2026-02-15 04:00 GMT

Overview

Concrete cart-pole environment from the dm_control Control Suite, with tasks for balancing and swinging up one or more poles on a sliding cart.

Description

The Cartpole domain implements the classic cart-pole control problem using MuJoCo physics. A cart slides along a rail and one or more poles are attached via hinge joints. The domain supports procedural model generation: while the base model defines a single-pole system, the _make_model helper can dynamically extend the XML to create chains of two, three, or more poles by inserting additional body elements.
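
The chaining idea described above can be sketched with the standard library alone. This is a minimal illustration, not the module's code: BASE_XML is a hypothetical, heavily simplified stand-in for the real single-pole model, make_pole_chain is an illustrative name, and the pos offset and attribute choices are assumptions.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified stand-in for the single-pole base model.
BASE_XML = """
<mujoco>
  <worldbody>
    <body name="cart">
      <joint name="slider" type="slide"/>
      <body name="pole_1">
        <joint name="hinge_1" type="hinge"/>
        <geom name="pole_1"/>
      </body>
    </body>
  </worldbody>
</mujoco>
"""


def make_pole_chain(n_poles, base_xml=BASE_XML):
    """Return an XML string in which `n_poles` poles are chained end to end."""
    if n_poles < 1:
        raise ValueError("n_poles must be at least 1")
    root = ET.fromstring(base_xml)
    parent = root.find("./worldbody/body/body")  # the first pole
    for index in range(2, n_poles + 1):
        # Each extra pole is a child body of the previous pole (assumed layout).
        child = ET.SubElement(parent, "body", name=f"pole_{index}", pos="0 0 1")
        ET.SubElement(child, "joint", name=f"hinge_{index}", type="hinge")
        ET.SubElement(child, "geom", name=f"pole_{index}")
        parent = child
    return ET.tostring(root, encoding="unicode")


xml_three = make_pole_chain(3)
hinges = [joint.get("name") for joint in ET.fromstring(xml_three).iter("joint")]
print(hinges)  # ['slider', 'hinge_1', 'hinge_2', 'hinge_3']
```

Nesting each new body inside the previous one is what makes the poles a chain rather than several poles attached to the cart.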

The Physics subclass provides methods for reading the cart position, the angular velocities of the poles, the cosine of pole angles (used for uprightness checks), and a bounded position representation that splits pole angles into sine/cosine components. Four benchmarking tasks are defined: balance, balance_sparse, swingup, and swingup_sparse. Two additional non-benchmarking tasks, two_poles and three_poles, test more challenging multi-pole configurations.
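
To illustrate why the bounded representation is useful, the sketch below builds it from raw angles. It assumes angles are measured from the upright position and that each pole contributes its cosine followed by its sine; the function names are illustrative, and the Physics class itself reads these quantities from the simulation state rather than from angles passed in by hand.

```python
import math


def bounded_position(cart_x, pole_angles):
    """Cart position followed by (cos, sin) of each pole angle.

    Assumes angle 0 means upright; the cos-then-sin ordering is an assumption.
    """
    features = [cart_x]
    for angle in pole_angles:
        features.extend((math.cos(angle), math.sin(angle)))
    return features


def pole_angle_cosines(pole_angles):
    """Cosine per pole: 1.0 when upright, -1.0 when hanging straight down."""
    return [math.cos(angle) for angle in pole_angles]


upright = bounded_position(0.0, [0.0])
wrapped = bounded_position(0.0, [0.1 + 2 * math.pi])
unwrapped = bounded_position(0.0, [0.1])
```

An angle and the same angle plus a full turn map to (almost exactly) the same features, so the observation stays bounded however many times a pole spins.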

The Balance task class accepts swing_up and sparse flags. When swing_up=True, the pole chain starts hanging downward; when swing_up=False, it starts near upright. The smooth reward is a product of four terms that reward pole uprightness, cart centering, small control, and small angular velocity. The sparse reward is nonzero only when the cart position and the cosine of every pole angle both fall inside strict tolerance bounds.
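
The two reward styles can be sketched in plain Python as follows. The bounds, margins, term weightings, and the Gaussian falloff helper are illustrative assumptions rather than values quoted from cartpole.py, which relies on dm_control's own reward utilities; all function names here are hypothetical.

```python
import math

# Assumed bounds for the sparse reward; consult cartpole.py for the real values.
CART_RANGE = (-0.25, 0.25)
ANGLE_COSINE_RANGE = (0.995, 1.0)


def in_bounds(value, bounds):
    """1.0 if `value` lies inside the closed interval `bounds`, else 0.0."""
    lower, upper = bounds
    return 1.0 if lower <= value <= upper else 0.0


def gaussian_falloff(value, margin, value_at_margin=0.1):
    """1.0 at value == 0, decaying to `value_at_margin` at |value| == margin."""
    scale = math.sqrt(-2.0 * math.log(value_at_margin))
    return math.exp(-0.5 * (value / margin * scale) ** 2)


def sparse_reward(cart_x, pole_cosines):
    """1.0 only if the cart and every pole are inside their bounds."""
    reward = in_bounds(cart_x, CART_RANGE)
    for cosine in pole_cosines:
        reward *= in_bounds(cosine, ANGLE_COSINE_RANGE)
    return reward


def smooth_reward(cart_x, pole_cosines, control, angular_vels):
    """Product of four shaped terms, each in [0, 1] (assumed weightings)."""
    upright = sum((c + 1) / 2 for c in pole_cosines) / len(pole_cosines)
    centered = (1 + gaussian_falloff(cart_x, margin=2.0)) / 2
    small_control = (4 + max(0.0, 1 - control ** 2)) / 5
    slowest_ok = min(gaussian_falloff(v, margin=5.0) for v in angular_vels)
    small_velocity = (1 + slowest_ok) / 2
    return upright * centered * small_control * small_velocity
```

Because the smooth terms are multiplied, a pole hanging straight down drives the whole reward to zero regardless of how well centered or still the cart is, while the sparse variant gives no gradient at all outside its bounds.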

Usage

Use this implementation for classic cart-pole balancing and swing-up benchmarks. Load via suite.load(domain_name='cartpole', task_name='balance') or any of the other registered task names.

Code Reference

Source Location

Repository: Google_deepmind_Dm_control
File: dm_control/suite/cartpole.py
Lines: 1-225

Signature

# Task factory functions
def balance(time_limit=10, random=None, environment_kwargs=None)
def balance_sparse(time_limit=10, random=None, environment_kwargs=None)
def swingup(time_limit=10, random=None, environment_kwargs=None)
def swingup_sparse(time_limit=10, random=None, environment_kwargs=None)
def two_poles(time_limit=10, random=None, environment_kwargs=None)
def three_poles(time_limit=10, random=None, num_poles=3, sparse=False,
                environment_kwargs=None)

# Model generation
def get_model_and_assets(num_poles=1)
def _make_model(n_poles)

# Physics subclass
class Physics(mujoco.Physics):
    def cart_position(self)       # position of the cart on the slider
    def angular_vel(self)         # angular velocity of all poles
    def pole_angle_cosine(self)   # cosine of each pole angle
    def bounded_position(self)    # cart pos + sin/cos of pole angles

# Task class
class Balance(base.Task):
    def __init__(self, swing_up, sparse, random=None)
    def initialize_episode(self, physics)
    def get_observation(self, physics)
    def get_reward(self, physics)

Import

from dm_control import suite

env = suite.load(domain_name='cartpole', task_name='balance')

I/O Contract

Inputs

Name	Type	Required	Description
`time_limit`	float	No	Maximum episode duration in seconds (default 10).
`random`	int, numpy.random.RandomState, or None	No	Random seed or RNG instance for reproducibility.
`environment_kwargs`	dict or None	No	Additional keyword arguments forwarded to the `Environment` constructor.
`num_poles`	int	No	Number of poles (only for `three_poles` task, default 3).
`sparse`	bool	No	Whether to use sparse reward (only for `three_poles` task, default False).
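
The parameters in the table belong to the task factory functions, so when loading through suite.load they are passed via its task_kwargs argument, while options for the Environment constructor go in environment_kwargs. The sketch below assembles those arguments; cartpole_load_kwargs and load_cartpole are hypothetical helpers, and flat_observation is used only as an example of an Environment option.

```python
def cartpole_load_kwargs(task_name, seed=None, time_limit=None,
                         num_poles=None, sparse=None, flat_observation=False):
    """Build keyword arguments for `suite.load` (hypothetical helper)."""
    task_kwargs = {}
    if seed is not None:
        task_kwargs["random"] = seed
    if time_limit is not None:
        task_kwargs["time_limit"] = time_limit
    if num_poles is not None or sparse is not None:
        # Per the table above, only three_poles accepts these two options.
        if task_name != "three_poles":
            raise ValueError("num_poles and sparse apply only to three_poles")
        if num_poles is not None:
            task_kwargs["num_poles"] = num_poles
        if sparse is not None:
            task_kwargs["sparse"] = sparse
    environment_kwargs = {"flat_observation": True} if flat_observation else None
    return {
        "domain_name": "cartpole",
        "task_name": task_name,
        "task_kwargs": task_kwargs,
        "environment_kwargs": environment_kwargs,
    }


def load_cartpole(**options):
    """Load a cartpole task; requires dm_control and MuJoCo to be installed."""
    # Imported lazily so the helper above works without dm_control present.
    from dm_control import suite
    return suite.load(**cartpole_load_kwargs(**options))


four_pole_args = cartpole_load_kwargs("three_poles", seed=0, num_poles=4)
```

For example, load_cartpole(task_name="three_poles", seed=0, num_poles=4) would request a four-pole chain with a fixed seed.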

Outputs

Name	Type	Description
environment	`dm_control.rl.control.Environment`	A fully initialised environment conforming to the `dm_env.Environment` interface.

Observations

Key	Type	Description
`position`	numpy array	Cart position and sin/cos of pole angles (bounded representation).
`velocity`	numpy array	Joint velocities.
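
As a quick sanity check on observation sizes, the following sketch derives the expected array lengths from the number of poles. It assumes one slider joint for the cart plus one hinge joint per pole, as described above; observation_sizes is a hypothetical helper, not part of dm_control.

```python
def observation_sizes(num_poles):
    """Expected length of each observation array for a given pole count."""
    if num_poles < 1:
        raise ValueError("num_poles must be at least 1")
    return {
        # Cart position plus a sine and a cosine per pole.
        "position": 1 + 2 * num_poles,
        # One velocity per joint: the slider plus one hinge per pole.
        "velocity": 1 + num_poles,
    }


single = observation_sizes(1)
triple = observation_sizes(3)
total_single = sum(single.values())
```

Under these assumptions the single-pole tasks yield five observation values in total, and each extra pole adds three more.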

Usage Examples

from dm_control import suite

# Load the balance task
env = suite.load(domain_name='cartpole', task_name='balance')

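# Run one episode with a fixed placeholder action
action_spec = env.action_spec()
time_step = env.reset()
while not time_step.last():
    # generate_value() returns a constant action that conforms to the spec;
    # replace it with a policy's output in real use.
    action = action_spec.generate_value()
    time_step = env.step(action)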

# Load the swing-up task with sparse reward
env_sparse = suite.load(domain_name='cartpole', task_name='swingup_sparse')

# Load the three-poles variant
env_three = suite.load(domain_name='cartpole', task_name='three_poles')

Related Pages

Principle:Google_deepmind_Dm_control_Control_Suite_Environment_Loading
