Implementation:Google deepmind Dm control Suite Cheetah

Metadata	Value
Implementation	Suite Cheetah
Domain	Reinforcement_Learning, Control
Source	Google_deepmind_Dm_control
Last Updated	2026-02-15 04:00 GMT

Overview

Concrete implementation of the planar cheetah running task from the dm_control Control Suite, used to train an agent to run forward at high speed.

Description

The Cheetah domain models a planar running creature (inspired by the half-cheetah benchmark) that must learn to run forward as fast as possible. The Physics subclass exposes a single helper method, speed, which reads the horizontal component of the torso's subtree linear velocity from a MuJoCo sensor.

A single benchmarking task, run, is registered. The Cheetah task class initializes each episode by randomizing all limited joint positions within their allowed ranges, then stabilizes the model by running 200 physics steps before resetting the simulation time to zero. Observations include all joint positions except the horizontal root position (to maintain translational invariance) and all joint velocities.
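The reset procedure and the position observation described above can be sketched in plain Python. This is an illustrative sketch, not the code in cheetah.py: the helper names `randomize_limited_joints` and `position_observation` and the three-joint example data are hypothetical, and the 200 stabilizing physics steps are omitted because they require a MuJoCo simulation.

```python
import random


def randomize_limited_joints(qpos, jnt_limited, jnt_range, rng):
    """Return a copy of qpos with every limited joint resampled in its range.

    Assumes one scalar position per joint (a simplification for this sketch).
    """
    new_qpos = list(qpos)
    for i, (limited, (lower, upper)) in enumerate(zip(jnt_limited, jnt_range)):
        if limited:
            new_qpos[i] = rng.uniform(lower, upper)
    return new_qpos


def position_observation(qpos):
    """Drop the first entry, assumed here to be the horizontal root position."""
    return list(qpos[1:])


# Hypothetical model: an unlimited root slide joint and two limited hinges.
rng = random.Random(0)
qpos = [3.0, 0.0, 0.0]
jnt_limited = [False, True, True]
jnt_range = [(0.0, 0.0), (-0.5, 1.0), (-0.8, 0.8)]

qpos = randomize_limited_joints(qpos, jnt_limited, jnt_range, rng)
obs_position = position_observation(qpos)
print(obs_position)  # two joint angles; the root x coordinate is excluded
```

Leaving the horizontal root position out of the observation means a policy sees the same input wherever the cheetah is along the track.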

The reward uses a tolerance function with a linear sigmoid: it returns 1 when the horizontal speed reaches or exceeds 10 m/s and falls off linearly over a margin of 10 m/s below that threshold. The default time limit is 10 seconds.
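The shape of this reward can be written out in a few lines of plain Python. This is a minimal sketch rather than the library's tolerance function: the name `run_reward` is hypothetical, and it assumes the reward is exactly 0 at the far edge of the margin, which the description above does not state.

```python
RUN_SPEED = 10.0  # m/s, the speed at which the reward saturates


def run_reward(speed, run_speed=RUN_SPEED):
    """Linear ramp: 0 at or below 0 m/s, 1 at or above run_speed.

    Assumes the reward is 0 at the edge of the margin (an assumption of
    this sketch), with the margin equal to run_speed.
    """
    if speed >= run_speed:
        return 1.0
    # Distance below the bound, expressed as a fraction of the margin.
    shortfall = (run_speed - speed) / run_speed
    return max(0.0, 1.0 - shortfall)


for speed in (-2.0, 0.0, 5.0, 10.0, 15.0):
    print(speed, run_reward(speed))
```

Under these assumptions the reward is simply the speed divided by 10, clipped to the interval [0, 1], so running backwards earns nothing and running faster than 10 m/s earns no extra reward.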

Usage

Use this implementation for the standard locomotion running benchmark. Load it via suite.load(domain_name='cheetah', task_name='run').

Code Reference

Source Location

Repository: Google_deepmind_Dm_control
File: dm_control/suite/cheetah.py
Lines: 1-92

Signature

# Task factory function
def run(time_limit=10, random=None, environment_kwargs=None)

# Physics subclass
class Physics(mujoco.Physics):
    def speed(self)   # horizontal speed of the cheetah torso

# Task class
class Cheetah(base.Task):
    def initialize_episode(self, physics)
    def get_observation(self, physics)
    def get_reward(self, physics)

Import

from dm_control import suite

env = suite.load(domain_name='cheetah', task_name='run')

I/O Contract

Inputs

Name	Type	Required	Description
`time_limit`	float	No	Maximum episode duration in seconds (default 10).
`random`	int, numpy.random.RandomState, or None	No	Random seed or RNG instance for reproducibility.
`environment_kwargs`	dict or None	No	Additional keyword arguments forwarded to the `Environment` constructor.
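
These arguments belong to the `run` factory rather than to `suite.load` itself. A minimal sketch of forwarding them, assuming `suite.load` accepts `task_kwargs` and `environment_kwargs` dictionaries; the wrapper name `load_cheetah_run` is hypothetical, and the import is deferred so the function can be defined on a machine without MuJoCo.

```python
def load_cheetah_run(seed=None, time_limit=10.0, environment_kwargs=None):
    """Load cheetah/run with an explicit seed and episode length."""
    # Imported lazily: dm_control needs a working MuJoCo installation.
    from dm_control import suite

    return suite.load(
        domain_name='cheetah',
        task_name='run',
        # Forwarded to run(time_limit=..., random=...).
        task_kwargs={'time_limit': time_limit, 'random': seed},
        # Forwarded to the Environment constructor.
        environment_kwargs=environment_kwargs,
    )
```

With dm_control installed, `load_cheetah_run(seed=0, time_limit=5)` should return an environment whose episodes end after 5 simulated seconds.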

Outputs

Name	Type	Description
environment	`dm_control.rl.control.Environment`	A fully initialised environment conforming to the `dm_env.Environment` interface.

Observations

Key	Type	Description
`position`	numpy array	Joint positions excluding horizontal root position (translational invariance).
`velocity`	numpy array	Joint velocities.

Usage Examples

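import numpy as np
from dm_control import suite

# Load the cheetah run task
env = suite.load(domain_name='cheetah', task_name='run')
action_spec = env.action_spec()  # bounded array spec; query it once

# Run one episode with uniformly random actions
time_step = env.reset()
while not time_step.last():
    action = np.random.uniform(
        action_spec.minimum, action_spec.maximum, size=action_spec.shape)
    time_step = env.step(action)
    print(f"Reward: {time_step.reward:.3f}")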

Related Pages

Principle:Google_deepmind_Dm_control_Control_Suite_Environment_Loading
