Implementation:Google deepmind Dm control Suite Cheetah

From Leeroopedia
| Metadata | Value |
|---|---|
| Implementation | Suite Cheetah |
| Domain | Reinforcement_Learning, Control |
| Source | Google_deepmind_Dm_control |
| Last Updated | 2026-02-15 04:00 GMT |

Overview

A concrete tool, provided by the dm_control Control Suite, for training a planar cheetah to run at high speed.

Description

The Cheetah domain models a planar running creature (inspired by the half-cheetah benchmark) that must learn to run forward as fast as possible. The Physics subclass exposes a single helper method, speed, which reads the horizontal component of the torso's subtree linear velocity from a MuJoCo sensor.

A single benchmarking task, run, is registered. The Cheetah task class initializes each episode by randomizing all limited joint positions within their allowed ranges, then stabilizes the model by running 200 physics steps before resetting the simulation time to zero. Observations include all joint positions except the horizontal root position (to maintain translational invariance) and all joint velocities.
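The randomization and observation steps described above can be sketched without MuJoCo. This is a minimal illustration, not the dm_control source: the joint table, its names and ranges, and the helpers `sample_initial_qpos` and `position_observation` are hypothetical, and the 200-step settling phase is left out because it needs a physics engine.

```python
import random

# Hypothetical joint table: (name, is_limited, (lower, upper)).
# Names and ranges are placeholders, not values from the cheetah model.
JOINTS = [
    ("root_x", False, (0.0, 0.0)),
    ("root_z", False, (0.0, 0.0)),
    ("thigh", True, (-0.5, 1.0)),
    ("shin", True, (-0.8, 0.8)),
]


def sample_initial_qpos(joints, rng):
    """Draw each limited joint uniformly from its range; leave others at 0."""
    qpos = []
    for _name, is_limited, (lower, upper) in joints:
        qpos.append(rng.uniform(lower, upper) if is_limited else 0.0)
    return qpos


def position_observation(qpos):
    """Drop the first entry (horizontal root position) from the positions."""
    return list(qpos[1:])


rng = random.Random(0)
qpos = sample_initial_qpos(JOINTS, rng)
obs_position = position_observation(qpos)
```

Dropping the horizontal root position means two states that differ only in how far the cheetah has travelled produce the same observation, which is the translational invariance the task relies on.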

The reward is computed with a tolerance function that uses a linear sigmoid: it returns 1 when the horizontal speed reaches or exceeds 10 m/s and decreases linearly over a 10 m/s margin below that threshold. The default time limit is 10 seconds.
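The shape of this reward can be written as a short standalone function. The sketch below is a plain-Python approximation rather than the library's tolerance implementation; the name `run_reward` is hypothetical, and it assumes the reward reaches exactly 0 at the outer edge of the margin (that is, at 0 m/s and for backward motion).

```python
def run_reward(speed, run_speed=10.0, margin=10.0):
    """Linear tolerance: 1 at or above run_speed, linear falloff over margin."""
    if speed >= run_speed:
        return 1.0
    # Normalized shortfall below the target speed.
    distance = (run_speed - speed) / margin
    # Assumption: the reward is 0 at the edge of the margin and beyond it.
    return max(0.0, 1.0 - distance)


for v in (-3.0, 0.0, 5.0, 10.0, 25.0):
    print(f"speed={v:5.1f} m/s -> reward={run_reward(v):.2f}")
```

Under these assumptions the reward is simply the speed divided by 10, clipped to the interval [0, 1], so running faster than 10 m/s earns no extra reward.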

Usage

Use this implementation for the standard locomotion running benchmark. Load it via suite.load(domain_name='cheetah', task_name='run').

Code Reference

Source Location

Signature

# Task factory function
def run(time_limit=10, random=None, environment_kwargs=None)

# Physics subclass
class Physics(mujoco.Physics):
    def speed(self)   # horizontal speed of the cheetah torso

# Task class
class Cheetah(base.Task):
    def initialize_episode(self, physics)
    def get_observation(self, physics)
    def get_reward(self, physics)

Import

from dm_control import suite

env = suite.load(domain_name='cheetah', task_name='run')

I/O Contract

Inputs

| Name | Type | Required | Description |
|---|---|---|---|
| time_limit | float | No | Maximum episode duration in seconds (default 10). |
| random | int, numpy.random.RandomState, or None | No | Random seed or RNG instance for reproducibility. |
| environment_kwargs | dict or None | No | Additional keyword arguments forwarded to the Environment constructor. |

Outputs

| Name | Type | Description |
|---|---|---|
| environment | dm_control.rl.control.Environment | A fully initialised environment conforming to the dm_env.Environment interface. |

Observations

| Key | Type | Description |
|---|---|---|
| position | numpy array | Joint positions excluding horizontal root position (translational invariance). |
| velocity | numpy array | Joint velocities. |
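Many agents expect a single feature vector rather than a dictionary of arrays. The sketch below shows one way to concatenate the two entries in a fixed order; the helper `flatten_observation` and the stand-in values are hypothetical, and the array lengths are illustrative rather than the real dimensions of the cheetah model.

```python
def flatten_observation(observation, keys=("position", "velocity")):
    """Concatenate the named observation entries into one flat list."""
    flat = []
    for key in keys:
        # Works for lists and for 1-D NumPy arrays alike.
        flat.extend(float(x) for x in observation[key])
    return flat


# Stand-in observation; real entries are NumPy arrays from the environment.
example_obs = {
    "position": [0.1, -0.2, 0.3],
    "velocity": [1.0, 0.0, -1.0, 0.5],
}
features = flatten_observation(example_obs)
```

Passing the keys explicitly keeps the feature order stable regardless of how the observation dictionary happens to be ordered.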

Usage Examples

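from dm_control import suite

# Load the cheetah run task
env = suite.load(domain_name='cheetah', task_name='run')
action_spec = env.action_spec()

# Run one episode until the time limit ends it
time_step = env.reset()
while not time_step.last():
    # generate_value() returns a fixed placeholder action that satisfies the
    # spec; replace it with a policy's output to actually drive the cheetah.
    action = action_spec.generate_value()
    time_step = env.step(action)
    print(f"Reward: {time_step.reward:.3f}")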
