Implementation:Google deepmind Dm control Suite Cheetah
| Metadata | Value |
|---|---|
| Implementation | Suite Cheetah |
| Domain | Reinforcement_Learning, Control |
| Source | Google_deepmind_Dm_control |
| Last Updated | 2026-02-15 04:00 GMT |
Overview
A concrete tool, provided by the dm_control Control Suite, for training a planar cheetah to run at high speed.
Description
The Cheetah domain models a planar running creature (inspired by the half-cheetah benchmark) that must learn to run forward as fast as possible. The Physics subclass exposes a single helper method, speed, which reads the horizontal component of the torso's subtree linear velocity from a MuJoCo sensor.
A single benchmarking task, run, is registered. The Cheetah task class initializes each episode by randomizing all limited joint positions within their allowed ranges, then stabilizes the model by running 200 physics steps before resetting the simulation time to zero. Observations include all joint positions except the horizontal root position (to maintain translational invariance) and all joint velocities.
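The joint randomization described above can be illustrated with a minimal, standard-library sketch. It is not the dm_control code: the function name `randomize_limited_joints`, the four-joint model, and the joint ranges are all hypothetical, and the sketch assumes one position entry per joint (as with hinge and slide joints). The 200-step stabilization and the time reset are not modelled here.

```python
import random


def randomize_limited_joints(qpos, jnt_limited, jnt_range, rng):
    """Return a copy of qpos with every limited joint resampled uniformly.

    Hypothetical helper: assumes qpos, jnt_limited and jnt_range have one
    entry per joint, in the same order.
    """
    new_qpos = list(qpos)
    for i, (limited, (lower, upper)) in enumerate(zip(jnt_limited, jnt_range)):
        if limited:
            # Only joints with a declared range are randomized.
            new_qpos[i] = rng.uniform(lower, upper)
    return new_qpos


# Hypothetical model: two unlimited root joints followed by two limited hinges.
qpos = [0.0, 0.0, 0.0, 0.0]
jnt_limited = [False, False, True, True]
jnt_range = [(0.0, 0.0), (0.0, 0.0), (-0.5, 1.0), (-0.8, 0.8)]

rng = random.Random(0)  # seeded for reproducibility
start = randomize_limited_joints(qpos, jnt_limited, jnt_range, rng)
print(start)
```

Unlimited joints keep their original values, which is why the root coordinates in this toy model stay at zero while the two hinges land somewhere inside their ranges.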
The reward uses a tolerance function with a linear sigmoid: it returns 1 when the horizontal speed reaches or exceeds 10 m/s and falls off linearly over a 10 m/s margin below that threshold. The default time limit is 10 seconds.
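As a rough illustration of this reward shape, the sketch below re-implements the linear ramp in plain Python. It is not the library's tolerance function: `linear_run_reward` is a hypothetical name, and the sketch assumes the reward falls to exactly 0 at the far edge of the margin (that is, at 0 m/s), which the description above does not state explicitly.

```python
def linear_run_reward(speed, run_speed=10.0):
    """Linear ramp: 1 at or above run_speed, decreasing to 0 over the margin.

    Assumption: the margin equals run_speed and the value at the edge of the
    margin is 0, so any speed at or below 0 m/s earns no reward.
    """
    if speed >= run_speed:
        return 1.0
    # Shortfall below the target speed, expressed as a fraction of the margin.
    shortfall = (run_speed - speed) / run_speed
    return max(0.0, 1.0 - shortfall)


for speed in (-2.0, 0.0, 2.5, 5.0, 10.0, 14.0):
    print(f"{speed:5.1f} m/s -> reward {linear_run_reward(speed):.2f}")
```

Under these assumptions the reward is simply the forward speed divided by 10 and clipped to the interval [0, 1], so running backwards is never rewarded and running faster than 10 m/s earns nothing extra.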
Usage
Use this implementation for the standard locomotion running benchmark. Load it via suite.load(domain_name='cheetah', task_name='run').
Code Reference
Source Location
- Repository: Google_deepmind_Dm_control
- File: dm_control/suite/cheetah.py
- Lines: 1-92
Signature
# Task factory function
def run(time_limit=10, random=None, environment_kwargs=None)

# Physics subclass
class Physics(mujoco.Physics):
    def speed(self)  # horizontal speed of the cheetah torso

# Task class
class Cheetah(base.Task):
    def initialize_episode(self, physics)
    def get_observation(self, physics)
    def get_reward(self, physics)
Import
from dm_control import suite
env = suite.load(domain_name='cheetah', task_name='run')
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| time_limit | float | No | Maximum episode duration in seconds (default 10). |
| random | int, numpy.random.RandomState, or None | No | Random seed or RNG instance for reproducibility. |
| environment_kwargs | dict or None | No | Additional keyword arguments forwarded to the Environment constructor. |
Outputs
| Name | Type | Description |
|---|---|---|
| environment | dm_control.rl.control.Environment | A fully initialised environment conforming to the dm_env.Environment interface. |
Observations
| Key | Type | Description |
|---|---|---|
| position | numpy array | Joint positions excluding horizontal root position (translational invariance). |
| velocity | numpy array | Joint velocities. |
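To make the translational-invariance point concrete, here is a small standard-library sketch of how such an observation could be assembled. The helper `build_observation`, the nine-element state vectors, and the convention that the horizontal root coordinate is the first position entry are assumptions made for illustration, not details confirmed by this page.

```python
from collections import OrderedDict


def build_observation(qpos, qvel):
    """Assemble an observation dict with 'position' and 'velocity' keys.

    Assumption: qpos[0] is the horizontal root coordinate, so it is dropped.
    """
    obs = OrderedDict()
    obs["position"] = list(qpos[1:])  # everything except the horizontal root position
    obs["velocity"] = list(qvel)      # all joint velocities are kept
    return obs


# Hypothetical state: the same pose observed at two different places along the track.
qpos = [3.7, 0.1, -0.05, 0.2, -0.3, 0.1, 0.4, -0.2, 0.0]
qvel = [6.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]

obs = build_observation(qpos, qvel)
shifted = build_observation([qpos[0] + 100.0] + qpos[1:], qvel)

# The observation does not change when only the horizontal position changes.
print(obs["position"] == shifted["position"])
print(len(obs["position"]), len(obs["velocity"]))
```

Because the dropped coordinate is the only one that grows as the cheetah moves forward, a policy trained on this observation sees the same input for the same gait regardless of how far it has travelled.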
Usage Examples
from dm_control import suite
# Load the cheetah run task
env = suite.load(domain_name='cheetah', task_name='run')
# Run an episode
time_step = env.reset()
while not time_step.last():
    action = env.action_spec().generate_value()
    time_step = env.step(action)
    print(f"Reward: {time_step.reward:.3f}")