Implementation:Google deepmind Dm control Suite Hopper

Metadata	Value
Implementation	Suite Hopper
Domain	Reinforcement_Learning, Control
Source	Google_deepmind_Dm_control
Last Updated	2026-02-15 04:00 GMT

Overview

Concrete tool for training a planar one-legged hopper to stand and hop provided by the dm_control Control Suite.

Description

The Hopper domain models a planar one-legged robot with a torso, thigh, leg, and foot. The agent must learn to balance upright or hop forward. The Physics subclass exposes three helper methods: height, which returns the vertical distance from the torso to the foot; speed, which reads the horizontal component of the torso subtree linear velocity from a sensor; and touch, which returns logarithmically scaled signals from the toe and heel touch sensors.

Two benchmarking tasks are registered: stand and hop. Both use the Hopper task class, which initializes each episode by randomizing limited and rotational joints. The stand task rewards the agent for maintaining a torso height above 0.6 m while using small control signals (the standing reward is multiplied by a control penalty). The hop task rewards the agent for both standing tall and moving forward at a speed of at least 2 m/s, using a linear sigmoid tolerance for the speed component.

Both tasks use a default time limit of 20 seconds and a control timestep of 0.02 seconds. Observations include joint positions (excluding horizontal position for translational invariance), velocities, and touch sensor readings.

Usage

Use this implementation for planar balancing and hopping locomotion benchmarks. Load via suite.load(domain_name='hopper', task_name='stand') or suite.load(domain_name='hopper', task_name='hop').

Code Reference

Source Location

Repository: Google_deepmind_Dm_control
File: dm_control/suite/hopper.py
Lines: 1-134

Signature

# Task factory functions
def stand(time_limit=20, random=None, environment_kwargs=None)
def hop(time_limit=20, random=None, environment_kwargs=None)

# Physics subclass
class Physics(mujoco.Physics):
    def height(self)   # torso height relative to foot
    def speed(self)    # horizontal speed from subtree linvel sensor
    def touch(self)    # log-scaled toe and heel touch sensor signals

# Task class
class Hopper(base.Task):
    def __init__(self, hopping, random=None)
    def initialize_episode(self, physics)
    def get_observation(self, physics)
    def get_reward(self, physics)

Import

from dm_control import suite

env = suite.load(domain_name='hopper', task_name='stand')

I/O Contract

Inputs

Name	Type	Required	Description
`time_limit`	float	No	Maximum episode duration in seconds (default 20).
`random`	int, numpy.random.RandomState, or None	No	Random seed or RNG instance for reproducibility.
`environment_kwargs`	dict or None	No	Additional keyword arguments forwarded to the `Environment` constructor.

Outputs

Name	Type	Description
environment	`dm_control.rl.control.Environment`	A fully initialised environment conforming to the `dm_env.Environment` interface.

Observations

Key	Type	Description
`position`	numpy array	Joint positions excluding horizontal root position.
`velocity`	numpy array	Joint velocities.
`touch`	numpy array (2,)	Log-scaled toe and heel touch sensor signals.

Usage Examples

from dm_control import suite

# Load the stand task
env = suite.load(domain_name='hopper', task_name='stand')

# Run an episode
time_step = env.reset()
while not time_step.last():
    action = env.action_spec().generate_value()
    time_step = env.step(action)

# Load the hop task
env_hop = suite.load(domain_name='hopper', task_name='hop')

Related Pages

Principle:Google_deepmind_Dm_control_Control_Suite_Environment_Loading

Page Connections

Double-click a node to navigate. Hold to expand connections.

Principle

Implementation

Heuristic

Environment