Implementation:Google deepmind Dm control Suite Ball In Cup

From Leeroopedia
Metadata | Value
Implementation | Suite Ball In Cup
Domain | Reinforcement_Learning, Control
Source | Google_deepmind_Dm_control
Last Updated | 2026-02-15 04:00 GMT

Overview

Concrete tool for the ball-in-cup catching task provided by the dm_control Control Suite.

Description

The Ball-in-Cup domain simulates a planar cup attached to an actuated arm with a ball connected to the cup by a string. The objective is to swing the ball upward so that it lands inside the cup. The domain defines a custom Physics subclass that provides two helper methods: ball_to_target, which computes the 2D vector from the ball to the target location (the cup), and in_target, which returns 1.0 if the ball is geometrically inside the target and 0.0 otherwise.
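The containment test behind in_target can be pictured as an axis-aligned box check in the x-z plane. The following is a minimal pure-Python sketch, assuming the target is described by half-extents and the ball by a radius. The function names mirror the helpers described above, but the argument lists and all numeric values are hypothetical and are not taken from the model file.

```python
def ball_to_target(ball_xz, target_xz):
    """Return the (x, z) vector pointing from the ball to the target."""
    return (target_xz[0] - ball_xz[0], target_xz[1] - ball_xz[1])


def in_target(ball_xz, target_xz, target_half_size, ball_radius):
    """Return 1.0 if the whole ball fits inside the target box, else 0.0.

    Assumption: the target is an axis-aligned box given by half-extents,
    so the ball centre must stay at least one radius away from each wall.
    """
    dx, dz = ball_to_target(ball_xz, target_xz)
    fits_x = abs(dx) < target_half_size[0] - ball_radius
    fits_z = abs(dz) < target_half_size[1] - ball_radius
    return float(fits_x and fits_z)


# Hypothetical geometry, chosen only for illustration.
TARGET = (0.0, 0.3)
HALF_SIZE = (0.05, 0.06)
RADIUS = 0.025

centered = in_target((0.0, 0.3), TARGET, HALF_SIZE, RADIUS)  # ball at target centre
on_rim = in_target((0.04, 0.3), TARGET, HALF_SIZE, RADIUS)   # ball overlapping a wall
```

Subtracting the ball radius from the half-extents is what makes the check require the ball to be fully inside the target rather than merely touching it.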

A single benchmark task, catch, is registered with the tags 'benchmarking' and 'easy'. The BallInCup task class initializes each episode by placing the ball at a random collision-free position (x in [-0.2, 0.2], z in [0.2, 0.5]) and returns observations containing joint positions and velocities. The reward is sparse: it returns 1 when the ball is fully inside the target cup and 0 otherwise.
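The episode initialization described above amounts to rejection sampling: draw a position from the stated ranges and retry until it is collision-free. The sketch below illustrates that loop in plain Python. The collision predicate toy_collision and the max_tries cap are hypothetical stand-ins; in the real task the collision check would come from the physics engine.

```python
import random


def sample_collision_free(rng, is_colliding, max_tries=1000):
    """Draw (x, z) uniformly from the stated ranges until no collision."""
    for _ in range(max_tries):
        x = rng.uniform(-0.2, 0.2)  # x range from the task description
        z = rng.uniform(0.2, 0.5)   # z range from the task description
        if not is_colliding(x, z):
            return x, z
    raise RuntimeError("no collision-free position found")


def toy_collision(x, z):
    """Hypothetical stand-in: treat a small region as occupied."""
    return abs(x) < 0.05 and z < 0.3


rng = random.Random(0)  # seeded for reproducibility
x0, z0 = sample_collision_free(rng, toy_collision)
```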

The environment uses a default time limit of 20 seconds and a control timestep of 0.02 seconds.
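As a quick piece of arithmetic, these two defaults fix the episode length in control steps, and with a reward of at most 1 per step they also bound the episode return. The snippet below is only a worked calculation; the constant names are illustrative and are not claimed to match the source.

```python
TIME_LIMIT_S = 20.0        # default time limit, in seconds
CONTROL_TIMESTEP_S = 0.02  # control timestep, in seconds

# Number of agent actions in a full-length episode.
steps_per_episode = round(TIME_LIMIT_S / CONTROL_TIMESTEP_S)

# Upper bound on the return, since the sparse reward is 0 or 1 per step.
max_episode_return = steps_per_episode * 1.0
```

Under these defaults a full episode is 1000 control steps, so an agent that catches the ball early and holds it approaches a return of 1000.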

Usage

Use this implementation for a classic sparse-reward manipulation benchmark. Load it via suite.load(domain_name='ball_in_cup', task_name='catch').

Code Reference

Source Location

Signature

# Task factory function
def catch(time_limit=20, random=None, environment_kwargs=None)

# Physics subclass
class Physics(mujoco.Physics):
    def ball_to_target(self)   # vector from ball to target (x, z)
    def in_target(self)        # 1 if ball is inside target, 0 otherwise

# Task class
class BallInCup(base.Task):
    def initialize_episode(self, physics)
    def get_observation(self, physics)
    def get_reward(self, physics)

Import

from dm_control import suite

env = suite.load(domain_name='ball_in_cup', task_name='catch')

I/O Contract

Inputs

Name | Type | Required | Description
time_limit | float | No | Maximum episode duration in seconds (default 20).
random | int, numpy.random.RandomState, or None | No | Random seed or RNG instance for reproducibility.
environment_kwargs | dict or None | No | Additional keyword arguments forwarded to the Environment constructor.
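When loading through suite.load, the factory inputs above are typically supplied via its task_kwargs and environment_kwargs arguments. The sketch below shows one way to assemble them. The helpers build_load_kwargs and load_catch are hypothetical, and flat_observation is used as an example Environment option on the assumption that the installed dm_control version supports it.

```python
def build_load_kwargs(seed=None, time_limit=20, flat_observation=False):
    """Assemble keyword arguments for suite.load (hypothetical helper)."""
    return {
        "domain_name": "ball_in_cup",
        "task_name": "catch",
        # Forwarded to the catch() task factory.
        "task_kwargs": {"random": seed, "time_limit": time_limit},
        # Forwarded to the Environment constructor.
        "environment_kwargs": {"flat_observation": flat_observation},
    }


def load_catch(**options):
    """Load the task; dm_control is imported lazily, only when called."""
    from dm_control import suite
    return suite.load(**build_load_kwargs(**options))


kwargs = build_load_kwargs(seed=42, time_limit=10)
```

Calling load_catch(seed=42) would then build the environment, provided dm_control and MuJoCo are installed.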

Outputs

Name | Type | Description
environment | dm_control.rl.control.Environment | A fully initialised environment conforming to the dm_env.Environment interface.

Observations

Key | Type | Description
position | numpy array | Generalized joint positions reported by the physics (cup and ball joints).
velocity | numpy array | Generalized joint velocities reported by the physics (cup and ball joints).
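Many learning algorithms expect a single feature vector rather than a dictionary, so the position and velocity entries are often concatenated. The following is a minimal sketch of that step, assuming each entry is a one-dimensional sequence of numbers. The helper flatten_observation and the example values and lengths are hypothetical.

```python
from collections import OrderedDict


def flatten_observation(observation):
    """Concatenate every observation entry, in key order, into one flat list."""
    flat = []
    for values in observation.values():
        flat.extend(float(v) for v in values)
    return flat


# Hypothetical stand-in values; the real entries are numpy arrays.
example = OrderedDict(
    position=[0.0, 0.1, -0.05, 0.3],
    velocity=[0.0, 0.0, 0.2, -0.1],
)
features = flatten_observation(example)
```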

Usage Examples

from dm_control import suite

# Load the ball-in-cup catch task
env = suite.load(domain_name='ball_in_cup', task_name='catch')
action_spec = env.action_spec()

# Run one episode with a fixed placeholder action
time_step = env.reset()
while not time_step.last():
    # generate_value() returns a valid constant action, not a random sample
    action = action_spec.generate_value()
    time_step = env.step(action)
    print(f"Reward: {time_step.reward}")
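For evaluation it is usually more useful to accumulate the episode return than to print each reward. The sketch below wraps the same loop in a hypothetical run_episode helper. FakeTimeStep and FakeEnv are stand-ins invented here so the loop can be exercised without MuJoCo; they imitate only the reset, step, last and reward members used above.

```python
class FakeTimeStep:
    """Stand-in exposing only `reward` and `last()` (hypothetical)."""

    def __init__(self, reward, is_last):
        self.reward = reward
        self._is_last = is_last

    def last(self):
        return self._is_last


class FakeEnv:
    """Stand-in that pays reward 1.0 from `catch_step` onwards."""

    def __init__(self, num_steps=5, catch_step=3):
        self._num_steps = num_steps
        self._catch_step = catch_step
        self._t = 0

    def reset(self):
        self._t = 0
        return FakeTimeStep(None, False)  # no reward on the first time step

    def step(self, action):
        self._t += 1
        reward = 1.0 if self._t >= self._catch_step else 0.0
        return FakeTimeStep(reward, self._t >= self._num_steps)


def run_episode(env, policy):
    """Run one episode and return (total_reward, num_steps)."""
    time_step = env.reset()
    total_reward, num_steps = 0.0, 0
    while not time_step.last():
        time_step = env.step(policy(time_step))
        total_reward += time_step.reward or 0.0  # reward may be None
        num_steps += 1
    return total_reward, num_steps


total, steps = run_episode(FakeEnv(), policy=lambda time_step: None)
```

The same run_episode function should work with the real environment by passing the loaded env and a policy that returns an action matching env.action_spec().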
