Implementation:Google deepmind Dm control Suite Reacher

Metadata	Value
Implementation	Suite Reacher
Domain	Reinforcement_Learning, Control
Source	Google_deepmind_Dm_control
Last Updated	2026-02-15 04:00 GMT

Overview

Concrete tool for controlling a two-link planar arm to reach a randomized target provided by the dm_control Control Suite.

Description

The Reacher domain models a two-link planar arm that must reach a target position. The target is placed at a random angle and random radius (between 0.05 and 0.20) from the origin at the start of each episode. The Physics subclass provides methods for computing the 2D vector from the finger tip to the target (finger_to_target) and the scalar distance between them (finger_to_target_dist).

Two tasks are registered: easy (benchmarking, tagged easy) and hard (benchmarking). Both use the Reacher task class parameterized by target_size. The easy variant uses a target radius of 0.05 m while the hard variant uses 0.015 m, requiring more precise positioning. Episode initialization randomizes the arm's limited and rotational joints and places the target at a new random location.

The reward is computed using a tolerance function that returns 1 when the finger-to-target distance is within the combined radii of the target and finger geometries. Observations include the joint positions, the vector to the target, and joint velocities. The default time limit is 20 seconds.

Usage

Use this implementation for a standard reaching benchmark. Load via suite.load(domain_name='reacher', task_name='easy') or suite.load(domain_name='reacher', task_name='hard').

Code Reference

Source Location

Repository: Google_deepmind_Dm_control
File: dm_control/suite/reacher.py
Lines: 1-112

Signature

# Task factory functions
def easy(time_limit=20, random=None, environment_kwargs=None)
def hard(time_limit=20, random=None, environment_kwargs=None)

# Physics subclass
class Physics(mujoco.Physics):
    def finger_to_target(self)        # 2D vector from finger to target
    def finger_to_target_dist(self)   # scalar distance to target

# Task class
class Reacher(base.Task):
    def __init__(self, target_size, random=None)
    def initialize_episode(self, physics)
    def get_observation(self, physics)
    def get_reward(self, physics)

Import

from dm_control import suite

env = suite.load(domain_name='reacher', task_name='easy')

I/O Contract

Inputs

Name	Type	Required	Description
`time_limit`	float	No	Maximum episode duration in seconds (default 20).
`random`	int, numpy.random.RandomState, or None	No	Random seed or RNG instance for reproducibility.
`environment_kwargs`	dict or None	No	Additional keyword arguments forwarded to the `Environment` constructor.

Outputs

Name	Type	Description
environment	`dm_control.rl.control.Environment`	A fully initialised environment conforming to the `dm_env.Environment` interface.

Observations

Key	Type	Description
`position`	numpy array	Joint positions of the two-link arm.
`to_target`	numpy array (2,)	2D vector from the finger to the target.
`velocity`	numpy array	Joint velocities.

Usage Examples

from dm_control import suite

# Load the easy reacher task
env = suite.load(domain_name='reacher', task_name='easy')

# Run an episode
time_step = env.reset()
while not time_step.last():
    action = env.action_spec().generate_value()
    time_step = env.step(action)

# Load the hard variant with smaller target
env_hard = suite.load(domain_name='reacher', task_name='hard')

Related Pages

Principle:Google_deepmind_Dm_control_Control_Suite_Environment_Loading

Page Connections

Double-click a node to navigate. Hold to expand connections.

Principle

Implementation

Heuristic

Environment