Implementation:Google deepmind Dm control Suite Manipulator

From Leeroopedia
| Metadata | Value |
|---|---|
| Implementation | Suite Manipulator |
| Domain | Reinforcement_Learning, Control |
| Source | Google_deepmind_Dm_control |
| Last Updated | 2026-02-15 04:00 GMT |

Overview

Concrete tool, provided by the dm_control Control Suite, for controlling a planar robotic arm that must bring objects to a target or insert them into a receptacle.

Description

The Manipulator domain models a planar robotic arm with 8 joints (root, shoulder, elbow, wrist, finger, fingertip, thumb, and thumbtip) that must grasp and manipulate objects. The domain supports two prop types (ball and peg) and two task modes (bring and insert). The make_model function dynamically generates the MJCF XML by selectively removing unused props and receptacles from the base model. For bring tasks, only the prop and its target are kept; for insert tasks, the corresponding receptacle (cup for ball, slot for peg) is also included.
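The prop selection described above can be sketched in a few lines. This is a minimal illustration rather than the library's code: the prop names (ball, target_ball, cup, peg, target_peg, slot) and the helper names required_props and unused_props are assumptions made for the example, and the sketch operates on plain sets instead of the MJCF XML tree.

```python
# Illustrative sketch only. The prop names and helper names are assumptions
# for this example; the real make_model edits an MJCF XML tree.
ALL_PROPS = frozenset(
    ["ball", "target_ball", "cup", "peg", "target_peg", "slot"]
)


def required_props(use_peg, insert):
    """Return the props kept in the model for a given task variant."""
    if use_peg:
        props = ["peg", "target_peg"]
        if insert:
            props.append("slot")  # receptacle for the peg
    else:
        props = ["ball", "target_ball"]
        if insert:
            props.append("cup")  # receptacle for the ball
    return frozenset(props)


def unused_props(use_peg, insert):
    """Return the props that would be removed from the base model."""
    return ALL_PROPS - required_props(use_peg, insert)


# bring_ball keeps only the ball and its target.
print(sorted(unused_props(use_peg=False, insert=False)))
# insert_peg keeps the peg, its target, and the slot.
print(sorted(unused_props(use_peg=True, insert=True)))
```

Starting from one base model and deleting what a variant does not need keeps all four task variants in a single XML file, at the cost of a small amount of set logic at load time.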

The Physics subclass provides methods for reading bounded joint positions (as sin/cos pairs), joint velocities, 2D body poses (position and optional orientation), logarithmically scaled touch sensor signals from five contact sensors, and the Euclidean distance between named sites. The Bring task class handles all four task variants, parameterized by use_peg, insert, and fully_observable flags.
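The three numeric transformations named above (sin/cos encoding, logarithmic scaling, and Euclidean distance) can be illustrated without a simulator. The sketch below uses only the standard library and plain lists; the use of log(1 + x) for the touch scaling is an assumption, since this page states only that the signals are logarithmically scaled, and the function bodies are stand-ins rather than the Physics methods themselves.

```python
import math


def bounded_joint_pos(angles):
    """Encode each joint angle (radians) as a (sin, cos) pair."""
    return [(math.sin(a), math.cos(a)) for a in angles]


def log_scaled_touch(raw_readings):
    """Compress non-negative sensor readings with log(1 + x).

    Assumption: log1p is one plausible choice of logarithmic scaling.
    """
    return [math.log1p(r) for r in raw_readings]


def site_distance(p1, p2):
    """Euclidean distance between two points given as coordinate tuples."""
    return math.dist(p1, p2)


print(bounded_joint_pos([0.0, math.pi / 2]))
print(log_scaled_touch([0.0, 1.0, 100.0]))
print(site_distance((0.0, 0.0, 0.0), (3.0, 4.0, 0.0)))
```

Encoding an angle as a (sin, cos) pair keeps every component in [-1, 1] and maps angles that differ by a full turn to the same observation, which is why the encoding is described as bounded.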

Four tasks are registered: bring_ball, bring_peg, insert_ball, and insert_peg. All four are tagged as hard, and bring_ball is also tagged as benchmarking. Episode initialization randomizes the arm joint angles, the target location, and the object location. The object starts in the hand (10% probability), at the target (10% probability), or at a random location (80% probability). The peg reward combines grasping and bringing sub-rewards; the ball reward measures the ball's proximity to its target. All tasks use a control timestep of 0.01 seconds and a default time limit of 10 seconds, which gives 1,000 control steps per episode.
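The initial-placement probabilities and the episode length can be checked with a short standalone sketch. The placement labels ("in_hand", "in_target", "uniform") and both function names are hypothetical, chosen for this example; only the probabilities, the control timestep, and the time limit come from the description above.

```python
import random

P_IN_HAND = 0.1    # object starts in the hand
P_IN_TARGET = 0.1  # object starts at the target


def sample_initial_placement(rng):
    """Pick where the object starts (labels are hypothetical)."""
    return rng.choices(
        ["in_hand", "in_target", "uniform"],
        weights=[P_IN_HAND, P_IN_TARGET, 1.0 - P_IN_HAND - P_IN_TARGET],
    )[0]


def steps_per_episode(time_limit=10.0, control_timestep=0.01):
    """Number of control steps that fit in one episode."""
    return round(time_limit / control_timestep)


rng = random.Random(0)  # fixed seed so the tally is repeatable
counts = {"in_hand": 0, "in_target": 0, "uniform": 0}
for _ in range(10_000):
    counts[sample_initial_placement(rng)] += 1

print(counts)               # roughly a 10% / 10% / 80% split
print(steps_per_episode())  # control steps at the default settings
```

Because about one episode in five starts with the object already grasped or already at the target, an agent sees rewarding states early in training even though reaching them from a random start is difficult.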

Usage

Use this implementation for challenging manipulation benchmarks involving grasping and placement. Load via suite.load(domain_name='manipulator', task_name='bring_ball') or any of the other registered task names.

Code Reference

Source Location

Signature

# Task factory functions
def bring_ball(fully_observable=True, time_limit=10, random=None,
               environment_kwargs=None)
def bring_peg(fully_observable=True, time_limit=10, random=None,
              environment_kwargs=None)
def insert_ball(fully_observable=True, time_limit=10, random=None,
                environment_kwargs=None)
def insert_peg(fully_observable=True, time_limit=10, random=None,
               environment_kwargs=None)

# Model generation
def make_model(use_peg, insert)

# Physics subclass
class Physics(mujoco.Physics):
    def bounded_joint_pos(self, joint_names)  # joint positions as (sin, cos)
    def joint_vel(self, joint_names)          # joint velocities
    def body_2d_pose(self, body_names, orientation=True)  # 2D pose
    def touch(self)                            # log-scaled touch sensors
    def site_distance(self, site1, site2)      # Euclidean distance between sites

# Task class
class Bring(base.Task):
    def __init__(self, use_peg, insert, fully_observable, random=None)
    def initialize_episode(self, physics)
    def get_observation(self, physics)
    def get_reward(self, physics)

Import

from dm_control import suite

env = suite.load(domain_name='manipulator', task_name='bring_ball')

I/O Contract

Inputs

| Name | Type | Required | Description |
|---|---|---|---|
| fully_observable | bool | No | Whether observations include hand, object, and target state (default True). |
| time_limit | float | No | Maximum episode duration in seconds (default 10). |
| random | int, numpy.random.RandomState, or None | No | Random seed or RNG instance for reproducibility. |
| environment_kwargs | dict or None | No | Additional keyword arguments forwarded to the Environment constructor. |

Outputs

| Name | Type | Description |
|---|---|---|
| environment | dm_control.rl.control.Environment | A fully initialised environment conforming to the dm_env.Environment interface. |

Observations

| Key | Type | Description |
|---|---|---|
| arm_pos | numpy array (8, 2) | Arm joint positions as (sin, cos) pairs, one pair per joint. |
| arm_vel | numpy array (8,) | Arm joint velocities. |
| touch | numpy array (5,) | Log-scaled signals from palm, finger, thumb, fingertip, and thumbtip sensors. |
| hand_pos | numpy array (4,) | Hand 2D pose: two position and two orientation components (fully observable mode only). |
| object_pos | numpy array (4,) | Object 2D pose: two position and two orientation components (fully observable mode only). |
| object_vel | numpy array (3,) | Object joint velocities (fully observable mode only). |
| target_pos | numpy array (4,) | Target 2D pose: two position and two orientation components (fully observable mode only). |
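When sizing a policy network input, it helps to know the flattened observation length in each mode. The sketch below derives it from the shapes listed in the table; the dictionary and function names are hypothetical, and the code does not query a live environment.

```python
from math import prod

# Shapes copied from the observations table above.
ALWAYS_PRESENT = {
    "arm_pos": (8, 2),
    "arm_vel": (8,),
    "touch": (5,),
}
FULLY_OBSERVABLE_ONLY = {
    "hand_pos": (4,),
    "object_pos": (4,),
    "object_vel": (3,),
    "target_pos": (4,),
}


def observation_shapes(fully_observable=True):
    """Return the observation keys and shapes for the chosen mode."""
    shapes = dict(ALWAYS_PRESENT)
    if fully_observable:
        shapes.update(FULLY_OBSERVABLE_ONLY)
    return shapes


def flat_size(shapes):
    """Total number of scalars after flattening every observation."""
    return sum(prod(shape) for shape in shapes.values())


print(flat_size(observation_shapes(fully_observable=True)))   # 44
print(flat_size(observation_shapes(fully_observable=False)))  # 29
```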

Usage Examples

from dm_control import suite

# Load the bring ball task
env = suite.load(domain_name='manipulator', task_name='bring_ball')

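# Run one episode until the time limit ends it
time_step = env.reset()
while not time_step.last():
    # generate_value() returns a fixed, spec-conforming placeholder action,
    # not a learned or random policy
    action = env.action_spec().generate_value()
    time_step = env.step(action)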

# Load the insert peg task (hard)
env_insert = suite.load(domain_name='manipulator', task_name='insert_peg')

# Load with sensor-only observations (no hand/object/target state)
env_partial = suite.load(
    domain_name='manipulator',
    task_name='bring_ball',
    task_kwargs={'fully_observable': False}
)
