Implementation:Google deepmind Dm control Suite Stacker

Knowledge Sources	Google_deepmind_Dm_control
Domains	Reinforcement Learning, Robotics, Physics Simulation
Last Updated	2026-02-15 04:00 GMT

Overview

The Planar Stacker domain implements a robotic-arm task in which the agent must stack boxes on top of each other at a target location. It is registered in the dm_control suite as two hard-difficulty tasks.

Description

The Stacker module defines a planar manipulation environment in which a multi-jointed arm with a gripper must pick up and stack boxes at a designated target position. The domain provides two task variants: stack_2 (2 boxes) and stack_4 (4 boxes), both tagged as hard difficulty in the suite's task registry.

The Physics class extends mujoco.Physics with methods for computing bounded joint positions as (sin, cos) pairs, joint velocities, 2D body poses (position and optional orientation), logarithmic touch sensor readings, and site-to-site distances. The Stack task class handles episode initialization by randomizing arm joint angles within limits, symmetrizing hand configuration, randomizing target height (proportional to box count) and x-position, and randomizing box positions and orientations. A collision-free check ensures no interpenetration at episode start.
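
The collision-free check at the end of initialization amounts to rejection sampling: the configuration is resampled until nothing interpenetrates. The sketch below illustrates that loop in plain Python under simplifying assumptions. Boxes are treated as axis-aligned squares, and BOX_HALF_SIZE, the sampling ranges, boxes_overlap, and sample_collision_free_boxes are hypothetical names and values that do not come from stacker.py. The real task would rely on the simulator's own contact information rather than a geometric test.

```python
import random

BOX_HALF_SIZE = 0.03  # hypothetical half-width, not read from stacker.xml


def boxes_overlap(a, b, half_size=BOX_HALF_SIZE):
    """True if two axis-aligned squares centred at a and b interpenetrate."""
    return (abs(a[0] - b[0]) < 2 * half_size
            and abs(a[1] - b[1]) < 2 * half_size)


def sample_collision_free_boxes(n_boxes, rng, max_tries=10_000):
    """Resample all box centres until no pair overlaps (rejection sampling)."""
    for _ in range(max_tries):
        # Illustrative (x, z) ranges; the real task uses its own limits.
        boxes = [(rng.uniform(-0.3, 0.3), rng.uniform(0.0, 0.7))
                 for _ in range(n_boxes)]
        penetrating = any(boxes_overlap(boxes[i], boxes[j])
                          for i in range(n_boxes)
                          for j in range(i + 1, n_boxes))
        if not penetrating:
            return boxes
    raise RuntimeError("no collision-free layout found")


boxes = sample_collision_free_boxes(4, random.Random(0))
```

Resampling the whole configuration, rather than nudging the offending boxes apart, keeps the accepted start states distributed like the original sampler restricted to valid layouts.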

The reward function is the product of two tolerance-based components: the closest box-to-target distance (encouraging box placement) and a hand-is-far-from-target measure (encouraging the arm to release the box after placement). The make_model function dynamically modifies the base stacker.xml to remove unused box bodies beyond the required count.
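
The model-pruning step performed by make_model can be sketched with the standard library's ElementTree. This is a minimal illustration, not the module's code: TOY_XML is a made-up stand-in for stacker.xml, prune_boxes is a hypothetical helper, and the body names box0 to box3 and the maximum of four boxes are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Toy stand-in for stacker.xml; the real file has a richer structure.
TOY_XML = """
<mujoco>
  <worldbody>
    <body name="box0"/>
    <body name="box1"/>
    <body name="box2"/>
    <body name="box3"/>
  </worldbody>
</mujoco>
"""


def prune_boxes(xml_string, n_boxes, max_boxes=4):
    """Return the XML with bodies box{n_boxes}..box{max_boxes - 1} removed."""
    root = ET.fromstring(xml_string)
    unused = {f"box{i}" for i in range(n_boxes, max_boxes)}
    # ElementTree has no parent pointers, so visit each element as a
    # potential parent and drop any unused box bodies among its children.
    for parent in list(root.iter()):
        for child in list(parent):
            if child.tag == "body" and child.get("name") in unused:
                parent.remove(child)
    return ET.tostring(root, encoding="unicode")


pruned = prune_boxes(TOY_XML, n_boxes=2)
remaining = [body.get("name") for body in ET.fromstring(pruned).iter("body")]
```

Editing one base XML at load time avoids maintaining a separate model file per task variant.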

Usage

Use this module when you need a hard manipulation benchmark requiring sequential planning and precise object stacking. Load tasks via the suite loader or instantiate directly with stack_2() or stack_4().

Code Reference

Source Location

Repository: Google_deepmind_Dm_control
File: dm_control/suite/stacker.py
Lines: 1-204

Signature

def make_model(n_boxes):
    """Returns a tuple containing the model XML string and a dict of assets."""

def stack_2(fully_observable=True, time_limit=_TIME_LIMIT, random=None,
            environment_kwargs=None):
    """Returns stacker task with 2 boxes."""

def stack_4(fully_observable=True, time_limit=_TIME_LIMIT, random=None,
            environment_kwargs=None):
    """Returns stacker task with 4 boxes."""

class Physics(mujoco.Physics):
    def bounded_joint_pos(self, joint_names): ...
    def joint_vel(self, joint_names): ...
    def body_2d_pose(self, body_names, orientation=True): ...
    def touch(self): ...
    def site_distance(self, site1, site2): ...

class Stack(base.Task):
    def __init__(self, n_boxes, fully_observable, random=None): ...
    def initialize_episode(self, physics): ...
    def get_observation(self, physics): ...
    def get_reward(self, physics): ...
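
The Physics helpers listed above mostly apply small feature transforms. The following standalone sketch shows the idea behind three of them using only the math module. The function bodies are assumptions written for illustration (in particular, log(1 + x) is assumed as the logarithmic touch transform) and operate on plain Python lists rather than on a MuJoCo physics instance.

```python
import math


def bounded_joint_pos(angles):
    """Encode each joint angle as a (sin, cos) pair, bounded in [-1, 1]."""
    return [(math.sin(a), math.cos(a)) for a in angles]


def log_touch(readings):
    """Compress non-negative touch readings with log(1 + x)."""
    return [math.log1p(r) for r in readings]


def site_distance(p1, p2):
    """Euclidean distance between two points given as coordinate tuples."""
    return math.dist(p1, p2)


# Angles that differ by a full turn map to the same encoding.
pair_a = bounded_joint_pos([0.5])[0]
pair_b = bounded_joint_pos([0.5 + 2 * math.pi])[0]
```

The (sin, cos) encoding removes the discontinuity at the angle wrap-around, and the logarithm keeps large contact forces from dominating the observation scale.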

Import

from dm_control.suite import stacker

I/O Contract

Inputs

Name	Type	Required	Description
n_boxes	int	Yes	Number of boxes to stack (used in make_model and Stack task)
fully_observable	bool	No	Whether observations include hand position, box positions, box velocities, and target location (default True)
time_limit	float	No	Episode time limit in seconds (default 10)
random	int/np.random.RandomState/None	No	Random seed or state for reproducibility
environment_kwargs	dict	No	Additional keyword arguments passed to the Environment constructor

Outputs

Name	Type	Description
environment	control.Environment	A dm_control Environment instance with the stacking task
observation	OrderedDict	Contains arm_pos (sin/cos), arm_vel, touch, and optionally hand_pos, box_pos, box_vel, target_pos
reward	float	Product of box-is-close and hand-is-far tolerances, range [0, 1]
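
The reward row above can be made concrete with a small standalone sketch. The tolerance function here is a simplified stand-in modeled loosely on the Gaussian-shaped tolerance utility in dm_control, and stack_reward is a hypothetical helper. The numeric values (box_size, the 0.1 lower bound, and the margins) are illustrative assumptions; consult stacker.py for the values actually used.

```python
import math


def tolerance(x, lower=0.0, upper=0.0, margin=0.0, value_at_margin=0.1):
    """Return 1 inside [lower, upper], with a Gaussian falloff outside.

    The value decays to value_at_margin at a distance of one margin
    from the nearest bound. With margin == 0 the falloff is a hard step.
    """
    if lower <= x <= upper:
        return 1.0
    if margin == 0:
        return 0.0
    d = (lower - x if x < lower else x - upper) / margin
    scale = math.sqrt(-2 * math.log(value_at_margin))
    return math.exp(-0.5 * (d * scale) ** 2)


def stack_reward(box_to_target_distances, hand_to_target_distance,
                 box_size=0.05):
    """Product of box-is-close and hand-is-far terms, always in [0, 1]."""
    box_is_close = tolerance(min(box_to_target_distances),
                             margin=2 * box_size)
    hand_is_far = tolerance(hand_to_target_distance,
                            lower=0.1, upper=math.inf, margin=0.1)
    return box_is_close * hand_is_far


placed_and_released = stack_reward([0.0, 0.5], hand_to_target_distance=0.5)
placed_but_holding = stack_reward([0.0, 0.5], hand_to_target_distance=0.0)
```

Because the two terms are multiplied, holding a box at the target scores lower than placing it and withdrawing the hand, which is the behavior the product is meant to encourage.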

Usage Examples

from dm_control import suite

# Load the 2-box stacking task.
# For the 4-box variant, pass 'stack_4' as the task name instead.
env = suite.load('stacker', 'stack_2')
action_spec = env.action_spec()

# Step through one episode until the time limit ends it
time_step = env.reset()
while not time_step.last():
    # generate_value() returns a fixed action that conforms to the spec
    action = action_spec.generate_value()
    time_step = env.step(action)
    print("Reward:", time_step.reward)
