

Implementation:Google deepmind Dm control Suite Stacker

From Leeroopedia
Knowledge Sources
Domains Reinforcement Learning, Robotics, Physics Simulation
Last Updated 2026-02-15 04:00 GMT

Overview

The Planar Stacker domain implements a robotic-arm task in which the agent must stack boxes at a target location. It is registered as two hard-difficulty tasks in the dm_control suite.

Description

The Stacker module defines a planar manipulation environment in which a multi-jointed arm with a gripper must pick up and stack boxes at a designated target position. The domain provides two task variants: stack_2 (2 boxes) and stack_4 (4 boxes), both tagged as hard difficulty in the suite's task registry.

The Physics class extends mujoco.Physics with methods that return bounded joint positions as (sin, cos) pairs, joint velocities, 2D body poses (position plus optional orientation), logarithmically scaled touch-sensor readings, and site-to-site distances.

The Stack task class initializes each episode by randomizing the arm joint angles within their limits, symmetrizing the hand configuration, randomizing the target height (over a range that scales with the box count) and x-position, and randomizing the box positions and orientations. A collision check on the sampled configuration ensures that no bodies interpenetrate at the start of an episode.
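The episode initialization described above can be read as rejection sampling: draw a random configuration, test it for overlap, and redraw if the test fails. The following is a minimal, MuJoCo-free sketch of that idea, not the module's code. The function name `sample_initial_state`, the 1D overlap test, the numeric ranges, and the rule that the target sits at an odd multiple of the box half-size are all illustrative assumptions.

```python
import random


def sample_initial_state(n_boxes, box_half_size=0.05, x_range=(-0.3, 0.3),
                         rng=None, max_tries=10_000):
    """Sample box x-positions and a target level with no overlapping boxes.

    Every name and number here is hypothetical; a pairwise 1D overlap test
    stands in for a physics engine's contact check.
    """
    rng = rng or random.Random()
    for _ in range(max_tries):
        # Draw a candidate configuration.
        box_xs = [rng.uniform(*x_range) for _ in range(n_boxes)]
        # Assumed rule: the target sits at the centre of one possible stack
        # level, so the range of heights grows with the number of boxes.
        level = rng.randrange(n_boxes)
        target_z = box_half_size * (2 * level + 1)

        # Reject the candidate if any two boxes overlap along x.
        overlapping = any(
            abs(a - b) < 2 * box_half_size
            for i, a in enumerate(box_xs)
            for b in box_xs[i + 1:]
        )
        if not overlapping:
            return {"box_xs": box_xs, "level": level, "target_z": target_z}
    raise RuntimeError("no collision-free configuration found")


state = sample_initial_state(4, rng=random.Random(0))
```

Passing a seeded `random.Random` makes the draw reproducible, which mirrors the role of the `random` argument in the task constructors.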

The reward is the product of two tolerance-based terms: one that rewards the closest box being near the target (encouraging box placement) and one that rewards the hand being far from the target (encouraging the arm to release the box after placement). The make_model function modifies the base stacker.xml when the model is built, removing the box bodies beyond the required count.
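To make the product-of-tolerances structure concrete, here is a small self-contained sketch. The `tolerance` helper is a simplified stand-in for a tolerance function with a Gaussian falloff, written here in plain Python. The function `stack_reward` and the default margins and thresholds are hypothetical values chosen for illustration, not values taken from the module.

```python
import math


def tolerance(x, bounds=(0.0, 0.0), margin=0.0, value_at_margin=0.1):
    """Return 1.0 inside bounds, decaying smoothly toward 0 outside them."""
    lower, upper = bounds
    if lower <= x <= upper:
        return 1.0
    if margin == 0.0:
        return 0.0
    # Distance outside the bounds, measured in units of the margin.
    d = (lower - x if x < lower else x - upper) / margin
    # Gaussian falloff scaled so the value equals value_at_margin when d == 1.
    scale = math.sqrt(-2.0 * math.log(value_at_margin))
    return math.exp(-0.5 * (d * scale) ** 2)


def stack_reward(box_to_target_distances, hand_to_target_distance,
                 box_margin=0.05, hand_min_distance=0.1, hand_margin=0.01):
    """Product of a 'closest box is near' term and a 'hand is far' term."""
    # Only the closest box matters for the placement term.
    box_is_close = tolerance(min(box_to_target_distances), margin=box_margin)
    # The hand term saturates once the hand is at least hand_min_distance away.
    hand_is_far = tolerance(hand_to_target_distance,
                            bounds=(hand_min_distance, math.inf),
                            margin=hand_margin)
    return box_is_close * hand_is_far
```

Because the two terms are multiplied, the reward stays near zero while the gripper is still holding a box on the target: placement alone is not enough, the hand must also withdraw.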

Usage

Use this module when you need a hard manipulation benchmark that requires sequential planning and precise object stacking. Load tasks through suite.load, or construct an environment directly with stacker.stack_2() or stacker.stack_4().

Code Reference

Source Location

Signature

def make_model(n_boxes):
    """Returns a tuple containing the model XML string and a dict of assets."""

def stack_2(fully_observable=True, time_limit=_TIME_LIMIT, random=None,
            environment_kwargs=None):
    """Returns stacker task with 2 boxes."""

def stack_4(fully_observable=True, time_limit=_TIME_LIMIT, random=None,
            environment_kwargs=None):
    """Returns stacker task with 4 boxes."""

class Physics(mujoco.Physics):
    def bounded_joint_pos(self, joint_names): ...
    def joint_vel(self, joint_names): ...
    def body_2d_pose(self, body_names, orientation=True): ...
    def touch(self): ...
    def site_distance(self, site1, site2): ...

class Stack(base.Task):
    def __init__(self, n_boxes, fully_observable, random=None): ...
    def initialize_episode(self, physics): ...
    def get_observation(self, physics): ...
    def get_reward(self, physics): ...
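The box-removal step attributed to make_model in the Description can be sketched with the standard library alone. The XML below is a toy document, not the contents of stacker.xml. The body names `box0` to `box3`, the limit of four boxes, and the helper name `prune_boxes` are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Toy model: a hypothetical layout with one arm body and four box bodies.
TOY_XML = """
<mujoco>
  <worldbody>
    <body name="arm"/>
    <body name="box0"/>
    <body name="box1"/>
    <body name="box2"/>
    <body name="box3"/>
  </worldbody>
</mujoco>
"""


def prune_boxes(xml_string, n_boxes, max_boxes=4):
    """Return the XML with bodies box{n_boxes}..box{max_boxes - 1} removed."""
    root = ET.fromstring(xml_string)
    unused = {f"box{i}" for i in range(n_boxes, max_boxes)}
    # Snapshot the traversal first so removals do not disturb the iteration.
    for parent in list(root.iter()):
        for child in list(parent):
            if child.tag == "body" and child.get("name") in unused:
                parent.remove(child)
    return ET.tostring(root, encoding="unicode")


two_box_xml = prune_boxes(TOY_XML, n_boxes=2)
```

Pruning one shared XML file keeps a single source of truth for the arm and scene, so the 2-box and 4-box variants cannot drift apart.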

Import

from dm_control.suite import stacker

I/O Contract

Inputs

| Name | Type | Required | Description |
|---|---|---|---|
| n_boxes | int | Yes | Number of boxes to stack (used in make_model and the Stack task) |
| fully_observable | bool | No | Whether observations include the hand pose, box positions, box velocities, and target location (default True) |
| time_limit | float | No | Episode time limit in seconds (default 10) |
| random | int, np.random.RandomState, or None | No | Random seed or state for reproducibility |
| environment_kwargs | dict | No | Additional keyword arguments passed to the Environment constructor |

Outputs

| Name | Type | Description |
|---|---|---|
| environment | control.Environment | A dm_control Environment instance with the stacking task |
| observation | OrderedDict | Contains arm_pos (sin/cos), arm_vel, touch, and optionally hand_pos, box_pos, box_vel, target_pos |
| reward | float | Product of the box-is-close and hand-is-far tolerances, in the range [0, 1] |
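The arm_pos and touch entries rely on two simple encodings mentioned in the Description: joint angles as (sin, cos) pairs and logarithmically scaled sensor readings. The sketch below shows both, plus a site-to-site distance, in plain Python. The function names mirror the Physics methods but take raw values instead of reading simulator state, and the use of log(1 + x) for the touch scaling is an assumption.

```python
import math


def bounded_joint_pos(joint_angles):
    """Encode each joint angle (radians) as a (sin, cos) pair."""
    # The pair is bounded in [-1, 1] and continuous across the +/- pi wrap.
    return [(math.sin(q), math.cos(q)) for q in joint_angles]


def log_touch(sensor_readings):
    """Compress non-negative touch readings; log(1 + x) is assumed here."""
    return [math.log1p(x) for x in sensor_readings]


def site_distance(site1_xyz, site2_xyz):
    """Euclidean distance between two site positions."""
    return math.dist(site1_xyz, site2_xyz)


arm_pos = bounded_joint_pos([0.0, math.pi / 2])
touch = log_touch([0.0, 10.0])
gap = site_distance((0.0, 0.0, 0.0), (3.0, 0.0, 4.0))
```

The (sin, cos) encoding avoids the discontinuity that a raw angle has when a joint wraps around, and the log scaling keeps large contact forces from dominating the observation.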

Usage Examples

from dm_control import suite

# Load the 2-box stacking task.
env = suite.load('stacker', 'stack_2')

# Or load the 4-box variant (this rebinds env; use another name to keep both).
env = suite.load('stacker', 'stack_4')

# Step through one episode with a fixed placeholder action.
action_spec = env.action_spec()
time_step = env.reset()
while not time_step.last():
    # generate_value() returns an action that conforms to the spec.
    action = action_spec.generate_value()
    time_step = env.step(action)
    print("Reward:", time_step.reward)
