Principle:Google deepmind Dm control Props and Targets

Metadata
Knowledge Sources	dm_control
Domains	Reinforcement Learning, Robotics, Object Interaction
Last Updated	2026-02-15 00:00 GMT

Overview

The props-and-targets principle covers placing interactive objects in a locomotion environment, where they serve as navigation goals, collectible rewards, or physical obstacles the agent must interact with.

Description

In locomotion tasks, the agent's objective often involves interacting with objects placed in the environment. These objects, called props, include target spheres that activate on contact, multi-touch targets that require repeated visits, and general physical objects that can be placed with collision-aware initialization.

The props-and-targets principle separates three concerns:

Object definition: What the prop looks like, how it detects activation (contact-based), and how it resets between episodes. A target sphere, for example, is a non-physical sphere with a checker pattern: its geom uses the contact gap setting so that contacts are detected without pushing the agent back, and it becomes invisible once touched.
Object placement: Where props appear in the arena. Placement may be fixed, randomized within bounds, or determined by the maze structure (at grid positions marked as target locations). The PropPlacer initializer handles rejection sampling to find non-colliding poses.
Task integration: How the task reward function queries prop activation state. At each step, the task checks which targets have been activated and provides the corresponding reward signal.

This separation enables flexible composition: the same target sphere class works in floor arenas, corridors, and mazes, and the same placement logic works with any prop type.
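
The separation can be illustrated with a minimal, library-free sketch. The names SimpleTarget, place_on_grid, and step_reward are hypothetical and are not part of dm_control; they only show how definition, placement, and reward logic can stay independent of one another.

```python
from dataclasses import dataclass


@dataclass
class SimpleTarget:
    """Object definition (hypothetical): activation state plus reset logic."""
    position: tuple = (0.0, 0.0)
    activated: bool = False

    def reset(self):
        # Called at the start of every episode.
        self.activated = False


def place_on_grid(targets, grid_positions):
    """Object placement (hypothetical): give each target a marked grid cell."""
    if len(grid_positions) < len(targets):
        raise ValueError("not enough target locations")
    for target, position in zip(targets, grid_positions):
        target.position = position


def step_reward(targets, already_rewarded, reward_per_target=1.0):
    """Task integration (hypothetical): pay once per newly activated target."""
    reward = 0.0
    for index, target in enumerate(targets):
        if target.activated and index not in already_rewarded:
            already_rewarded.add(index)
            reward += reward_per_target
    return reward


targets = [SimpleTarget(), SimpleTarget()]
place_on_grid(targets, [(1.0, 2.0), (3.0, 4.0)])
rewarded = set()
targets[0].activated = True              # pretend the agent touched target 0
first = step_reward(targets, rewarded)   # target 0 is newly activated
second = step_reward(targets, rewarded)  # nothing new on this step
```

Because step_reward only reads the activated flag, the placement function can be swapped (fixed, random, maze-driven) without touching the reward code.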

Usage

Apply this principle when:

Creating navigation tasks where the agent must reach one or more goal positions.
Designing foraging tasks where multiple targets must be collected.
Placing physical objects that the agent must manipulate or avoid.
Initializing object positions with collision-aware rejection sampling.
Building two-touch targets that require the agent to visit a location, leave, and return.

Theoretical Basis

Target activation follows a contact-detection pattern:

Target Activation Logic:
  for each physics substep:
    for each contact in physics.data.contact:
      if target_geom_id in (contact.geom1, contact.geom2):
        if specific_collision_filter is None or filter matches:
          target.activated = True
          make target invisible (alpha = 0)
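
The activation logic above can be sketched in plain Python. This is an illustration under stated assumptions, not the library's code: Contact is a stand-in for one entry of physics.data.contact that carries only the two geom ids, and ContactTarget and its attribute names are hypothetical.

```python
from collections import namedtuple

# Stand-in for one entry of physics.data.contact (only the two geom ids).
Contact = namedtuple("Contact", ["geom1", "geom2"])


class ContactTarget:
    """Hypothetical target that activates when its geom appears in a contact."""

    def __init__(self, geom_id, specific_collision_geom_ids=None):
        self.geom_id = geom_id
        self.specific_collision_geom_ids = specific_collision_geom_ids
        self.activated = False
        self.alpha = 1.0  # 1.0 = visible, 0.0 = invisible

    def update_activation(self, contacts):
        if self.activated:
            return  # Activation is latched until the next reset.
        for contact in contacts:
            pair = (contact.geom1, contact.geom2)
            if self.geom_id not in pair:
                continue  # This contact does not involve the target.
            allowed = self.specific_collision_geom_ids
            if allowed is None or any(g in allowed for g in pair):
                self.activated = True
                self.alpha = 0.0  # Hide the target once it has been touched.
                return


target = ContactTarget(geom_id=7, specific_collision_geom_ids={3})
target.update_activation([Contact(7, 5)])  # geom 5 is not in the filter
after_wrong_geom = target.activated
target.update_activation([Contact(3, 7)])  # geom 3 is in the filter
after_right_geom = target.activated
```

In this sketch update_activation would be called once per physics substep, so a brief contact that begins and ends within a single control step is still registered.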

Prop placement uses rejection sampling to avoid interpenetration:

PropPlacer Algorithm:
  temporarily disable contacts for all props that are about to be placed
  for each prop in props:
    restore contact parameters for this prop
    for attempt in range(max_attempts):
      position = sample from position distribution
      quaternion = sample from orientation distribution
      prop.set_pose(physics, position, quaternion)
      physics.forward()
      if no collisions detected:
        accept pose; break
    else:  # no attempt produced a collision-free pose
      raise EpisodeInitializationError

  optionally settle physics (step simulation until velocities converge)
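
The rejection-sampling loop can be shown with a self-contained toy version. This sketch replaces the physics engine with a 2D sphere-overlap test, so it is not the PropPlacer implementation; place_spheres, overlaps, and PlacementError are hypothetical names, and the settling step is omitted.

```python
import math
import random


class PlacementError(RuntimeError):
    """Hypothetical stand-in for an episode-initialization failure."""


def overlaps(center, radius, placed):
    # Two spheres interpenetrate when their centers are closer than the
    # sum of their radii.
    return any(math.dist(center, c) < radius + r for c, r in placed)


def place_spheres(radii, bounds, max_attempts=20, seed=0):
    """Place spheres one by one, resampling any pose that collides."""
    rng = random.Random(seed)
    (x_min, x_max), (y_min, y_max) = bounds
    placed = []
    for radius in radii:
        for _ in range(max_attempts):
            center = (rng.uniform(x_min, x_max), rng.uniform(y_min, y_max))
            if not overlaps(center, radius, placed):
                placed.append((center, radius))  # Accept this pose.
                break
        else:
            # The for-else branch runs only when no attempt succeeded.
            raise PlacementError(f"no free pose after {max_attempts} attempts")
    return placed


placed = place_spheres([0.5, 0.5, 0.5], bounds=((-5.0, 5.0), (-5.0, 5.0)))
```

Placing props sequentially means each new prop is only tested against props that already have an accepted pose, which mirrors restoring contact parameters one prop at a time.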

The two-touch target variant introduces temporal structure:

TwoTouch Activation:
  State: (touched_once: bool, touched_twice: bool)
  On first contact:  touched_once = True, record time
  On later contact (after debounce period): touched_twice = True
  activated = (touched_once, touched_twice)
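
A small state machine makes the debounce behavior concrete. TwoTouchState is a hypothetical class, and the 0.2 second debounce is an assumed value chosen for illustration; the sketch follows the pseudocode above and does not explicitly track whether the agent has left the target between touches.

```python
class TwoTouchState:
    """Hypothetical two-touch tracker driven by contact flags and timestamps."""

    def __init__(self, debounce=0.2):
        self.debounce = debounce  # Assumed value, in simulation seconds.
        self.touched_once = False
        self.touched_twice = False
        self._first_touch_time = None

    def update(self, in_contact, time):
        if not in_contact or self.touched_twice:
            return
        if not self.touched_once:
            self.touched_once = True
            self._first_touch_time = time  # Record when the first touch occurred.
        elif time - self._first_touch_time > self.debounce:
            # Contacts inside the debounce window count as the same touch.
            self.touched_twice = True

    @property
    def activated(self):
        return (self.touched_once, self.touched_twice)


state = TwoTouchState(debounce=0.2)
state.update(in_contact=True, time=1.0)
after_first = state.activated
state.update(in_contact=True, time=1.1)  # still inside the debounce window
within_debounce = state.activated
state.update(in_contact=True, time=2.0)  # well after the debounce window
after_second = state.activated
```

Returning both flags as a tuple lets a task reward the first and second touches separately.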

Related Pages

Implementation:Google_deepmind_Dm_control_Locomotion_Props
