Principle:Google deepmind Dm control Props and Targets
| Metadata | |
|---|---|
| Knowledge Sources | dm_control |
| Domains | Reinforcement Learning, Robotics, Object Interaction |
| Last Updated | 2026-02-15 00:00 GMT |
Overview
Props and targets is the principle of placing interactive objects in a locomotion environment that serve as navigation goals, collectible rewards, or physical obstacles the agent must interact with.
Description
In locomotion tasks, the agent's objective often involves interacting with objects placed in the environment. These objects -- called props -- include target spheres that activate on contact, multi-touch targets that require repeated visits, and general physical objects that can be placed with collision-aware initialization.
The props-and-targets principle separates three concerns:
- Object definition: What the prop looks like, how it detects activation (contact-based), and how it resets between episodes. A target sphere, for example, is a checker-patterned sphere that registers contacts without exerting contact forces (its geom sets a contact gap) and becomes invisible once touched.
- Object placement: Where props appear in the arena. Placement may be fixed, randomized within bounds, or determined by the maze structure (at grid positions marked as target locations). The PropPlacer initializer handles rejection sampling to find non-colliding poses.
- Task integration: How the task reward function queries prop activation state. Tasks check whether targets have been activated each step and provide corresponding reward signals.
This separation enables flexible composition: the same target sphere class works in floor arenas, corridors, and mazes, and the same placement logic works with any prop type.
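The task-integration concern can be illustrated with a minimal sketch in plain Python. It assumes only that each prop exposes a boolean `activated` flag; the `Target` and `ForagingReward` names are hypothetical and are not part of dm_control. The reward is paid once per newly activated target, which is one plausible reading of "corresponding reward signals" rather than the library's actual reward function.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Target:
    """Hypothetical stand-in for any prop that exposes an `activated` flag."""
    activated: bool = False


@dataclass
class ForagingReward:
    """Hypothetical reward helper: pays once for each newly activated target."""
    targets: List[Target]
    reward_per_target: float = 1.0
    _already_paid: List[bool] = field(default_factory=list)

    def __post_init__(self):
        # One bookkeeping flag per target so a target is never paid twice.
        self._already_paid = [False] * len(self.targets)

    def step(self) -> float:
        """Query activation state and return the reward for this step."""
        reward = 0.0
        for i, target in enumerate(self.targets):
            if target.activated and not self._already_paid[i]:
                self._already_paid[i] = True
                reward += self.reward_per_target
        return reward


targets = [Target(), Target(), Target()]
task = ForagingReward(targets)
r0 = task.step()              # nothing touched yet
targets[1].activated = True   # simulate the agent touching one target
r1 = task.step()              # the new activation is rewarded
r2 = task.step()              # no new activations, so no further reward
```

Because the helper only reads `activated`, it works unchanged with any prop type and any placement strategy, which is the point of keeping the three concerns separate.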
Usage
Apply this principle when:
- Creating navigation tasks where the agent must reach one or more goal positions.
- Designing foraging tasks where multiple targets must be collected.
- Placing physical objects that the agent must manipulate or avoid.
- Initializing object positions with collision-aware rejection sampling.
- Building two-touch targets that require the agent to visit a location, leave, and return.
Theoretical Basis
Target activation follows a contact-detection pattern:
Target Activation Logic:
    for each physics substep:
        for each contact in physics.data.contact:
            if target_geom_id in (contact.geom1, contact.geom2):
                if specific_collision_filter is None or filter matches:
                    target.activated = True
                    make target invisible (alpha = 0)
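The activation logic above can be sketched without a simulator by treating contacts as plain pairs of geom ids. The `Contact` and `TargetSketch` classes below are hypothetical illustrations, not dm_control classes, and the filter here is applied to the geom on the other side of the contact, which is an assumption about how "filter matches" is meant.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Set


@dataclass(frozen=True)
class Contact:
    """Hypothetical contact record holding the two geom ids involved."""
    geom1: int
    geom2: int


class TargetSketch:
    """Hypothetical target that activates on the first qualifying contact."""

    def __init__(self, geom_id: int, allowed_geom_ids: Optional[Set[int]] = None):
        self.geom_id = geom_id
        self.allowed_geom_ids = allowed_geom_ids  # None means any geom counts
        self.reset()

    def reset(self) -> None:
        """Restore the un-touched, visible state at the start of an episode."""
        self.activated = False
        self.alpha = 1.0

    def update(self, contacts: Iterable[Contact]) -> None:
        """Scan the contacts of one substep and activate if one qualifies."""
        if self.activated:
            return
        for contact in contacts:
            if self.geom_id not in (contact.geom1, contact.geom2):
                continue
            other = contact.geom2 if contact.geom1 == self.geom_id else contact.geom1
            if self.allowed_geom_ids is None or other in self.allowed_geom_ids:
                self.activated = True
                self.alpha = 0.0  # make the target invisible
                return


target = TargetSketch(geom_id=5, allowed_geom_ids={2})
target.update([Contact(5, 3)])   # geom 3 is filtered out, nothing happens
before = target.activated
target.update([Contact(2, 5)])   # geom 2 passes the filter
after = target.activated
```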
Prop placement uses rejection sampling to avoid interpenetration:
PropPlacer Algorithm:
    for each prop in props:
        restore contact parameters for this prop
        for attempt in range(max_attempts):
            position = sample from position distribution
            quaternion = sample from orientation distribution
            prop.set_pose(physics, position, quaternion)
            physics.forward()
            if no collisions detected:
                accept pose; break
        else:
            raise EpisodeInitializationError
    optionally settle physics (step simulation until velocities converge)
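The rejection-sampling loop can be illustrated with a simplified, simulator-free sketch. It places disks in a 2D rectangle and treats overlap between disks as a collision; the `place_disks` function and `PlacementError` exception are hypothetical names, with `PlacementError` playing the role that EpisodeInitializationError plays in the pseudocode. Orientation sampling and physics settling are omitted.

```python
import math
import random


class PlacementError(RuntimeError):
    """Hypothetical analogue of an episode-initialization failure."""


def place_disks(radii, bounds, max_attempts=100, seed=0):
    """Place disks one at a time, resampling until each one is collision-free.

    radii: radius of each disk to place, in placement order.
    bounds: ((x_min, x_max), (y_min, y_max)) sampling range for disk centers.
    Returns a list of (x, y, radius) tuples.
    """
    rng = random.Random(seed)
    (x_min, x_max), (y_min, y_max) = bounds
    placed = []
    for radius in radii:
        for _ in range(max_attempts):
            x = rng.uniform(x_min, x_max)
            y = rng.uniform(y_min, y_max)
            # Accept the pose only if it overlaps no previously placed disk.
            if all(math.hypot(x - px, y - py) >= radius + pr
                   for px, py, pr in placed):
                placed.append((x, y, radius))
                break
        else:
            # The for-else branch runs only when no attempt succeeded.
            raise PlacementError(
                f"no collision-free pose found in {max_attempts} attempts")
    return placed


placed = place_disks([0.5, 0.5, 0.5], ((-5.0, 5.0), (-5.0, 5.0)), seed=1)
```

Raising after a bounded number of attempts, rather than looping forever, lets the caller regenerate the whole episode when the arena is too crowded for the requested props.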
The two-touch target variant introduces temporal structure:
TwoTouch Activation:
    State: (touched_once: bool, touched_twice: bool)
    On first contact: touched_once = True, record time
    On later contact (after debounce period): touched_twice = True
    activated = (touched_once, touched_twice)
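The two-touch state machine can be sketched as a small class driven by a contact flag and the current simulation time. `TwoTouchSketch` and its `debounce_seconds` parameter are hypothetical names; the sketch follows the time-based debounce in the pseudocode above and does not separately verify that the agent broke contact in between.

```python
class TwoTouchSketch:
    """Hypothetical two-touch target with a time-based debounce."""

    def __init__(self, debounce_seconds: float = 1.0):
        self.debounce_seconds = debounce_seconds
        self.reset()

    def reset(self) -> None:
        """Clear both touches at the start of an episode."""
        self.touched_once = False
        self.touched_twice = False
        self._first_touch_time = None

    def update(self, in_contact: bool, time: float) -> None:
        """Advance the state given whether the target is touched at `time`."""
        if not in_contact or self.touched_twice:
            return
        if not self.touched_once:
            # First contact: record when it happened.
            self.touched_once = True
            self._first_touch_time = time
        elif time - self._first_touch_time >= self.debounce_seconds:
            # A contact after the debounce period counts as the second touch.
            self.touched_twice = True

    @property
    def activated(self):
        return (self.touched_once, self.touched_twice)


two_touch = TwoTouchSketch(debounce_seconds=0.5)
two_touch.update(in_contact=True, time=0.0)
state_first = two_touch.activated    # (True, False)
two_touch.update(in_contact=True, time=0.2)
state_early = two_touch.activated    # still (True, False): inside the debounce
two_touch.update(in_contact=True, time=0.7)
state_second = two_touch.activated   # (True, True)
```

Exposing `activated` as a pair lets a task reward the first and second touches differently.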