# Principle: Google DeepMind dm_control Task Definition
| Attribute | Value |
|---|---|
| Principle | Task Definition |
| Workflow | Composer_Environment_Building |
| Domain | Reinforcement_Learning, Composition |
| Source | dm_control |
| Last Updated | 2026-02-15 00:00 GMT |
## Overview
A task is the logical specification of what constitutes success in an environment, defining the reward function, termination conditions, and the mapping from agent actions to actuator controls.
## Description
In reinforcement learning, the environment is often conflated with the task, but they serve different purposes. The environment provides the physics simulation and the observation/action interface, while the task defines:
- What to optimize: The reward signal that the agent seeks to maximize.
- When to stop: The conditions under which an episode ends (success, failure, or timeout).
- What to observe: Which observables from which entities are enabled for the agent.
- How to act: How the raw action vector maps to actuator control signals.
- How to reset: What the initial state of the world should be at the start of each episode.
The Task Definition principle separates these concerns into a dedicated abstract class that references a root entity (typically an arena containing one or more robots and props) but does not own the simulation loop or the observation buffering system. This separation allows:
- The same set of entities to be reused across different tasks (e.g., the same robot in a reaching task vs. a locomotion task).
- The environment class to handle the mechanics of stepping, resetting, and observation management uniformly across all tasks.
- Task-specific logic (reward shaping, curriculum learning, termination conditions) to be cleanly isolated and tested.
The task also manages the relationship between the control timestep (the interval at which the agent acts) and the physics timestep (the interval at which MuJoCo advances the simulation). The control timestep must be an integer multiple of the physics timestep.
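For example, a 25 ms control timestep over a 5 ms physics timestep yields 5 physics substeps per control step. A minimal sketch of that integer-multiple check (the helper name `compute_substeps` is illustrative, not part of the dm_control API):

```python
def compute_substeps(control_timestep, physics_timestep, tolerance=1e-8):
    """Return the number of physics substeps per control step.

    The ratio of control to physics timestep must be a (positive) integer,
    up to floating-point tolerance.
    """
    ratio = control_timestep / physics_timestep
    n_substeps = round(ratio)
    if n_substeps < 1 or abs(ratio - n_substeps) > tolerance:
        raise ValueError(
            f"control_timestep ({control_timestep}) must be an integer "
            f"multiple of physics_timestep ({physics_timestep}).")
    return n_substeps

print(compute_substeps(0.025, 0.005))  # 5 substeps per control step
```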
## Usage
Use the Task Definition principle whenever you need to specify the objective and structure of a reinforcement learning problem built on top of Composer entities:
- Define a reward: Implement `get_reward(physics)` to compute a scalar reward from the current physics state.
- Define termination: Override `should_terminate_episode(physics)` to signal early termination on success or failure.
- Initialize episodes: Override `initialize_episode(physics, random_state)` to randomize initial poses, or `initialize_episode_mjcf(random_state)` to modify the model structure between episodes.
- Select observations: In the task constructor or `initialize_episode_mjcf`, enable or disable entity observables and optionally add task-level observables via the `task_observables` property.
- Set timesteps: Call `set_timesteps(control_timestep, physics_timestep)` to configure the simulation timing.
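The reward, termination, and initialization hooks above can be sketched in pure Python. This toy `ReachTask` mirrors the method names of `dm_control.composer.Task`, but here `physics` is a plain dict standing in for the real MuJoCo physics object, and `random_state` is a `random.Random` rather than NumPy's `RandomState` — a sketch of the pattern, not a runnable dm_control task:

```python
import math

class ReachTask:
    """Toy Composer-style task: move an effector to a fixed 2-D target."""

    def __init__(self, target=(0.5, 0.5), threshold=0.05):
        self._target = target
        self._threshold = threshold

    def initialize_episode(self, physics, random_state):
        # Randomize the effector's start position at each episode reset.
        physics["effector"] = [random_state.uniform(-1.0, 1.0) for _ in range(2)]

    def get_reward(self, physics):
        # Dense shaped reward: larger (less negative) as the effector
        # approaches the target.
        return -math.dist(physics["effector"], self._target)

    def should_terminate_episode(self, physics):
        # End the episode early once the effector is within the threshold.
        return math.dist(physics["effector"], self._target) < self._threshold
```

In a real Composer task, the same three hooks would read and write MuJoCo state through bound `physics` accessors instead of a dict.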
## Theoretical Basis
A Composer task maps directly to the standard reinforcement learning formalization of a Markov Decision Process (MDP):
```
MDP = (S, A, T, R, gamma)

S     = physics state (positions, velocities, contacts)
A     = action_spec(physics)          -- from Task.action_spec
T     = MuJoCo physics stepping       -- handled by Environment
R     = Task.get_reward(physics)      -- per-step reward
gamma = Task.get_discount(physics)    -- per-step discount (default 1.0)
```
Episode termination:
```
Task.should_terminate_episode(physics) OR time >= time_limit
```
The task's lifecycle within an episode is:
```
Task.initialize_episode_mjcf(random_state)
    -> recompile physics if model changed
Task.after_compile(physics, random_state)
Task.initialize_episode(physics, random_state)

for each control step:
    Task.before_step(physics, action, random_state)
        -> default: physics.set_control(action)
    for each physics substep:
        Task.before_substep(physics, action, random_state)
        physics.step()
        Task.after_substep(physics, random_state)
    Task.after_step(physics, random_state)
    reward = Task.get_reward(physics)
    done = Task.should_terminate_episode(physics)
```
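The per-control-step portion of this lifecycle can be sketched in plain Python. The recorder class below is a stand-in (the real hooks live on `dm_control.composer.Task`, and `physics.step()` would advance MuJoCo between the substep hooks), but it makes the hook ordering concrete:

```python
class LifecycleTask:
    """Records the order in which Composer-style hooks fire (sketch)."""

    def __init__(self):
        self.calls = []

    def before_step(self, physics, action, random_state):
        self.calls.append("before_step")

    def before_substep(self, physics, action, random_state):
        self.calls.append("before_substep")

    def after_substep(self, physics, random_state):
        self.calls.append("after_substep")

    def after_step(self, physics, random_state):
        self.calls.append("after_step")

    def get_reward(self, physics):
        self.calls.append("get_reward")
        return 0.0

    def should_terminate_episode(self, physics):
        self.calls.append("should_terminate")
        return False

def control_step(task, physics, action, random_state, n_substeps):
    # One control step: before_step, then n physics substeps (each wrapped
    # in before_substep / after_substep), then after_step, then the reward
    # and termination queries.
    task.before_step(physics, action, random_state)
    for _ in range(n_substeps):
        task.before_substep(physics, action, random_state)
        # physics.step() would advance the MuJoCo simulation here.
        task.after_substep(physics, random_state)
    task.after_step(physics, random_state)
    return task.get_reward(physics), task.should_terminate_episode(physics)
```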
The `observables` property of the task is the union of all entity observables (collected by walking the entity tree) and any task-specific observables. Only observables with `enabled=True` are included in the agent's observation.
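A minimal sketch of that union, assuming a toy `Entity` with a name-to-enabled mapping (real Composer entities expose richer `Observable` objects with an `enabled` attribute, and names are prefixed per entity):

```python
class Entity:
    """Toy stand-in for a Composer entity: named observables plus children."""

    def __init__(self, observables=None, children=()):
        self.observables = dict(observables or {})  # name -> enabled flag
        self.children = list(children)

def collect_enabled_observables(root_entity, task_observables=None):
    # Walk the entity tree depth-first, then merge in task-level observables;
    # only observables flagged as enabled survive.
    names = []
    stack = [root_entity]
    while stack:
        entity = stack.pop()
        names.extend(n for n, enabled in entity.observables.items() if enabled)
        stack.extend(entity.children)
    names.extend(n for n, enabled in (task_observables or {}).items() if enabled)
    return sorted(names)
```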