# Principle: Google DeepMind dm_control Task Definition
| Attribute | Value |
|---|---|
| Principle | Task Definition |
| Workflow | Composer_Environment_Building |
| Domain | Reinforcement_Learning, Composition |
| Source | dm_control |
| Last Updated | 2026-02-15 00:00 GMT |
## Overview
A task is the logical specification of what constitutes success in an environment, defining the reward function, termination conditions, and the mapping from agent actions to actuator controls.
## Description
In reinforcement learning, the environment is often conflated with the task, but they serve different purposes. The environment provides the physics simulation and the observation/action interface, while the task defines:
- What to optimize: The reward signal that the agent seeks to maximize.
- When to stop: The conditions under which an episode ends (success, failure, or timeout).
- What to observe: Which observables from which entities are enabled for the agent.
- How to act: How the raw action vector maps to actuator control signals.
- How to reset: What the initial state of the world should be at the start of each episode.
The Task Definition principle separates these concerns into a dedicated abstract class that references a root entity (typically an arena containing one or more robots and props) but does not own the simulation loop or the observation buffering system. This separation allows:
- The same set of entities to be reused across different tasks (e.g., the same robot in a reaching task vs. a locomotion task).
- The environment class to handle the mechanics of stepping, resetting, and observation management uniformly across all tasks.
- Task-specific logic (reward shaping, curriculum learning, termination conditions) to be cleanly isolated and tested.
The task also manages the relationship between the control timestep (the interval at which the agent acts) and the physics timestep (the interval at which MuJoCo advances the simulation). The control timestep must be an integer multiple of the physics timestep.
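For example, a 25 ms control timestep over a 5 ms physics timestep yields 5 physics substeps per control step. A minimal sketch of that integer-multiple check (the helper name `compute_substeps` is illustrative, not part of the dm_control API):

```python
def compute_substeps(control_timestep, physics_timestep, tolerance=1e-8):
    """Return the number of physics substeps per control step.

    The ratio of control to physics timestep must be a (positive) integer,
    up to floating-point tolerance.
    """
    ratio = control_timestep / physics_timestep
    n_substeps = round(ratio)
    if n_substeps < 1 or abs(ratio - n_substeps) > tolerance:
        raise ValueError(
            f"control_timestep ({control_timestep}) must be an integer "
            f"multiple of physics_timestep ({physics_timestep}).")
    return n_substeps

print(compute_substeps(0.025, 0.005))  # 5 substeps per control step
```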
## Usage
Use the Task Definition principle whenever you need to specify the objective and structure of a reinforcement learning problem built on top of Composer entities:
- Define a reward: Implement `get_reward(physics)` to compute a scalar reward from the current physics state.
- Define termination: Override `should_terminate_episode(physics)` to signal early termination on success or failure.
- Initialize episodes: Override `initialize_episode(physics, random_state)` to randomize initial poses, or `initialize_episode_mjcf(random_state)` to modify the model structure between episodes.
- Select observations: In the task constructor or `initialize_episode_mjcf`, enable or disable entity observables and optionally add task-level observables via the `task_observables` property.
- Set timesteps: Call `set_timesteps(control_timestep, physics_timestep)` to configure the simulation timing.
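The reward, termination, and initialization hooks above can be sketched in pure Python. This toy `ReachTask` mirrors the method names of `dm_control.composer.Task`, but here `physics` is a plain dict standing in for the real MuJoCo physics object, and `random_state` is a `random.Random` rather than NumPy's `RandomState` — a sketch of the pattern, not a runnable dm_control task:

```python
import math

class ReachTask:
    """Toy Composer-style task: move an effector to a fixed 2-D target."""

    def __init__(self, target=(0.5, 0.5), threshold=0.05):
        self._target = target
        self._threshold = threshold

    def initialize_episode(self, physics, random_state):
        # Randomize the effector's start position at each episode reset.
        physics["effector"] = [random_state.uniform(-1.0, 1.0) for _ in range(2)]

    def get_reward(self, physics):
        # Dense shaped reward: larger (less negative) as the effector
        # approaches the target.
        return -math.dist(physics["effector"], self._target)

    def should_terminate_episode(self, physics):
        # End the episode early once the effector is within the threshold.
        return math.dist(physics["effector"], self._target) < self._threshold
```

In a real Composer task, the same three hooks would read and write MuJoCo state through bound `physics` accessors instead of a dict.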
## Theoretical Basis
A Composer task maps directly to the standard reinforcement learning formalization of a Markov Decision Process (MDP):
```
MDP = (S, A, T, R, gamma)

S     = physics state (positions, velocities, contacts)
A     = action_spec(physics)          -- from Task.action_spec
T     = MuJoCo physics stepping       -- handled by Environment
R     = Task.get_reward(physics)      -- per-step reward
gamma = Task.get_discount(physics)    -- per-step discount (default 1.0)
```
Episode termination:
```
Task.should_terminate_episode(physics) OR time >= time_limit
```
The task's lifecycle within an episode is:
```
Task.initialize_episode_mjcf(random_state)
    -> recompile physics if model changed
Task.after_compile(physics, random_state)
Task.initialize_episode(physics, random_state)

for each control step:
    Task.before_step(physics, action, random_state)
        -> default: physics.set_control(action)
    for each physics substep:
        Task.before_substep(physics, action, random_state)
        physics.step()
        Task.after_substep(physics, random_state)
    Task.after_step(physics, random_state)
    reward = Task.get_reward(physics)
    done = Task.should_terminate_episode(physics)
```
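The per-control-step portion of this lifecycle can be sketched in plain Python. The recorder class below is a stand-in (the real hooks live on `dm_control.composer.Task`, and `physics.step()` would advance MuJoCo between the substep hooks), but it makes the hook ordering concrete:

```python
class LifecycleTask:
    """Records the order in which Composer-style hooks fire (sketch)."""

    def __init__(self):
        self.calls = []

    def before_step(self, physics, action, random_state):
        self.calls.append("before_step")

    def before_substep(self, physics, action, random_state):
        self.calls.append("before_substep")

    def after_substep(self, physics, random_state):
        self.calls.append("after_substep")

    def after_step(self, physics, random_state):
        self.calls.append("after_step")

    def get_reward(self, physics):
        self.calls.append("get_reward")
        return 0.0

    def should_terminate_episode(self, physics):
        self.calls.append("should_terminate")
        return False

def control_step(task, physics, action, random_state, n_substeps):
    # One control step: before_step, then n physics substeps (each wrapped
    # in before_substep / after_substep), then after_step, then the reward
    # and termination queries.
    task.before_step(physics, action, random_state)
    for _ in range(n_substeps):
        task.before_substep(physics, action, random_state)
        # physics.step() would advance the MuJoCo simulation here.
        task.after_substep(physics, random_state)
    task.after_step(physics, random_state)
    return task.get_reward(physics), task.should_terminate_episode(physics)
```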
The `observables` property of the task is the union of all entity observables (collected by walking the entity tree) and any task-specific observables. Only observables with `enabled=True` are included in the agent's observation.
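A minimal sketch of that union, assuming a toy `Entity` with a name-to-enabled mapping (real Composer entities expose richer `Observable` objects with an `enabled` attribute, and names are prefixed per entity):

```python
class Entity:
    """Toy stand-in for a Composer entity: named observables plus children."""

    def __init__(self, observables=None, children=()):
        self.observables = dict(observables or {})  # name -> enabled flag
        self.children = list(children)

def collect_enabled_observables(root_entity, task_observables=None):
    # Walk the entity tree depth-first, then merge in task-level observables;
    # only observables flagged as enabled survive.
    names = []
    stack = [root_entity]
    while stack:
        entity = stack.pop()
        names.extend(n for n, enabled in entity.observables.items() if enabled)
        stack.extend(entity.children)
    names.extend(n for n, enabled in (task_observables or {}).items() if enabled)
    return sorted(names)
```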