Principle:Google deepmind Dm control Locomotion Task Definition

From Leeroopedia
Metadata
Knowledge Sources: dm_control
Domains: Reinforcement Learning, Robotics, Locomotion
Last Updated: 2026-02-15 00:00 GMT

Overview

Locomotion task definition is the principle of specifying the objective, reward structure, termination conditions, and episode initialization logic that together define what a locomotion agent must accomplish.

Description

A locomotion task binds a walker to an arena and overlays a behavioral objective. It specifies how the walker is spawned each episode, which observables are enabled, how rewards are computed from the physics state, when episodes terminate (through success, failure, or a time limit), and what discount signal the agent receives. The task does not define the body or the terrain -- it defines what the agent should do in that body within that terrain.

Task definition follows the Markov Decision Process (MDP) formalism: at each control timestep, the task provides an observation, accepts an action, computes a scalar reward, and signals whether the episode continues. Different tasks over the same walker and arena create fundamentally different learning problems -- running forward at a target velocity versus reaching scattered goals versus escaping a bowl-shaped terrain.
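To make the role of the discount signal concrete, the sketch below accumulates a return from per-step (reward, discount) pairs. It is a plain-Python illustration rather than dm_control code: the function name `episode_return`, the agent-side factor `agent_gamma`, and the sample numbers are all assumptions made for this example.

```python
def episode_return(steps, agent_gamma=0.99):
    """Accumulates a discounted return from (reward, env_discount) pairs.

    `steps` holds one (reward, env_discount) tuple per control timestep.
    `agent_gamma` is the agent's own discount factor (an assumption here,
    not something the task defines).
    """
    total = 0.0
    weight = 1.0  # product of all discounts applied so far
    for reward, env_discount in steps:
        total += weight * reward
        # The discount emitted at this step scales every later reward.
        weight *= agent_gamma * env_discount
    return total


# Hypothetical episode: the task emits a discount of 0.0 at the second step,
# so the reward at the third step no longer contributes.
steps = [(1.0, 1.0), (1.0, 0.0), (5.0, 1.0)]
print(episode_return(steps, agent_gamma=0.5))  # 1.5
```

Under this convention, a discount of 0.0 marks a true terminal state, while a discount of 1.0 at the last step (as is common for a time limit) tells the agent that value could still be accumulated beyond the cutoff.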

Usage

Apply this principle when:

  • Defining the reward function for a locomotion behavior (velocity tracking, goal reaching, terrain escape, target collection).
  • Setting episode termination conditions (contact violations, height thresholds, all targets collected).
  • Choosing which walker observables to enable for a particular experiment.
  • Configuring physics and control timesteps for the simulation-action loop.
  • Composing walkers and arenas into concrete learning problems.
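For the timestep item above, the relationship between the two timesteps can be sketched as follows. The working assumption is that each control step is made up of a whole number of physics steps; the helper name `physics_substeps` and the values 0.03 and 0.005 are illustrative, not taken from the source.

```python
def physics_substeps(control_timestep, physics_timestep, tol=1e-6):
    """Returns how many physics steps fit into one control step.

    Raises ValueError if the control timestep is not (approximately) an
    integer multiple of the physics timestep.
    """
    if physics_timestep <= 0 or control_timestep <= 0:
        raise ValueError("timesteps must be positive")
    ratio = control_timestep / physics_timestep
    n = round(ratio)
    if n < 1 or abs(ratio - n) > tol:
        raise ValueError(
            f"control_timestep={control_timestep} is not an integer multiple "
            f"of physics_timestep={physics_timestep}")
    return n


print(physics_substeps(0.03, 0.005))  # 6
```

A smaller physics timestep generally improves simulation stability at higher compute cost, while the control timestep sets how often the agent observes and acts.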

Theoretical Basis

A locomotion task implements the following MDP interface:

Task Interface (composer.Task):
  Properties:
    root_entity -> the arena (top-level MJCF entity)

  Episode Lifecycle:
    initialize_episode_mjcf(random_state)  -> modify MJCF before compilation
    initialize_episode(physics, random_state) -> set initial state
    before_step(physics, action, random_state) -> apply action to walker
    after_step(physics, random_state)          -> check contacts, update state

  MDP Signals:
    get_reward(physics)                -> R(s) : float
    get_discount(physics)              -> gamma(s) : float in [0, 1]
    should_terminate_episode(physics)  -> bool
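The call order of these hooks can be illustrated with a toy stand-in. The sketch below does not subclass `composer.Task` or use MuJoCo: `ToyPhysics`, `ToyRunTask`, and `run_episode` are hypothetical names, the "physics" is a point moving along one axis, and Python's `random.Random` stands in for the random state. Only the method names mirror the interface listed above.

```python
import random


class ToyPhysics:
    """Hypothetical stand-in for a physics object: a point moving along x."""

    def __init__(self):
        self.x = 0.0
        self.xvel = 0.0

    def step(self, dt):
        self.x += self.xvel * dt


class ToyRunTask:
    """Mirrors the lifecycle method names above on a 1-D toy problem."""

    def __init__(self, target_x=1.0):
        self._target_x = target_x
        self._reached = False

    def initialize_episode(self, physics, random_state):
        # Spawn near the origin with a small random offset.
        physics.x = random_state.uniform(-0.1, 0.1)
        physics.xvel = 0.0
        self._reached = False

    def before_step(self, physics, action, random_state):
        # The action is interpreted directly as a commanded velocity.
        physics.xvel = float(action)

    def after_step(self, physics, random_state):
        # Update task state from the post-step physics state.
        self._reached = physics.x >= self._target_x

    def get_reward(self, physics):
        return 1.0 if self._reached else 0.0

    def get_discount(self, physics):
        # Treat reaching the goal as a true terminal state.
        return 0.0 if self._reached else 1.0

    def should_terminate_episode(self, physics):
        return self._reached


def run_episode(task, policy, control_timestep=0.1, max_steps=100, seed=0):
    """Drives one episode, calling the hooks in lifecycle order."""
    random_state = random.Random(seed)
    physics = ToyPhysics()
    task.initialize_episode(physics, random_state)
    rewards = []
    for _ in range(max_steps):
        task.before_step(physics, policy(physics), random_state)
        physics.step(control_timestep)
        task.after_step(physics, random_state)
        rewards.append(task.get_reward(physics))
        if task.should_terminate_episode(physics):
            break
    return rewards


rewards = run_episode(ToyRunTask(target_x=1.0), policy=lambda physics: 1.0)
print(sum(rewards), len(rewards))
```

Keeping reward, discount, and termination as separate queries over the same post-step state is what lets different tasks reuse one walker and arena while changing only the objective.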

Common reward patterns in locomotion tasks include:

Velocity Tracking:
  reward = tolerance(walker_xvel, bounds=(v_target, v_target), margin=v_target)

Goal Reaching:
  reward = 1.0  if distance(walker, target) < tolerance  else 0.0

Escape:
  reward = upright_reward * tolerance(distance_from_center, bounds=(terrain_size, inf))

Target Collection (Maze):
  reward = target_reward_scale  if target.activated  else 0.0
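The reward patterns above can be sketched in plain Python. The `linear_tolerance` helper below is a simplified stand-in written for this example: it implements only a linear falloff that reaches zero at the margin, whereas a library tolerance function may offer other falloff shapes. All function names here are hypothetical.

```python
def linear_tolerance(x, bounds, margin):
    """Returns 1.0 inside `bounds`, falling linearly to 0.0 at `margin` away.

    Simplified stand-in for a tolerance-style reward; only the linear
    falloff is implemented.
    """
    lower, upper = bounds
    if lower > upper:
        raise ValueError("lower bound must not exceed upper bound")
    if lower <= x <= upper:
        return 1.0
    if margin <= 0:
        return 0.0
    distance = (lower - x) if x < lower else (x - upper)
    return max(0.0, 1.0 - distance / margin)


def velocity_tracking_reward(walker_xvel, v_target):
    # Full reward exactly at v_target; zero at rest or at twice the target.
    return linear_tolerance(walker_xvel, (v_target, v_target), margin=v_target)


def goal_reaching_reward(distance_to_target, tolerance):
    # Sparse reward: paid only while the walker is within `tolerance`.
    return 1.0 if distance_to_target < tolerance else 0.0


def escape_reward(upright_reward, distance_from_center, terrain_size):
    # Full reward once the walker is at least `terrain_size` from the center.
    escaped = linear_tolerance(
        distance_from_center,
        bounds=(terrain_size, float("inf")),
        margin=terrain_size)
    return upright_reward * escaped


print(velocity_tracking_reward(1.5, 3.0))  # 0.5
print(escape_reward(1.0, 2.0, 4.0))        # 0.5
```

The dense rewards (velocity tracking, escape) give a learning signal at every step, while the sparse ones (goal reaching, target collection) pay only on success, which usually makes exploration harder.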

Termination conditions typically combine contact-based failure (non-foot geoms touching ground) with height-based failure (end effectors below a threshold), plus optional task-specific success conditions (all targets collected, corridor end reached).
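A minimal sketch of how these conditions might be combined is shown below. It assumes the contact set and end-effector heights have already been extracted from the physics state; the function names, geom names, and the rule that failure zeroes the discount are illustrative assumptions rather than a specific library's behavior.

```python
def failure_termination(ground_contact_geoms, foot_geoms,
                        end_effector_heights, min_height):
    """Returns True if the episode should end in failure.

    ground_contact_geoms: names of walker geoms currently touching the ground.
    foot_geoms: names of geoms that are allowed to touch the ground.
    end_effector_heights: heights (z) of the monitored end effectors.
    min_height: height below which an end effector counts as fallen.
    """
    bad_contact = any(g not in foot_geoms for g in ground_contact_geoms)
    too_low = any(h < min_height for h in end_effector_heights)
    return bad_contact or too_low


def should_terminate(failed, succeeded=False):
    # Success conditions are task-specific, e.g. all targets collected.
    return failed or succeeded


def discount(failed):
    # One common convention: zero out future reward after a failure.
    return 0.0 if failed else 1.0


feet = {"left_foot", "right_foot"}
failed = failure_termination({"left_foot", "torso"}, feet, [0.4, 0.5], 0.1)
print(failed, should_terminate(failed), discount(failed))  # True True 0.0
```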
