Workflow: Google DeepMind dm_control Locomotion Task Setup
| Knowledge Sources | |
|---|---|
| Domains | Reinforcement_Learning, Locomotion, Physics_Simulation |
| Last Updated | 2026-02-15 12:00 GMT |
Overview
End-to-end process for constructing locomotion reinforcement learning environments by combining walker agents, terrain arenas, and locomotion-specific tasks using the dm_control Composer framework.
Description
This workflow covers the standard procedure for building locomotion environments using the dm_control locomotion library. The library provides a rich set of pre-built walkers (CMU humanoid, rodent, ant, fruitfly, jumping ball), terrain arenas (floors, corridors with gaps/walls, bowls, mazes), and locomotion tasks (corridor running, target reaching, escape, maze foraging, motion tracking). The process involves selecting a walker with appropriate actuator configuration, building a terrain arena, defining the locomotion task with reward and termination logic, and assembling the Composer environment. This workflow produces environments suitable for training locomotion policies via RL.
Usage
Execute this workflow when you need to create a locomotion-focused RL environment with a specific walker morphology, terrain type, and locomotion objective. Use this for research on locomotion control, motor skill learning, or benchmarking locomotion algorithms across different body plans and terrains.
Execution Steps
Step 1: Select and Configure a Walker
Choose a walker entity that matches the desired body morphology and control interface. Available walkers include the CMU humanoid (56-joint full-body humanoid with position or velocity control), rodent (65-joint biomechanical rat), ant (4-legged quadruped), fruitfly (detailed Drosophila model with wings and adhesion), and jumping ball (simplified ball agent). Configure observable options such as egocentric cameras and proprioceptive sensors.
Key considerations:
- CMUHumanoidPositionControlled uses scaled position actuators for stable control
- CMUHumanoidPositionControlledV2020 is the updated version with improved physics
- Rat walker has 65 mocap joints and egocentric camera support
- Ant walker is lightweight and suitable for fast prototyping
- JumpingBallWithHead provides a minimal walker for testing environments
- Walkers define their own observables (joint positions, velocities, IMU, cameras)
Step 2: Build a Terrain Arena
Select and configure the terrain arena that defines the physical world. Options include flat floors for open navigation, corridors with walls or gaps for obstacle traversal, bowl-shaped terrains for escape tasks, and procedurally generated mazes for navigation. Arenas support aesthetic themes (indoor, outdoor_natural) and can include textures from the LabMaze texture library.
Key considerations:
- Floor is the simplest arena with a flat ground plane and top-down camera
- GapsCorridor creates platforms with randomized gap widths for jumping tasks
- WallsCorridor places wall obstacles with randomized widths along a corridor
- Bowl creates a concave terrain surface the agent must climb out of
- RandomMazeWithTargets generates procedural mazes with configurable room sizes and target positions
- Corridor dimensions (length, width) and randomization ranges are configurable
Step 3: Define the Locomotion Task
Select and configure the locomotion task that defines the reward function, termination conditions, and episode initialization. Available tasks include RunThroughCorridor (velocity-tracking reward for corridor traversal), GoToTarget (distance-based reward for target reaching), Escape (maximize distance from origin), ManyGoalsMaze (collect targets in a maze), and TwoTouch (visuomotor reaching with timing constraints).
Key considerations:
- RunThroughCorridor rewards the walker for maintaining a target forward velocity
- GoToTarget rewards proximity to a randomly placed target sphere
- Escape rewards the walker for maximizing displacement from the starting position
- ManyGoalsMaze rewards collecting multiple target spheres scattered in maze rooms
- ManyHeterogeneousGoalsMaze adds positive and negative targets requiring discrimination
- Tasks specify physics_timestep and control_timestep independently
Step 4: Configure Props and Targets
Optionally add props and target objects to the environment. Target spheres serve as goal indicators with configurable size, color, and activation behavior. The TargetSphere activates on single contact; TargetSphereTwoTouch requires two touches with a specific time interval. Props can be initialized with collision-aware placement using the PropPlacer initializer.
Key considerations:
- TargetSphere provides visual goal indicators that detect walker contact
- Target builders are passed as functools.partial for deferred construction
- target_reward_scale controls the magnitude of rewards for reaching targets
- Multiple target types with different reward values enable heterogeneous foraging
- Props are attached to the arena as free entities with freejoints
Step 5: Assemble the Composer Environment
Instantiate the composer.Environment with the configured task, time limit, random state, and observation options. The environment compiles the combined MJCF model from walker, arena, and props into a MuJoCo physics simulation, initializes the observation system, and provides the standard dm_env interface for RL training.
Key considerations:
- Time limits are typically 20-30 seconds for locomotion tasks
- strip_singleton_obs_buffer_dim simplifies observation shapes for standard RL
- Random state seeding ensures reproducible episode sequences
- The environment handles automatic recompilation when MJCF models change between episodes
- Example factory functions in basic_cmu_2019 and basic_rodent_2020 provide reference configurations
Step 6: Visualize and Debug
Use the interactive viewer or the locomotion examples' explore.py script to visualize the locomotion environment. This enables visual inspection of walker behavior, terrain layout, and reward signals. The viewer supports camera control, body perturbation, and speed adjustment for debugging locomotion policies.
Key considerations:
- dm_control.viewer.launch accepts an environment_loader for deferred construction
- locomotion/examples/explore.py provides a ready-to-run visualization script
- The viewer HUD displays simulation time, step count, and reward information
- Body perturbation allows testing walker stability by applying external forces