Implementation:Google deepmind Dm control Reference Pose Rewards

Knowledge Sources	Google_deepmind_Dm_control
Domains	Robotics, Reinforcement Learning
Last Updated	2026-02-15 04:00 GMT

Overview

The Reference Pose Rewards module defines reward functions for motion capture reference pose tracking tasks, including the CoMic reward function from Hasenclever et al. (ICML 2020).

Description

This module provides several reward functions that return RewardFnOutput named tuples containing three fields: reward (scalar), debug (dictionary of debug information), and reward_terms (ordered dictionary of individual reward components). Each reward function compares current walker features against reference features from motion capture data.
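A minimal sketch of how such a tuple can be built and consumed, reusing the RewardFnOutput definition from the Signature section. The sort_terms helper and the debug keys are hypothetical. They are chosen only to illustrate an OrderedDict sorted by key and are not taken from the library.

```python
import collections

# Same named tuple definition as in the Signature section.
RewardFnOutput = collections.namedtuple(
    'RewardFnOutput', ['reward', 'debug', 'reward_terms'])


def sort_terms(terms):
    """Hypothetical helper: return the terms as an OrderedDict sorted by key."""
    return collections.OrderedDict(sorted(terms.items()))


# Illustrative values only; the debug keys are assumptions.
output = RewardFnOutput(
    reward=0.75,
    debug={'termination_error': 0.25, 'termination_error_threshold': 1.0},
    reward_terms=sort_terms({'termination': 0.75}),
)

# Fields are accessible by name or by tuple unpacking.
reward, debug_info, reward_terms = output
print(output.reward, list(reward_terms))
```

Because RewardFnOutput is a plain named tuple, callers can unpack it positionally, which is convenient when combining the outputs of several reward functions.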

The termination_reward_fn computes 1 - termination_error / termination_error_threshold. This value lies in [0, 1] while the error does not exceed the threshold and becomes negative beyond it. The multi_term_pose_reward_fn computes a weighted sum of four exponentially decayed squared-difference terms, each of the form weight * exp(-scale * squared_difference): center of mass (weight 0.1, scale 10), joints velocity (weight 1.0, scale 0.1), appendages (weight 0.15, scale 40), and body quaternions (weight 0.65, scale 2). Because the weights sum to 1.9, this reward lies in (0, 1.9]. The comic_reward_fn combines the two with equal weighting (0.5 * termination reward + 0.5 * pose reward), as in the CoMic paper.
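The arithmetic in the paragraph above can be sketched in plain Python. The sketch works on precomputed squared differences rather than feature arrays, and its function names are hypothetical stand-ins, not the library's API. Halving each reward component in comic_reward so that the per-term values add up to the total is an assumption about the bookkeeping that the text above does not state.

```python
import collections
import math

# (weight, scale) per pose term, as listed in the paragraph above.
POSE_TERMS = {
    'center_of_mass': (0.1, 10.0),
    'joints_velocity': (1.0, 0.1),
    'appendages': (0.15, 40.0),
    'body_quaternions': (0.65, 2.0),
}


def termination_reward(error, threshold):
    # 1 at zero error, 0 at the threshold, negative beyond it.
    return 1.0 - error / threshold


def pose_reward_terms(squared_diffs):
    # Each term decays exponentially from its weight toward 0.
    return {name: weight * math.exp(-scale * squared_diffs[name])
            for name, (weight, scale) in POSE_TERMS.items()}


def comic_reward(error, threshold, squared_diffs):
    r_termination = termination_reward(error, threshold)
    pose_terms = pose_reward_terms(squared_diffs)
    total = 0.5 * r_termination + 0.5 * sum(pose_terms.values())
    # Assumption: components are halved too, so they sum to the total.
    terms = {'termination': 0.5 * r_termination}
    terms.update({name: 0.5 * value for name, value in pose_terms.items()})
    return total, collections.OrderedDict(sorted(terms.items()))


# Perfect pose tracking: every squared difference is zero.
perfect = {name: 0.0 for name in POSE_TERMS}
total, terms = comic_reward(0.5, 1.0, perfect)
print(total, dict(terms))
```

With perfect pose tracking the pose part reaches its maximum of 1.9, so with zero termination error the mixed reward tops out at 0.5 + 0.95 = 1.45 rather than 1.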

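Helper functions include bounded_quat_dist, which computes quaternion distances capped at pi/2 and accepts batched inputs, and compute_squared_differences, which picks a distance per feature based on its name: features whose key contains "quaternion" are compared with bounded_quat_dist, and all others with summed squared element-wise (Euclidean) differences. Keys listed in exclude_keys are skipped. A registry (_REWARD_FN and _REWARD_CHANNELS) maps string keys to reward functions and their reward channels, exposed through get_reward and get_reward_channels.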
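The two helpers can be sketched with NumPy as below. The quaternion distance uses one formula consistent with the stated [0, pi/2] bound: half of the relative rotation angle, treating q and -q as the same rotation. It has not been checked term by term against the library, so half_angle_quat_dist and squared_differences are hypothetical names, and the key-based dispatch is simplified to a single "quaternion" substring test.

```python
import numpy as np


def half_angle_quat_dist(source, target):
    """Hypothetical: half of the relative rotation angle, in [0, pi/2].

    Accepts arrays of shape (..., 4) and returns shape (..., 1).
    """
    source = np.asarray(source, dtype=float)
    target = np.asarray(target, dtype=float)
    # Normalize copies so the caller's arrays are left untouched.
    source = source / np.linalg.norm(source, axis=-1, keepdims=True)
    target = target / np.linalg.norm(target, axis=-1, keepdims=True)
    dot = np.sum(source * target, axis=-1)
    # 2*dot^2 - 1 is the cosine of the full relative angle; squaring the dot
    # product makes q and -q equivalent. Clip guards against rounding.
    cos_angle = np.clip(2.0 * dot ** 2 - 1.0, -1.0, 1.0)
    return np.expand_dims(0.5 * np.arccos(cos_angle), axis=-1)


def squared_differences(walker_features, reference_features, exclude_keys=()):
    """Hypothetical: per-feature squared error, dispatched on the key name."""
    diffs = {}
    for key, value in walker_features.items():
        if key in exclude_keys:
            continue
        ref = reference_features[key]
        if 'quaternion' in key:
            # Quaternion features: squared bounded distance, summed over bodies.
            diffs[key] = float(np.sum(half_angle_quat_dist(value, ref) ** 2))
        else:
            # Everything else: summed squared element-wise difference.
            diffs[key] = float(np.sum((np.asarray(value) - np.asarray(ref)) ** 2))
    return diffs


walker = {'center_of_mass': np.array([1.0, 2.0, 3.0]),
          'body_quaternions': np.array([[1.0, 0, 0, 0], [0, 1.0, 0, 0]])}
reference = {'center_of_mass': np.array([1.0, 2.0, 5.0]),
             'body_quaternions': np.array([[1.0, 0, 0, 0], [1.0, 0, 0, 0]])}
print(squared_differences(walker, reference))
```

Halving the angle is what bounds the result at pi/2: a 180-degree relative rotation, the largest possible, maps to pi/2.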

Usage

Use these reward functions within reference pose tracking tasks. Select a reward function by string key ('termination_reward', 'multi_term_pose_reward', or 'comic') via get_reward(). The 'comic' key selects the reward used in the CoMic paper, which balances the termination-based and pose-based signals.
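Key-based selection amounts to a dictionary dispatch. The sketch below mirrors the registry names from the Description section. The stand-in reward bodies (which return bare floats instead of RewardFnOutput tuples), the channel tuples, and raising ValueError on an unknown key are assumptions for illustration, not confirmed library behavior.

```python
from typing import Callable, Dict, Tuple


# Hypothetical stand-ins that return bare floats. **unused_kwargs lets every
# function accept the same keyword arguments, so a caller does not need to
# know which inputs each reward actually uses.
def termination_reward_fn(termination_error, termination_error_threshold,
                          **unused_kwargs):
    return 1.0 - termination_error / termination_error_threshold


def multi_term_pose_reward_fn(walker_features, reference_features,
                              **unused_kwargs):
    return 1.9  # placeholder: the maximum of the weighted pose terms


def comic_reward_fn(termination_error, termination_error_threshold,
                    walker_features, reference_features, **unused_kwargs):
    return (0.5 * termination_reward_fn(termination_error,
                                        termination_error_threshold)
            + 0.5 * multi_term_pose_reward_fn(walker_features,
                                              reference_features))


_POSE_CHANNELS = ('appendages', 'body_quaternions', 'center_of_mass',
                  'joints_velocity')

_REWARD_FN: Dict[str, Callable[..., float]] = {
    'termination_reward': termination_reward_fn,
    'multi_term_pose_reward': multi_term_pose_reward_fn,
    'comic': comic_reward_fn,
}

# Assumption: channel names mirror the reward_terms keys of each function.
_REWARD_CHANNELS: Dict[str, Tuple[str, ...]] = {
    'termination_reward': ('termination',),
    'multi_term_pose_reward': _POSE_CHANNELS,
    'comic': tuple(sorted(_POSE_CHANNELS + ('termination',))),
}


def _lookup(table, reward_key):
    # Fail loudly on unknown keys instead of returning None.
    if reward_key not in table:
        raise ValueError(f'Unknown reward key: {reward_key!r}')
    return table[reward_key]


def get_reward(reward_key):
    return _lookup(_REWARD_FN, reward_key)


def get_reward_channels(reward_key):
    return _lookup(_REWARD_CHANNELS, reward_key)


# One shared set of keyword arguments works for every registered function.
kwargs = dict(termination_error=0.25, termination_error_threshold=1.0,
              walker_features={}, reference_features={})
for key in _REWARD_FN:
    print(key, get_reward(key)(**kwargs), get_reward_channels(key))
```

Keeping the signatures uniform through **unused_kwargs is what lets a task switch rewards by changing a single string.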

Code Reference

Source Location

Repository: Google_deepmind_Dm_control
File: dm_control/locomotion/tasks/reference_pose/rewards.py
Lines: 1-187

Signature

RewardFnOutput = collections.namedtuple(
    'RewardFnOutput', ['reward', 'debug', 'reward_terms'])

def bounded_quat_dist(source: np.ndarray,
                      target: np.ndarray) -> np.ndarray:

def compute_squared_differences(walker_features, reference_features,
                                exclude_keys=()):

def termination_reward_fn(termination_error, termination_error_threshold,
                          **unused_kwargs):

def multi_term_pose_reward_fn(walker_features, reference_features,
                              **unused_kwargs):

def comic_reward_fn(termination_error, termination_error_threshold,
                    walker_features, reference_features, **unused_kwargs):

def get_reward(reward_key):

def get_reward_channels(reward_key):

Import

from dm_control.locomotion.tasks.reference_pose import rewards
from dm_control.locomotion.tasks.reference_pose.rewards import get_reward
from dm_control.locomotion.tasks.reference_pose.rewards import get_reward_channels

I/O Contract

Inputs (comic_reward_fn)

Name	Type	Required	Description
termination_error	float	Yes	The computed termination error from the tracking task
termination_error_threshold	float	Yes	Threshold the termination error is divided by; the ratio lies in [0, 1] while the error stays within the threshold
walker_features	dict	Yes	Dictionary of current walker feature arrays (position, quaternion, joints, etc.)
reference_features	dict	Yes	Dictionary of reference pose feature arrays from motion capture data

Inputs (bounded_quat_dist)

Name	Type	Required	Description
source	np.ndarray	Yes	Source quaternion(s), shape (B, 4)
target	np.ndarray	Yes	Target quaternion(s), shape (B, 4)

Outputs

Name	Type	Description
RewardFnOutput.reward	float	Scalar reward value
RewardFnOutput.debug	dict	Dictionary of debug terms for logging and analysis
RewardFnOutput.reward_terms	OrderedDict	Individual reward component values, sorted by key
bounded_quat_dist return	np.ndarray	Quaternion distance in [0, pi/2], shape (B, 1)

Usage Examples

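from dm_control.locomotion.tasks.reference_pose import rewards
import numpy as np

# Look up a reward function and its reward channels by key
reward_fn = rewards.get_reward('comic')
reward_channels = rewards.get_reward_channels('comic')

# Illustrative feature dicts. In a tracking task these come from the walker
# and the reference clip; the array sizes here are arbitrary placeholders.
# multi_term_pose_reward_fn (and hence 'comic') reads these four keys.
current_features = {
    'center_of_mass': np.zeros(3),
    'joints_velocity': np.zeros(56),
    'appendages': np.zeros(15),
    'body_quaternions': np.tile([1.0, 0.0, 0.0, 0.0], (20, 1)),
}
# Independent copies, so the two dicts never share arrays
reference_features = {k: v.copy() for k, v in current_features.items()}

# Compute the reward
result = reward_fn(
    termination_error=0.5,
    termination_error_threshold=1.0,
    walker_features=current_features,
    reference_features=reference_features,
)

print(result.reward)        # scalar reward
print(result.debug)         # debug info dictionary
print(result.reward_terms)  # individual reward components

# Use bounded quaternion distance directly (float arrays, shape (B, 4))
from dm_control.locomotion.tasks.reference_pose.rewards import bounded_quat_dist

quat_a = np.array([[1.0, 0.0, 0.0, 0.0]])
quat_b = np.array([[np.sqrt(0.5), np.sqrt(0.5), 0.0, 0.0]])
dist = bounded_quat_dist(quat_a, quat_b)  # shape (1, 1)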
