Implementation:Google deepmind Dm control Reference Pose Rewards
| Knowledge Sources | |
|---|---|
| Domains | Robotics, Reinforcement Learning |
| Last Updated | 2026-02-15 04:00 GMT |
Overview
The Reference Pose Rewards module defines reward functions for motion capture reference pose tracking tasks, including the CoMic reward function from Hasenclever et al. (ICML 2020).
Description
This module provides several reward functions that return RewardFnOutput named tuples containing three fields: reward (scalar), debug (dictionary of debug information), and reward_terms (ordered dictionary of individual reward components). Each reward function compares current walker features against reference features from motion capture data.
The termination_reward_fn computes 1 - error / threshold, where error is the termination error and threshold is the termination threshold. The result lies in [0, 1] as long as the error stays between 0 and the threshold, which is the case when the task ends the episode once the error exceeds the threshold. The multi_term_pose_reward_fn computes a weighted sum of four terms of the form weight * exp(-scale * squared_difference): center of mass (weight 0.1, scale 10), joint velocities (weight 1.0, scale 0.1), appendages (weight 0.15, scale 40), and body quaternions (weight 0.65, scale 2). The weights sum to 1.9, so this reward is not normalized to [0, 1]. The comic_reward_fn averages the two rewards with a weight of 0.5 each, as used in the CoMic paper.
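The arithmetic described above can be sketched with the standard library alone. This is a minimal illustration, not the module's code: the function names and the `POSE_TERMS` table are hypothetical, and the squared differences are assumed to be precomputed scalars keyed by term name.

```python
import collections
import math

# (weight, scale) per term, taken from the description above.
POSE_TERMS = {
    'center_of_mass': (0.1, 10.0),
    'joints_velocity': (1.0, 0.1),
    'appendages': (0.15, 40.0),
    'body_quaternions': (0.65, 2.0),
}


def termination_reward(error, threshold):
    """1 - error / threshold; in [0, 1] while 0 <= error <= threshold."""
    return 1.0 - error / threshold


def multi_term_pose_reward(squared_differences):
    """Weighted sum of exp(-scale * squared difference) over the four terms."""
    terms = {
        name: weight * math.exp(-scale * squared_differences[name])
        for name, (weight, scale) in POSE_TERMS.items()
    }
    # Return the total and the per-term values sorted by key.
    return sum(terms.values()), collections.OrderedDict(sorted(terms.items()))


def comic_reward(error, threshold, squared_differences):
    """Equal-weight mix of the termination reward and the pose reward."""
    pose_reward, _ = multi_term_pose_reward(squared_differences)
    return 0.5 * termination_reward(error, threshold) + 0.5 * pose_reward


# Perfect tracking: every squared difference and the termination error are 0.
perfect = {name: 0.0 for name in POSE_TERMS}
pose_at_perfect, pose_terms = multi_term_pose_reward(perfect)
comic_at_perfect = comic_reward(0.0, 1.0, perfect)
```

Under these assumptions the pose reward peaks at the sum of the weights and the mixed reward at the mean of the two maxima, which is worth keeping in mind when comparing returns across reward keys.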
Helper functions include bounded_quat_dist, which computes quaternion distances capped at pi/2 and supports batched inputs, and compute_squared_differences, which uses each feature's key to choose between a squared Euclidean difference and a squared quaternion distance. Two module-level dictionaries, _REWARD_FN and _REWARD_CHANNELS, act as a registry so that reward functions and their channel names can be looked up by string key via get_reward and get_reward_channels.
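One way to obtain a quaternion distance bounded by pi/2 is to take half the relative rotation angle between the two orientations. The sketch below is an unbatched, standard-library illustration of that idea; the function name `bounded_quat_dist_single` is hypothetical, and the module's own computation may differ in detail.

```python
import math


def bounded_quat_dist_single(source, target):
    """Distance between two quaternions, in [0, pi/2]."""
    source_norm = math.sqrt(sum(c * c for c in source))
    target_norm = math.sqrt(sum(c * c for c in target))
    dot = sum(s * t for s, t in zip(source, target)) / (source_norm * target_norm)
    # For unit quaternions, 2 * dot**2 - 1 is the cosine of the relative
    # rotation angle. Squaring the dot product makes q and -q, which encode
    # the same rotation, equivalent. Clip at 1 to absorb rounding error.
    cos_angle = min(1.0, 2.0 * dot * dot - 1.0)
    # Half the rotation angle: 0 for identical rotations, pi/2 for a half turn.
    return 0.5 * math.acos(cos_angle)


identity = (1.0, 0.0, 0.0, 0.0)
negated_identity = (-1.0, 0.0, 0.0, 0.0)
quarter_turn = (math.sqrt(0.5), math.sqrt(0.5), 0.0, 0.0)
half_turn = (0.0, 1.0, 0.0, 0.0)

dist_same = bounded_quat_dist_single(identity, negated_identity)
dist_quarter = bounded_quat_dist_single(identity, quarter_turn)
dist_half = bounded_quat_dist_single(identity, half_turn)
```

Because the distance depends only on the squared dot product, it cannot exceed pi/2 regardless of the sign convention used for the input quaternions.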
Usage
Use these reward functions within reference pose tracking tasks. Select a reward function by string key ('termination_reward', 'multi_term_pose_reward', or 'comic') via get_reward(). The 'comic' reward is recommended for general motion tracking as it balances termination-based and pose-based signals.
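The signatures in this module all accept `**unused_kwargs`, which suggests that a caller can pass one shared set of keyword arguments to whichever function the key selects. The sketch below illustrates that pattern with stand-in functions; `_STAND_IN_REWARDS`, `lookup`, both stand-in rewards, and the `ValueError` on an unknown key are assumptions made for illustration, not the module's registry.

```python
def termination_only(termination_error, termination_error_threshold,
                     **unused_kwargs):
    """Stand-in reward that reads only the termination arguments."""
    return 1.0 - termination_error / termination_error_threshold


def pose_only(walker_features, reference_features, **unused_kwargs):
    """Stand-in reward that reads only the feature dictionaries."""
    return -sum((walker_features[k] - reference_features[k]) ** 2
                for k in walker_features)


# Hypothetical registry mapping string keys to reward functions.
_STAND_IN_REWARDS = {
    'termination_only': termination_only,
    'pose_only': pose_only,
}


def lookup(reward_key):
    """Returns the function registered under reward_key."""
    if reward_key not in _STAND_IN_REWARDS:
        raise ValueError('Unknown reward key: %s' % reward_key)
    return _STAND_IN_REWARDS[reward_key]


# The same keyword arguments work for every key; each function ignores
# the arguments it does not need.
call_kwargs = dict(
    termination_error=0.25,
    termination_error_threshold=1.0,
    walker_features={'x': 1.0},
    reference_features={'x': 0.5},
)
results = {key: lookup(key)(**call_kwargs) for key in _STAND_IN_REWARDS}
```

With this design the calling task does not need to branch on the reward key, since swapping rewards only changes the string passed to the lookup.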
Code Reference
Source Location
- Repository: Google_deepmind_Dm_control
- File: dm_control/locomotion/tasks/reference_pose/rewards.py
- Lines: 1-187
Signature
RewardFnOutput = collections.namedtuple(
    'RewardFnOutput', ['reward', 'debug', 'reward_terms'])

def bounded_quat_dist(source: np.ndarray,
                      target: np.ndarray) -> np.ndarray:

def compute_squared_differences(walker_features, reference_features,
                                exclude_keys=()):

def termination_reward_fn(termination_error, termination_error_threshold,
                          **unused_kwargs):

def multi_term_pose_reward_fn(walker_features, reference_features,
                              **unused_kwargs):

def comic_reward_fn(termination_error, termination_error_threshold,
                    walker_features, reference_features, **unused_kwargs):

def get_reward(reward_key):

def get_reward_channels(reward_key):
Import
from dm_control.locomotion.tasks.reference_pose import rewards
from dm_control.locomotion.tasks.reference_pose.rewards import get_reward
from dm_control.locomotion.tasks.reference_pose.rewards import get_reward_channels
I/O Contract
Inputs (comic_reward_fn)
| Name | Type | Required | Description |
|---|---|---|---|
| termination_error | float | Yes | The computed termination error from the tracking task |
| termination_error_threshold | float | Yes | Threshold used to normalize the termination error to [0, 1] |
| walker_features | dict | Yes | Dictionary of current walker feature arrays (position, quaternion, joints, etc.) |
| reference_features | dict | Yes | Dictionary of reference pose feature arrays from motion capture data |
Inputs (bounded_quat_dist)
| Name | Type | Required | Description |
|---|---|---|---|
| source | np.ndarray | Yes | Source quaternion(s), shape (B, 4) |
| target | np.ndarray | Yes | Target quaternion(s), shape (B, 4) |
Outputs
| Name | Type | Description |
|---|---|---|
| RewardFnOutput.reward | float | Scalar reward value |
| RewardFnOutput.debug | dict | Dictionary of debug terms for logging and analysis |
| RewardFnOutput.reward_terms | OrderedDict | Individual reward component values, ordered by key |
| bounded_quat_dist return | np.ndarray | Quaternion distance in [0, pi/2], shape (B, 1) |
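The output fields can be read by name or unpacked positionally, since RewardFnOutput is a named tuple. The following sketch builds one by hand with made-up values to show both access styles; the numbers and term names are illustrative only.

```python
import collections

RewardFnOutput = collections.namedtuple(
    'RewardFnOutput', ['reward', 'debug', 'reward_terms'])

# Illustrative component values, not produced by a real task.
terms = {'termination': 0.25, 'center_of_mass': 0.05}

output = RewardFnOutput(
    reward=sum(terms.values()),
    debug={'termination_error': 0.5},
    # Sorting by key gives a stable channel order for logging.
    reward_terms=collections.OrderedDict(sorted(terms.items())),
)

# Access by field name or by positional unpacking.
reward, debug, reward_terms = output
channel_names = list(output.reward_terms)
```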
Usage Examples
import numpy as np
from dm_control.locomotion.tasks.reference_pose import rewards
from dm_control.locomotion.tasks.reference_pose.rewards import bounded_quat_dist

# Look up a reward function and its channel names by key
reward_fn = rewards.get_reward('comic')
reward_channels = rewards.get_reward_channels('comic')

# Compute the reward. current_features and reference_features are dicts of
# feature arrays supplied by the tracking task; they are not defined here.
result = reward_fn(
    termination_error=0.5,
    termination_error_threshold=1.0,
    walker_features=current_features,
    reference_features=reference_features,
)
print(result.reward)        # scalar reward
print(result.debug)         # debug info dictionary
print(result.reward_terms)  # individual reward components

# Use bounded quaternion distance directly. Use float arrays so the
# quaternions can be normalized.
quat_a = np.array([[1.0, 0.0, 0.0, 0.0]])
quat_b = np.array([[0.707, 0.707, 0.0, 0.0]])
dist = bounded_quat_dist(quat_a, quat_b)  # shape (1, 1)