Implementation:Google deepmind Dm control Reference Pose Rewards

From Leeroopedia
Revision as of 12:43, 16 February 2026 by Admin (talk | contribs) (Auto-imported from implementations/Google_deepmind_Dm_control_Reference_Pose_Rewards.md)
Knowledge Sources
Domains: Robotics, Reinforcement Learning
Last Updated: 2026-02-15 04:00 GMT

Overview

The Reference Pose Rewards module defines reward functions for motion capture reference pose tracking tasks, including the CoMic reward function from Hasenclever et al. (ICML 2020).

Description

This module provides several reward functions that return RewardFnOutput named tuples containing three fields: reward (scalar), debug (dictionary of debug information), and reward_terms (ordered dictionary of individual reward components). Each reward function compares current walker features against reference features from motion capture data.

The termination_reward_fn computes 1 - termination_error / termination_error_threshold. This value lies in [0, 1] only while the error stays between 0 and the threshold; the expression goes negative once the error exceeds the threshold. The multi_term_pose_reward_fn computes a weighted sum of four exponentially decayed squared-difference terms: center of mass (weight 0.1, scale 10), joints velocity (weight 1.0, scale 0.1), appendages (weight 0.15, scale 40), and body quaternions (weight 0.65, scale 2). The weights sum to 1.9, so this reward is not normalized to [0, 1]. The comic_reward_fn combines the two rewards with equal 0.5/0.5 weighting, as used in the CoMic paper.
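The arithmetic in this paragraph can be sketched in plain Python. This is an illustrative re-implementation under stated assumptions, not the library's code: the names `termination_term`, `multi_term_pose`, `comic` and `WEIGHTS_AND_SCALES` are hypothetical, the dictionary keys are assumed spellings of the four terms, and each term is assumed to have the form weight * exp(-scale * squared_difference).

```python
import math

# (weight, scale) per term, as listed in the description.
# Key spellings are an assumption made for this sketch.
WEIGHTS_AND_SCALES = {
    "center_of_mass": (0.1, 10.0),
    "joints_velocity": (1.0, 0.1),
    "appendages": (0.15, 40.0),
    "body_quaternions": (0.65, 2.0),
}


def termination_term(error, threshold):
    """1 - error / threshold; in [0, 1] only while 0 <= error <= threshold."""
    return 1.0 - error / threshold


def multi_term_pose(squared_differences):
    """Sum of weight * exp(-scale * squared_difference) over the four terms."""
    return sum(
        weight * math.exp(-scale * squared_differences[name])
        for name, (weight, scale) in WEIGHTS_AND_SCALES.items()
    )


def comic(error, threshold, squared_differences):
    """Equal 0.5/0.5 mix of the termination term and the pose reward."""
    pose = multi_term_pose(squared_differences)
    return 0.5 * termination_term(error, threshold) + 0.5 * pose


# Perfect tracking: every squared difference is zero.
perfect = {name: 0.0 for name in WEIGHTS_AND_SCALES}
max_pose = multi_term_pose(perfect)
max_comic = comic(0.0, 1.0, perfect)
```

Under these assumptions the combined reward peaks at 0.5 * 1 + 0.5 * 1.9 = 1.45 for perfect tracking, which is worth keeping in mind when choosing reward scales or discount factors.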

Helper functions include bounded_quat_dist for computing quaternion distances capped at pi/2 (supporting batched inputs), and compute_squared_differences which handles both regular Euclidean differences and quaternion distances based on field names. A registry system (_REWARD_FN and _REWARD_CHANNELS) enables lookup by string key via get_reward and get_reward_channels.
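To make the two helpers concrete, here is a minimal NumPy sketch of a quaternion distance bounded by pi/2 and of name-based dispatch between Euclidean and quaternion differences. Both functions (`quat_dist_sketch`, `squared_differences_sketch`) are hypothetical stand-ins: the exact formula used by bounded_quat_dist and the exact key test used by compute_squared_differences may differ. The sketch assumes that q and -q denote the same rotation and that any key containing "quaternion" takes the quaternion path.

```python
import numpy as np


def quat_dist_sketch(source, target):
    """Quaternion distance in [0, pi/2] for inputs of shape (..., 4)."""
    # Convert to float arrays so the caller's inputs are never modified.
    source = np.asarray(source, dtype=float)
    target = np.asarray(target, dtype=float)
    source = source / np.linalg.norm(source, axis=-1, keepdims=True)
    target = target / np.linalg.norm(target, axis=-1, keepdims=True)
    # |dot| treats q and -q as the same rotation; clip guards against
    # round-off pushing the value slightly above 1.
    cos_term = np.clip(np.abs(np.sum(source * target, axis=-1)), 0.0, 1.0)
    # Keep a trailing axis so a (B, 4) input yields a (B, 1) output.
    return np.arccos(cos_term)[..., np.newaxis]


def squared_differences_sketch(walker, reference, exclude_keys=()):
    """Per-key summed squared differences, dispatching on the key name."""
    out = {}
    for key, value in walker.items():
        if key in exclude_keys:
            continue
        if "quaternion" in key:
            dists = quat_dist_sketch(value, reference[key])
            out[key] = float(np.sum(dists ** 2))
        else:
            diff = np.asarray(value, dtype=float) - np.asarray(
                reference[key], dtype=float)
            out[key] = float(np.sum(diff ** 2))
    return out


# Illustrative data: one position and two body quaternions.
walker = {
    "position": [0.0, 0.0, 1.0],
    "body_quaternions": [[1, 0, 0, 0], [1, 0, 0, 0]],
}
reference = {
    "position": [0.0, 0.0, 0.0],
    "body_quaternions": [[-1, 0, 0, 0], [0, 1, 0, 0]],
}
dists = quat_dist_sketch(walker["body_quaternions"],
                         reference["body_quaternions"])
diffs = squared_differences_sketch(walker, reference)
```

The first quaternion pair differs only in sign and therefore has distance zero, while the second pair is a half turn apart and hits the pi/2 cap.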

Usage

Use these reward functions within reference pose tracking tasks. Select a reward function by string key ('termination_reward', 'multi_term_pose_reward', or 'comic') via get_reward(). The 'comic' reward is recommended for general motion tracking as it balances termination-based and pose-based signals.
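The string-key lookup is an ordinary registry pattern. The sketch below is a hypothetical, library-free illustration: the registry `_SKETCH_REWARD_FN`, the lookup `get_reward_sketch`, the placeholder key 'pose_placeholder' and its toy scoring rule are all invented here, and raising ValueError for an unknown key is an assumption. It also shows why accepting `**unused_kwargs` is convenient: the caller can pass one argument set to whichever function the key selects.

```python
def termination_only(termination_error, termination_error_threshold,
                     **unused_kwargs):
    """Uses only the termination arguments; ignores everything else."""
    return 1.0 - termination_error / termination_error_threshold


def pose_only(walker_features, reference_features, **unused_kwargs):
    """Toy placeholder score: 1.0 if the features match exactly, else 0.0."""
    return float(walker_features == reference_features)


# Hypothetical registry mapping string keys to reward functions.
_SKETCH_REWARD_FN = {
    "termination_reward": termination_only,
    "pose_placeholder": pose_only,
}


def get_reward_sketch(reward_key):
    """Returns the function registered under reward_key."""
    if reward_key not in _SKETCH_REWARD_FN:
        raise ValueError(f"Unknown reward key: {reward_key!r}")
    return _SKETCH_REWARD_FN[reward_key]


# One argument set works for every key because extras are swallowed
# by **unused_kwargs.
call_kwargs = dict(
    termination_error=0.25,
    termination_error_threshold=1.0,
    walker_features={"joints": (0.0, 0.1)},
    reference_features={"joints": (0.0, 0.1)},
)
results = {
    key: get_reward_sketch(key)(**call_kwargs) for key in _SKETCH_REWARD_FN
}
```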

Code Reference

Source Location

Signature

RewardFnOutput = collections.namedtuple(
    'RewardFnOutput', ['reward', 'debug', 'reward_terms'])

def bounded_quat_dist(source: np.ndarray,
                      target: np.ndarray) -> np.ndarray:

def compute_squared_differences(walker_features, reference_features,
                                exclude_keys=()):

def termination_reward_fn(termination_error, termination_error_threshold,
                          **unused_kwargs):

def multi_term_pose_reward_fn(walker_features, reference_features,
                              **unused_kwargs):

def comic_reward_fn(termination_error, termination_error_threshold,
                    walker_features, reference_features, **unused_kwargs):

def get_reward(reward_key):

def get_reward_channels(reward_key):

Import

from dm_control.locomotion.tasks.reference_pose import rewards
from dm_control.locomotion.tasks.reference_pose.rewards import get_reward
from dm_control.locomotion.tasks.reference_pose.rewards import get_reward_channels

I/O Contract

Inputs (comic_reward_fn)

| Name | Type | Required | Description |
| --- | --- | --- | --- |
| termination_error | float | Yes | The computed termination error from the tracking task |
| termination_error_threshold | float | Yes | Threshold used to normalize the termination error to [0, 1] |
| walker_features | dict | Yes | Dictionary of current walker feature arrays (position, quaternion, joints, etc.) |
| reference_features | dict | Yes | Dictionary of reference pose feature arrays from motion capture data |

Inputs (bounded_quat_dist)

| Name | Type | Required | Description |
| --- | --- | --- | --- |
| source | np.ndarray | Yes | Source quaternion(s), shape (B, 4) |
| target | np.ndarray | Yes | Target quaternion(s), shape (B, 4) |

Outputs

| Name | Type | Description |
| --- | --- | --- |
| RewardFnOutput.reward | float | Scalar reward value |
| RewardFnOutput.debug | dict | Dictionary of debug terms for logging and analysis |
| RewardFnOutput.reward_terms | OrderedDict | Sorted dictionary of individual reward component values |
| bounded_quat_dist return | np.ndarray | Quaternion distance in [0, pi/2], shape (B, 1) |
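The three output fields can be illustrated without the library installed. The namedtuple definition below mirrors the one in the Signature section; the term names and numeric values are made up for illustration, and building reward_terms with `sorted` is an assumption about how the sorted ordering is produced.

```python
import collections

RewardFnOutput = collections.namedtuple(
    "RewardFnOutput", ["reward", "debug", "reward_terms"])

# Illustrative component values, deliberately listed out of order.
terms = {"termination": 0.25, "appendages": 0.05, "center_of_mass": 0.04}

output = RewardFnOutput(
    reward=sum(terms.values()),
    debug={"termination_error": 0.5},
    reward_terms=collections.OrderedDict(sorted(terms.items())),
)

# Fields are available by name or by positional unpacking.
reward, debug, reward_terms = output
channel_names = list(output.reward_terms)  # keys in sorted order
```

Because the keys are sorted, the order of reward_terms is stable across calls, which makes it safe to log the components as fixed columns.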

Usage Examples

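import numpy as np

from dm_control.locomotion.tasks.reference_pose import rewards

# Look up a reward function and its channel names by key
reward_fn = rewards.get_reward('comic')
reward_channels = rewards.get_reward_channels('comic')

# Placeholder features for illustration only. In a real task both
# dictionaries come from the walker and the motion capture reference.
# The keys are assumed to match the four multi-term reward components;
# the array sizes are arbitrary.
current_features = {
    'center_of_mass': np.zeros(3),
    'joints_velocity': np.zeros(4),
    'appendages': np.zeros(6),
    'body_quaternions': np.tile([1.0, 0.0, 0.0, 0.0], (2, 1)),
}
reference_features = {k: v.copy() for k, v in current_features.items()}

# Compute the reward
result = reward_fn(
    termination_error=0.5,
    termination_error_threshold=1.0,
    walker_features=current_features,
    reference_features=reference_features,
)

print(result.reward)        # scalar reward
print(result.debug)         # debug info dictionary
print(result.reward_terms)  # individual reward components

# Use bounded quaternion distance directly. Pass float arrays: integer
# arrays cannot hold normalized quaternion components.
quat_a = np.array([[1.0, 0.0, 0.0, 0.0]])
quat_b = np.array([[0.707, 0.707, 0.0, 0.0]])
dist = rewards.bounded_quat_dist(quat_a, quat_b)  # shape (1, 1)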
