Implementation:Google DeepMind dm_control Tolerance Reward

From Leeroopedia
Metadata
Knowledge Sources dm_control
Domains Reinforcement Learning, Reward Engineering, Control Theory
Last Updated 2026-02-15 00:00 GMT

Overview

A concrete tool for computing smooth, bounded reward values using the rewards.tolerance() function, which maps a scalar or array distance metric into the range [0, 1] via configurable sigmoid decay.

Description

The tolerance() function in dm_control/utils/rewards.py is the standard reward primitive used across dm_control manipulation tasks. It acts as a soft indicator function:

  • Returns 1.0 when the input x lies within the closed interval [lower, upper].
  • Returns a sigmoidally decaying value in (0, 1) when x is outside the bounds but within the margin region.
  • Returns a value approaching 0.0 as x moves far beyond the margin.

When margin == 0, the function becomes a hard indicator (1 inside bounds, 0 outside). When margin > 0, the distance from x to the nearest bound is normalised by the margin and passed through one of eight sigmoid functions: 'gaussian', 'hyperbolic', 'long_tail', 'reciprocal', 'cosine', 'linear', 'quadratic', or 'tanh_squared'.
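This piecewise behaviour can be sketched in pure Python. The snippet below is an illustrative reimplementation of the documented semantics for the Gaussian sigmoid only, not the dm_control source:

```python
import math

def tolerance_sketch(x, bounds=(0.0, 0.0), margin=0.0, value_at_margin=0.1):
    """Illustrative reimplementation of the documented behaviour
    (Gaussian sigmoid only); not the dm_control source."""
    lower, upper = bounds
    if lower <= x <= upper:
        return 1.0          # inside the closed interval
    if margin == 0.0:
        return 0.0          # hard indicator outside the bounds
    # Distance from x to the nearest bound, normalised by the margin.
    d = (lower - x if x < lower else x - upper) / margin
    # Gaussian decay, scaled so the value at d == 1 equals value_at_margin.
    scale = math.sqrt(-2.0 * math.log(value_at_margin))
    return math.exp(-0.5 * (d * scale) ** 2)

print(tolerance_sketch(0.03, bounds=(0.0, 0.05)))            # 1.0
print(tolerance_sketch(0.06, bounds=(0.0, 0.05)))            # 0.0 (hard indicator)
print(tolerance_sketch(1.0, bounds=(0.0, 0.0), margin=1.0))  # ≈ 0.1
```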

The internal _sigmoids() helper computes a scale factor from value_at_margin so that the sigmoid evaluates to exactly value_at_margin at normalised distance 1.0 (i.e. at the margin boundary).
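This construction can be checked directly for two representative shapes. The formulas below follow the standard Gaussian and long-tail forms (shown for illustration, not copied verbatim from the library source); in each case the scale factor is derived so the curve passes through value_at_margin at normalised distance 1.0:

```python
import math

value_at_margin = 0.1

# Gaussian: exp(-0.5 * (d * scale)^2), with scale chosen so that
# plugging in d == 1 yields exactly value_at_margin.
gauss_scale = math.sqrt(-2.0 * math.log(value_at_margin))
gaussian = lambda d: math.exp(-0.5 * (d * gauss_scale) ** 2)

# Long tail: 1 / ((d * scale)^2 + 1), with scale derived the same way.
lt_scale = math.sqrt(1.0 / value_at_margin - 1.0)
long_tail = lambda d: 1.0 / ((d * lt_scale) ** 2 + 1.0)

print(gaussian(1.0))   # ≈ 0.1
print(long_tail(1.0))  # ≈ 0.1
```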

For scalar inputs, the return value is a Python float. For NumPy array inputs, the return value is a NumPy array of the same shape.

Usage in manipulation tasks: The Reach task computes the Euclidean distance between the hand's tool center point and the target, then calls rewards.tolerance(distance, bounds=(0, TARGET_RADIUS), margin=TARGET_RADIUS) with the default Gaussian sigmoid.
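A sketch of this Reach-style shaping is below. The TARGET_RADIUS value is an assumption for illustration, and the Gaussian branch of tolerance() is inlined so the snippet is self-contained:

```python
import math
import numpy as np

TARGET_RADIUS = 0.05  # assumed value, for illustration only

def gaussian_tolerance(x, lower, upper, margin, value_at_margin=0.1):
    """Inlined Gaussian branch of the documented tolerance() behaviour."""
    if lower <= x <= upper:
        return 1.0
    d = (x - upper if x > upper else lower - x) / margin
    scale = math.sqrt(-2.0 * math.log(value_at_margin))
    return math.exp(-0.5 * (d * scale) ** 2)

def reach_reward(tcp_pos, target_pos):
    # Euclidean distance between the tool centre point and the target.
    distance = float(np.linalg.norm(np.asarray(tcp_pos) - np.asarray(target_pos)))
    # Full reward inside one target radius, Gaussian decay over one more.
    return gaussian_tolerance(distance, 0.0, TARGET_RADIUS, margin=TARGET_RADIUS)

print(reach_reward([0.0, 0.0, 0.0], [0.0, 0.0, 0.03]))  # 1.0 (inside radius)
```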

Usage

Any task that needs a dense reward signal from a distance metric can use tolerance(). It is the standard approach in dm_control for converting distances, angles, or velocities into reward scalars.
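For instance, an angle can be turned into a reward by bounding its cosine near 1.0 (upright). The thresholds below are assumed, pendulum-style values, with the Gaussian decay inlined so the sketch runs standalone:

```python
import math

def upright_reward(theta, value_at_margin=0.1):
    """Illustrative angle-to-reward conversion; bounds are assumed values."""
    cos_theta = math.cos(theta)
    lower, upper, margin = 0.95, 1.0, 1.0  # assumed thresholds
    if lower <= cos_theta <= upper:
        return 1.0  # pole effectively vertical
    d = (lower - cos_theta) / margin
    scale = math.sqrt(-2.0 * math.log(value_at_margin))
    return math.exp(-0.5 * (d * scale) ** 2)

print(upright_reward(0.0))      # 1.0 (exactly upright)
print(upright_reward(math.pi))  # near 0 (hanging straight down)
```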

Code Reference

Attribute Value
Source Location dm_control/utils/rewards.py, lines 24-136
Signature tolerance(x, bounds=(0.0, 0.0), margin=0.0, sigmoid='gaussian', value_at_margin=0.1) -> float or np.ndarray
Import from dm_control.utils import rewards

I/O Contract

Inputs

Parameter Type Default Description
x float or np.ndarray (required) The value(s) to evaluate.
bounds tuple[float, float] (0.0, 0.0) Inclusive (lower, upper) interval where the reward is 1.0. Can be infinite for one-sided bounds.
margin float 0.0 Controls how steeply the reward decays outside the bounds. If 0, the function is a hard indicator.
sigmoid str 'gaussian' One of: 'gaussian', 'hyperbolic', 'long_tail', 'reciprocal', 'cosine', 'linear', 'quadratic', 'tanh_squared'.
value_at_margin float 0.1 The output value when the distance from x to the nearest bound equals margin. Must be in (0, 1) for most sigmoids.
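One-sided bounds use an infinite endpoint, as noted above. A minimal sketch of those semantics (pure Python with Gaussian decay; an illustration, not the library source), where any value at or above 0.5 earns full reward:

```python
import math

def tolerance_1d(x, lower, upper, margin, value_at_margin=0.1):
    """Illustrative one-sided tolerance with Gaussian decay."""
    if lower <= x <= upper:
        return 1.0
    d = (lower - x if x < lower else x - upper) / margin
    scale = math.sqrt(-2.0 * math.log(value_at_margin))
    return math.exp(-0.5 * (d * scale) ** 2)

# bounds=(0.5, inf): reward saturates at 1.0 for any x >= 0.5.
print(tolerance_1d(0.8, 0.5, math.inf, margin=0.3))  # 1.0
print(tolerance_1d(0.2, 0.5, math.inf, margin=0.3))  # ≈ 0.1 (one full margin below)
```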

Outputs

Return Type Description
float (if x is scalar) A value in [0.0, 1.0].
np.ndarray (if x is array) An array of the same shape with values in [0.0, 1.0].

Usage Examples

from dm_control.utils import rewards
import numpy as np

# Basic usage: exact target at 0, Gaussian decay with margin 1.
print(rewards.tolerance(0.0, bounds=(0, 0), margin=1.0))
# 1.0  (x is at the target)

print(rewards.tolerance(1.0, bounds=(0, 0), margin=1.0))
# 0.1  (x is exactly at the margin; equals value_at_margin)

print(rewards.tolerance(0.5, bounds=(0, 0), margin=1.0))
# ~0.562  (intermediate value, Gaussian decay: 0.1 ** 0.25)

from dm_control.utils import rewards
import numpy as np

# Interval bounds: reward is 1.0 anywhere in [0, 0.05], decays outside.
distance = 0.03
reward = rewards.tolerance(distance, bounds=(0, 0.05), margin=0.05)
print(reward)
# 1.0  (distance is within bounds)

distance = 0.08
reward = rewards.tolerance(distance, bounds=(0, 0.05), margin=0.05)
print(reward)
# ~0.437  (distance is 0.03 past the upper bound, margin is 0.05)

from dm_control.utils import rewards
import numpy as np

# Vectorised usage with different sigmoid types.
distances = np.array([0.0, 0.05, 0.1, 0.2, 0.5])
for sig in ['gaussian', 'long_tail', 'cosine']:
    r = rewards.tolerance(distances, bounds=(0, 0), margin=0.2, sigmoid=sig)
    print(f'{sig}: {np.round(r, 3)}')
