Implementation:Google deepmind Dm control Tolerance Reward
| Metadata | |
|---|---|
| Knowledge Sources | dm_control |
| Domains | Reinforcement Learning, Reward Engineering, Control Theory |
| Last Updated | 2026-02-15 00:00 GMT |
Overview
Concrete tool for computing smooth, bounded reward values using the rewards.tolerance() function, which maps a scalar or array distance metric to the range [0, 1] via configurable sigmoid decay.
Description
The tolerance() function in dm_control/utils/rewards.py is the standard reward primitive used across dm_control manipulation tasks. It acts as a soft indicator function:
- Returns 1.0 when the input
xlies within the closed interval[lower, upper]. - Returns a sigmoidally decaying value in
(0, 1)whenxis outside the bounds but within the margin region. - Returns a value approaching 0.0 as
xmoves far beyond the margin.
When margin == 0, the function becomes a hard indicator (1 inside bounds, 0 outside). When margin > 0, the distance from x to the nearest bound is normalised by the margin and passed through one of eight sigmoid functions: 'gaussian', 'hyperbolic', 'long_tail', 'reciprocal', 'cosine', 'linear', 'quadratic', or 'tanh_squared'.
The internal _sigmoids() helper computes a scale factor from value_at_margin so that the sigmoid evaluates to exactly value_at_margin at normalised distance 1.0 (i.e. at the margin boundary).
For scalar inputs, the return value is a Python float. For NumPy array inputs, the return value is a NumPy array of the same shape.
Usage in manipulation tasks: The Reach task computes the Euclidean distance between the hand's tool center point and the target, then calls rewards.tolerance(distance, bounds=(0, TARGET_RADIUS), margin=TARGET_RADIUS) with the default Gaussian sigmoid.
Usage
Any task that needs a dense reward signal from a distance metric can use tolerance(). It is the standard approach in dm_control for converting distances, angles, or velocities into reward scalars.
Code Reference
| Attribute | Value |
|---|---|
| Source Location | dm_control/utils/rewards.py, lines 24--136
|
| Signature | tolerance(x, bounds=(0.0, 0.0), margin=0.0, sigmoid='gaussian', value_at_margin=0.1) -> float or np.ndarray
|
| Import | from dm_control.utils import rewards
|
I/O Contract
Inputs
| Parameter | Type | Default | Description |
|---|---|---|---|
x |
float or np.ndarray |
(required) | The value(s) to evaluate. |
bounds |
tuple[float, float] |
(0.0, 0.0) |
Inclusive (lower, upper) interval where the reward is 1.0. Can be infinite for one-sided bounds.
|
margin |
float |
0.0 |
Controls how steeply the reward decays outside the bounds. If 0, the function is a hard indicator. |
sigmoid |
str |
'gaussian' |
One of: 'gaussian', 'hyperbolic', 'long_tail', 'reciprocal', 'cosine', 'linear', 'quadratic', 'tanh_squared'.
|
value_at_margin |
float |
0.1 |
The output value when the distance from x to the nearest bound equals margin. Must be in (0, 1) for most sigmoids.
|
Outputs
| Return Type | Description |
|---|---|
float (if x is scalar) |
A value in [0.0, 1.0].
|
np.ndarray (if x is array) |
An array of the same shape with values in [0.0, 1.0].
|
Usage Examples
from dm_control.utils import rewards
import numpy as np
# Basic usage: exact target at 0, Gaussian decay with margin 1.
print(rewards.tolerance(0.0, bounds=(0, 0), margin=1.0))
# 1.0 (x is at the target)
print(rewards.tolerance(1.0, bounds=(0, 0), margin=1.0))
# 0.1 (x is exactly at the margin; equals value_at_margin)
print(rewards.tolerance(0.5, bounds=(0, 0), margin=1.0))
# ~0.456 (intermediate value, Gaussian decay)
from dm_control.utils import rewards
import numpy as np
# Interval bounds: reward is 1.0 anywhere in [0, 0.05], decays outside.
distance = 0.03
reward = rewards.tolerance(distance, bounds=(0, 0.05), margin=0.05)
print(reward)
# 1.0 (distance is within bounds)
distance = 0.08
reward = rewards.tolerance(distance, bounds=(0, 0.05), margin=0.05)
print(reward)
# ~0.456 (distance is 0.03 past the upper bound, margin is 0.05)
from dm_control.utils import rewards
import numpy as np
# Vectorised usage with different sigmoid types.
distances = np.array([0.0, 0.05, 0.1, 0.2, 0.5])
for sig in ['gaussian', 'long_tail', 'cosine']:
r = rewards.tolerance(distances, bounds=(0, 0), margin=0.2, sigmoid=sig)
print(f'{sig}: {np.round(r, 3)}')