Implementation:Google DeepMind dm_control Tolerance Reward

From Leeroopedia
Metadata
Knowledge Sources dm_control
Domains Reinforcement Learning, Reward Engineering, Control Theory
Last Updated 2026-02-15 00:00 GMT

Overview

A concrete tool for computing smooth, bounded reward values using the rewards.tolerance() function, which maps a scalar or array distance metric into the range [0, 1] via configurable sigmoid decay.

Description

The tolerance() function in dm_control/utils/rewards.py is the standard reward primitive used across dm_control manipulation tasks. It acts as a soft indicator function:

  • Returns 1.0 when the input x lies within the closed interval [lower, upper].
  • Returns a sigmoidally decaying value in (0, 1) when x is outside the bounds but within the margin region.
  • Returns a value approaching 0.0 as x moves far beyond the margin.

When margin == 0, the function becomes a hard indicator (1 inside bounds, 0 outside). When margin > 0, the distance from x to the nearest bound is normalised by the margin and passed through one of eight sigmoid functions: 'gaussian', 'hyperbolic', 'long_tail', 'reciprocal', 'cosine', 'linear', 'quadratic', or 'tanh_squared'.
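This piecewise behaviour can be sketched in pure Python. The snippet below is an illustrative reimplementation of the documented semantics for the Gaussian sigmoid only, not the dm_control source:

```python
import math

def tolerance_sketch(x, bounds=(0.0, 0.0), margin=0.0, value_at_margin=0.1):
    """Illustrative reimplementation of the documented behaviour
    (Gaussian sigmoid only); not the dm_control source."""
    lower, upper = bounds
    if lower <= x <= upper:
        return 1.0          # inside the closed interval
    if margin == 0.0:
        return 0.0          # hard indicator outside the bounds
    # Distance from x to the nearest bound, normalised by the margin.
    d = (lower - x if x < lower else x - upper) / margin
    # Gaussian decay, scaled so the value at d == 1 equals value_at_margin.
    scale = math.sqrt(-2.0 * math.log(value_at_margin))
    return math.exp(-0.5 * (d * scale) ** 2)

print(tolerance_sketch(0.03, bounds=(0.0, 0.05)))            # 1.0
print(tolerance_sketch(0.06, bounds=(0.0, 0.05)))            # 0.0 (hard indicator)
print(tolerance_sketch(1.0, bounds=(0.0, 0.0), margin=1.0))  # ≈ 0.1
```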

The internal _sigmoids() helper computes a scale factor from value_at_margin so that the sigmoid evaluates to exactly value_at_margin at normalised distance 1.0 (i.e. at the margin boundary).
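This construction can be checked directly for two representative shapes. The formulas below follow the standard Gaussian and long-tail forms (shown for illustration, not copied verbatim from the library source); in each case the scale factor is derived so the curve passes through value_at_margin at normalised distance 1.0:

```python
import math

value_at_margin = 0.1

# Gaussian: exp(-0.5 * (d * scale)^2), with scale chosen so that
# plugging in d == 1 yields exactly value_at_margin.
gauss_scale = math.sqrt(-2.0 * math.log(value_at_margin))
gaussian = lambda d: math.exp(-0.5 * (d * gauss_scale) ** 2)

# Long tail: 1 / ((d * scale)^2 + 1), with scale derived the same way.
lt_scale = math.sqrt(1.0 / value_at_margin - 1.0)
long_tail = lambda d: 1.0 / ((d * lt_scale) ** 2 + 1.0)

print(gaussian(1.0))   # ≈ 0.1
print(long_tail(1.0))  # ≈ 0.1
```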

For scalar inputs, the return value is a Python float. For NumPy array inputs, the return value is a NumPy array of the same shape.

Usage in manipulation tasks: The Reach task computes the Euclidean distance between the hand's tool center point and the target, then calls rewards.tolerance(distance, bounds=(0, TARGET_RADIUS), margin=TARGET_RADIUS) with the default Gaussian sigmoid.
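A sketch of this Reach-style shaping is below. The TARGET_RADIUS value is an assumption for illustration, and the Gaussian branch of tolerance() is inlined so the snippet is self-contained:

```python
import math
import numpy as np

TARGET_RADIUS = 0.05  # assumed value, for illustration only

def gaussian_tolerance(x, lower, upper, margin, value_at_margin=0.1):
    """Inlined Gaussian branch of the documented tolerance() behaviour."""
    if lower <= x <= upper:
        return 1.0
    d = (x - upper if x > upper else lower - x) / margin
    scale = math.sqrt(-2.0 * math.log(value_at_margin))
    return math.exp(-0.5 * (d * scale) ** 2)

def reach_reward(tcp_pos, target_pos):
    # Euclidean distance between the tool centre point and the target.
    distance = float(np.linalg.norm(np.asarray(tcp_pos) - np.asarray(target_pos)))
    # Full reward inside one target radius, Gaussian decay over one more.
    return gaussian_tolerance(distance, 0.0, TARGET_RADIUS, margin=TARGET_RADIUS)

print(reach_reward([0.0, 0.0, 0.0], [0.0, 0.0, 0.03]))  # 1.0 (inside radius)
```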

Usage

Any task that needs a dense reward signal from a distance metric can use tolerance(). It is the standard approach in dm_control for converting distances, angles, or velocities into reward scalars.
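For instance, an angle can be turned into a reward by bounding its cosine near 1.0 (upright). The thresholds below are assumed, pendulum-style values, with the Gaussian decay inlined so the sketch runs standalone:

```python
import math

def upright_reward(theta, value_at_margin=0.1):
    """Illustrative angle-to-reward conversion; bounds are assumed values."""
    cos_theta = math.cos(theta)
    lower, upper, margin = 0.95, 1.0, 1.0  # assumed thresholds
    if lower <= cos_theta <= upper:
        return 1.0  # pole effectively vertical
    d = (lower - cos_theta) / margin
    scale = math.sqrt(-2.0 * math.log(value_at_margin))
    return math.exp(-0.5 * (d * scale) ** 2)

print(upright_reward(0.0))      # 1.0 (exactly upright)
print(upright_reward(math.pi))  # near 0 (hanging straight down)
```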

Code Reference

Attribute Value
Source Location dm_control/utils/rewards.py, lines 24-136
Signature tolerance(x, bounds=(0.0, 0.0), margin=0.0, sigmoid='gaussian', value_at_margin=0.1) -> float or np.ndarray
Import from dm_control.utils import rewards

I/O Contract

Inputs

Parameter Type Default Description
x float or np.ndarray (required) The value(s) to evaluate.
bounds tuple[float, float] (0.0, 0.0) Inclusive (lower, upper) interval where the reward is 1.0. Can be infinite for one-sided bounds.
margin float 0.0 Controls how steeply the reward decays outside the bounds. If 0, the function is a hard indicator.
sigmoid str 'gaussian' One of: 'gaussian', 'hyperbolic', 'long_tail', 'reciprocal', 'cosine', 'linear', 'quadratic', 'tanh_squared'.
value_at_margin float 0.1 The output value when the distance from x to the nearest bound equals margin. Must be in (0, 1) for most sigmoids.
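One-sided bounds use an infinite endpoint, as noted above. A minimal sketch of those semantics (pure Python with Gaussian decay; an illustration, not the library source), where any value at or above 0.5 earns full reward:

```python
import math

def tolerance_1d(x, lower, upper, margin, value_at_margin=0.1):
    """Illustrative one-sided tolerance with Gaussian decay."""
    if lower <= x <= upper:
        return 1.0
    d = (lower - x if x < lower else x - upper) / margin
    scale = math.sqrt(-2.0 * math.log(value_at_margin))
    return math.exp(-0.5 * (d * scale) ** 2)

# bounds=(0.5, inf): reward saturates at 1.0 for any x >= 0.5.
print(tolerance_1d(0.8, 0.5, math.inf, margin=0.3))  # 1.0
print(tolerance_1d(0.2, 0.5, math.inf, margin=0.3))  # ≈ 0.1 (one full margin below)
```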

Outputs

Return Type Description
float (if x is scalar) A value in [0.0, 1.0].
np.ndarray (if x is array) An array of the same shape with values in [0.0, 1.0].

Usage Examples

from dm_control.utils import rewards
import numpy as np

# Basic usage: exact target at 0, Gaussian decay with margin 1.
print(rewards.tolerance(0.0, bounds=(0, 0), margin=1.0))
# 1.0  (x is at the target)

print(rewards.tolerance(1.0, bounds=(0, 0), margin=1.0))
# 0.1  (x is exactly at the margin; equals value_at_margin)

print(rewards.tolerance(0.5, bounds=(0, 0), margin=1.0))
# ~0.562  (intermediate value, Gaussian decay: 0.1 ** 0.25)

from dm_control.utils import rewards
import numpy as np

# Interval bounds: reward is 1.0 anywhere in [0, 0.05], decays outside.
distance = 0.03
reward = rewards.tolerance(distance, bounds=(0, 0.05), margin=0.05)
print(reward)
# 1.0  (distance is within bounds)

distance = 0.08
reward = rewards.tolerance(distance, bounds=(0, 0.05), margin=0.05)
print(reward)
# ~0.437  (distance is 0.03 past the upper bound, margin is 0.05)

from dm_control.utils import rewards
import numpy as np

# Vectorised usage with different sigmoid types.
distances = np.array([0.0, 0.05, 0.1, 0.2, 0.5])
for sig in ['gaussian', 'long_tail', 'cosine']:
    r = rewards.tolerance(distances, bounds=(0, 0), margin=0.2, sigmoid=sig)
    print(f'{sig}: {np.round(r, 3)}')
