

Principle: Google DeepMind dm_control Reward Shaping

From Leeroopedia
Metadata
Knowledge Sources dm_control
Domains Reinforcement Learning, Reward Engineering, Control Theory
Last Updated 2026-02-15 00:00 GMT

Overview

Reward shaping is the principle of designing smooth, bounded reward signals that provide informative gradients to a learning agent by mapping a continuous distance metric to a value between 0 and 1 using configurable sigmoid functions.

Description

In reinforcement learning, the reward function drives all learning. A naive binary reward (1 if the goal is reached, 0 otherwise) provides no gradient information when the agent is far from the goal, making exploration extremely difficult. Reward shaping addresses this by replacing the binary signal with a smooth function that:

  • Returns 1.0 when the agent's state falls within a specified target interval (the bounds).
  • Decays smoothly towards 0.0 as the agent moves away from the target, at a rate controlled by a margin parameter.
  • Uses a configurable sigmoid shape to control the decay profile.

The key design parameters are:

  • Bounds -- a pair (lower, upper) defining the interval within which the reward is maximal. When lower == upper, the target is an exact value.
  • Margin -- the distance from the bounds at which the reward drops to a specified reference value. A margin of 0 produces a hard threshold; a positive margin produces a smooth transition.
  • Sigmoid type -- the mathematical function used for the decay. Different sigmoids offer different trade-offs between tail behaviour and gradient strength.
  • Value at margin -- the reward value when the distance from the bounds exactly equals the margin, anchoring the sigmoid's scale.

Usage

Reward shaping is used in every manipulation task to convert a distance measure (e.g. Euclidean distance from the hand to a target) into a dense reward signal. It is also applicable outside manipulation, in any domain where a continuous metric exists between the current state and a goal state.
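As a sketch of this pattern, the snippet below maps the Euclidean hand-to-target distance to a dense reward. It is an illustrative example, not dm_control's actual implementation; the choice of a Gaussian sigmoid, the 0.5 margin, and the 0.1 value-at-margin are assumptions.

```python
import math

def reach_reward(hand_pos, target_pos, margin=0.5, value_at_margin=0.1):
    """Map Euclidean hand-to-target distance to a dense reward in (0, 1].

    Uses a Gaussian sigmoid: the reward is 1.0 at zero distance and decays
    to `value_at_margin` when the distance exactly equals `margin`.
    """
    d = math.dist(hand_pos, target_pos) / margin          # normalised distance
    scale = math.sqrt(-2.0 * math.log(value_at_margin))   # anchors reward(margin) at value_at_margin
    return math.exp(-0.5 * (d * scale) ** 2)

print(reach_reward((0.0, 0.0, 0.0), (0.0, 0.0, 0.0)))  # at the target: 1.0
print(reach_reward((0.5, 0.0, 0.0), (0.0, 0.0, 0.0)))  # at the margin: ~0.1
```

Because the reward is strictly positive everywhere, the agent receives an informative gradient even when it starts far from the target.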

Theoretical Basis

The tolerance function is defined piecewise:

tolerance(x, bounds=(lower, upper), margin, sigmoid, value_at_margin):

    if lower <= x <= upper:
        return 1.0

    if margin == 0:
        return 0.0    # hard threshold: zero reward outside the bounds

    d = distance_to_nearest_bound(x) / margin    # d = 1 exactly at the margin

    return sigmoid_function(d, value_at_margin)
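A runnable version of the pseudocode above might look as follows. This is a minimal sketch supporting only the gaussian and linear sigmoids, not the full dm_control implementation:

```python
import math

def tolerance(x, bounds=(0.0, 0.0), margin=0.0, sigmoid='gaussian',
              value_at_margin=0.1):
    """Return 1.0 when x lies inside `bounds`, decaying towards 0.0 outside."""
    lower, upper = bounds
    if lower <= x <= upper:
        return 1.0
    if margin == 0:
        return 0.0  # hard threshold: no shaping outside the bounds
    # Normalised distance to the nearest bound (d == 1 exactly at the margin).
    d = (lower - x if x < lower else x - upper) / margin
    if sigmoid == 'gaussian':
        scale = math.sqrt(-2.0 * math.log(value_at_margin))
        return math.exp(-0.5 * (d * scale) ** 2)
    elif sigmoid == 'linear':
        scale = 1.0 - value_at_margin
        return max(0.0, 1.0 - d * scale)  # compact support: clamp at 0
    raise ValueError(f'Unsupported sigmoid: {sigmoid}')

print(tolerance(0.0, bounds=(0.0, 0.0), margin=1.0))  # inside bounds: 1.0
print(tolerance(1.0, bounds=(0.0, 0.0), margin=1.0))  # at the margin: ~0.1
```

Note that both sigmoids return approximately `value_at_margin` at `d = 1`, so swapping sigmoids changes only the shape of the decay, not its anchor point.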

The sigmoid function maps the normalised distance d (where d = 1 at the margin) to a value in [0, 1]. The available sigmoids and their formulas are:

Sigmoid      | Formula                        | Tail Behaviour
Gaussian     | exp(-0.5 * (d * scale)^2)      | Fast decay (exponential)
Hyperbolic   | 1 / cosh(d * scale)            | Moderate decay
Long tail    | 1 / ((d * scale)^2 + 1)        | Slow decay (polynomial)
Reciprocal   | 1 / (|d| * scale + 1)          | Slow decay (linear denominator)
Cosine       | (1 + cos(pi * d * scale)) / 2  | Compact support (reaches 0)
Linear       | 1 - d * scale                  | Compact support (reaches 0)
Quadratic    | 1 - (d * scale)^2              | Compact support (reaches 0)
Tanh squared | 1 - tanh(d * scale)^2          | Moderate decay

In each case, scale is derived from value_at_margin so that sigmoid_function(1, value_at_margin) = value_at_margin. The compact-support sigmoids (cosine, linear, quadratic) are clamped at 0 once the formula would otherwise go negative.
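The scale for each sigmoid follows from solving its formula at d = 1. For the Gaussian, exp(-0.5 * scale^2) = value_at_margin gives scale = sqrt(-2 * ln(value_at_margin)); for the long tail, 1 / (scale^2 + 1) = value_at_margin gives scale = sqrt(1/value_at_margin - 1). A quick numeric check of these two closed forms:

```python
import math

value_at_margin = 0.1

# Gaussian: solve exp(-0.5 * scale^2) == value_at_margin for scale.
gauss_scale = math.sqrt(-2.0 * math.log(value_at_margin))
assert abs(math.exp(-0.5 * gauss_scale ** 2) - value_at_margin) < 1e-9

# Long tail: solve 1 / (scale^2 + 1) == value_at_margin for scale.
lt_scale = math.sqrt(1.0 / value_at_margin - 1.0)
assert abs(1.0 / (lt_scale ** 2 + 1.0) - value_at_margin) < 1e-9
```

With value_at_margin = 0.1, the long-tail scale works out to exactly 3, since 1/(3^2 + 1) = 0.1.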
