Implementation:Farama Foundation Gymnasium NormalizeReward
| Knowledge Sources | |
|---|---|
| Domains | Reinforcement_Learning, Wrappers |
| Last Updated | 2026-02-15 03:00 GMT |
Overview
A reward wrapper that normalizes immediate rewards such that their exponential moving average has an approximately fixed variance, using a RunningMeanStd tracker with a configurable discount factor.
Description
The NormalizeReward wrapper stabilizes training by rescaling each reward so that the exponential moving average of rewards (a discounted running return) keeps an approximately fixed variance. It works by:
1. Computing a discounted reward accumulator: discounted_reward = discounted_reward * gamma * (1 - terminated) + reward
2. Updating a RunningMeanStd tracker with the discounted reward values
3. Normalizing the raw reward by dividing by the tracked standard deviation of the discounted reward: reward / sqrt(var + epsilon)
Note that the wrapper intentionally does not subtract the mean (following the OpenAI Baselines convention).
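The three steps above can be sketched in plain Python. This is a minimal illustration rather than the library's code: RunningVariance is a hypothetical scalar stand-in for RunningMeanStd, its initial values (mean 0, variance 1, prior count 1e-4) are assumptions, and normalize_rewards is a made-up helper name.

```python
import math


class RunningVariance:
    """Hypothetical scalar stand-in for a RunningMeanStd-style tracker."""

    def __init__(self) -> None:
        # Assumed initial values; a small prior count avoids division by zero.
        self.mean = 0.0
        self.var = 1.0
        self.count = 1e-4

    def update(self, x: float) -> None:
        # Parallel-variance merge of the current moments with one new sample.
        delta = x - self.mean
        total = self.count + 1.0
        m2 = self.var * self.count + delta**2 * self.count / total
        self.mean += delta / total
        self.var = m2 / total
        self.count = total


def normalize_rewards(rewards, terminations, gamma=0.99, epsilon=1e-8):
    """Apply steps 1-3 to a recorded sequence of rewards (illustrative helper)."""
    tracker = RunningVariance()
    discounted = 0.0
    normalized = []
    for reward, terminated in zip(rewards, terminations):
        # Step 1: discounted accumulator, zeroed on termination before adding the reward.
        discounted = discounted * gamma * (1 - terminated) + reward
        # Step 2: update the running statistics with the accumulator.
        tracker.update(discounted)
        # Step 3: scale the raw reward; the mean is deliberately not subtracted.
        normalized.append(reward / math.sqrt(tracker.var + epsilon))
    return normalized


scaled = normalize_rewards([1.0, -2.0, 0.5], [False, True, False])
```

Because only a division is applied, each normalized reward keeps the sign of the raw reward; the magnitude depends on the statistics gathered so far.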
The update_running_mean property allows freezing the running statistics during evaluation while still applying the learned normalization.
A vector version exists at gymnasium.wrappers.vector.NormalizeReward.
Usage
Use this wrapper when rewards have high variance or inconsistent scales across episodes, which is common in continuous control tasks. Freeze the running mean during evaluation by setting wrapper.update_running_mean = False.
Code Reference
Source Location
- Repository: Farama_Foundation_Gymnasium
- File: gymnasium/wrappers/stateful_reward.py
Signature
class NormalizeReward(gym.Wrapper[ObsType, ActType, ObsType, ActType], gym.utils.RecordConstructorArgs):
    def __init__(
        self,
        env: gym.Env[ObsType, ActType],
        gamma: float = 0.99,
        epsilon: float = 1e-8,
    ): ...
Import
from gymnasium.wrappers import NormalizeReward
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| env | Env | Yes | The environment to wrap |
| gamma | float | No | Discount factor for the exponential moving average (default 0.99) |
| epsilon | float | No | Stability parameter for normalization (default 1e-8) |
Outputs
| Name | Type | Description |
|---|---|---|
| observation | ObsType | Unchanged observation from the environment |
| reward | float | Normalized reward (raw reward divided by the running standard deviation of the discounted reward) |
| terminated | bool | Whether the episode terminated |
| truncated | bool | Whether the episode was truncated |
| info | dict | Additional environment information |
Usage Examples
import numpy as np
import gymnasium as gym
from gymnasium.wrappers import NormalizeReward

env = gym.make("MountainCarContinuous-v0")
env = NormalizeReward(env, gamma=0.99, epsilon=1e-8)
_ = env.reset(seed=123)
_ = env.action_space.seed(123)

# Roll out one episode with random actions; every returned reward is already normalized
episode_rewards = []
terminated, truncated = False, False
while not (terminated or truncated):
    observation, reward, terminated, truncated, info = env.step(env.action_space.sample())
    episode_rewards.append(reward)

# Inspect the variance of the normalized rewards collected in this episode
print(np.var(episode_rewards))

# Freeze the running statistics before any evaluation rollouts
env.update_running_mean = False

env.close()