Implementation:Farama Foundation Gymnasium NormalizeReward

Knowledge Sources	Farama_Foundation_Gymnasium Gymnasium Docs
Domains	Reinforcement_Learning, Wrappers
Last Updated	2026-02-15 03:00 GMT

Overview

A reward wrapper that normalizes immediate rewards such that their exponential moving average has an approximately fixed variance, using a RunningMeanStd tracker with a configurable discount factor.

Description

The NormalizeReward wrapper scales rewards to stabilize training by maintaining an approximately fixed variance in the exponential moving average of returns. It works by:

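1. Computing a discounted reward accumulator: discounted_reward = discounted_reward * gamma * (1 - terminated) + reward. Only termination zeroes the accumulator; truncation does not appear in the update.
2. Updating a RunningMeanStd tracker with the discounted reward value (this step is skipped when update_running_mean is False).
3. Normalizing the raw reward (not the accumulator) by dividing it by the tracker's standard deviation: reward / sqrt(var + epsilon).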
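The three steps above can be re-created outside Gymnasium to see how the pieces interact. The sketch below is a minimal scalar version in plain Python. The class names RunningMeanStd and RewardNormalizer are hypothetical stand-ins, not the library's classes. It assumes the tracker starts from mean 0, variance 1 and a small pseudo-count, and merges each new value with the standard parallel mean/variance update.

```python
import math


class RunningMeanStd:
    """Hypothetical scalar tracker: streaming mean/variance via the parallel moment merge."""

    def __init__(self, epsilon: float = 1e-4):
        # Assumed initial state: mean 0, variance 1, tiny pseudo-count
        self.mean = 0.0
        self.var = 1.0
        self.count = epsilon

    def update(self, x: float) -> None:
        # Merge a batch of one value (its mean is x, its variance is 0)
        delta = x - self.mean
        total = self.count + 1
        m2 = self.var * self.count + delta ** 2 * self.count / total
        self.mean += delta / total
        self.var = m2 / total
        self.count = total


class RewardNormalizer:
    """Hypothetical stand-alone version of the wrapper's per-step logic."""

    def __init__(self, gamma: float = 0.99, epsilon: float = 1e-8):
        self.gamma = gamma
        self.epsilon = epsilon
        self.return_rms = RunningMeanStd()
        self.discounted_reward = 0.0
        self.update_running_mean = True

    def __call__(self, reward: float, terminated: bool) -> float:
        # Step 1: discounted accumulator, zeroed only on termination
        self.discounted_reward = self.discounted_reward * self.gamma * (1 - terminated) + reward
        # Step 2: update the statistics unless they are frozen
        if self.update_running_mean:
            self.return_rms.update(self.discounted_reward)
        # Step 3: scale the raw reward; the mean is never subtracted
        return reward / math.sqrt(self.return_rms.var + self.epsilon)


norm = RewardNormalizer(gamma=0.99)
scaled = [norm(1.0, terminated=False) for _ in range(200)]
```

Because the tracker starts almost empty, the first few scaled rewards can be much larger than later ones; the scale settles as more accumulator values are observed.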

Note that the wrapper intentionally does not subtract the mean (following the OpenAI Baselines convention).
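Dividing by a positive standard deviation keeps the sign and ordering of every reward, while centering can turn a small positive reward into a negative one. The illustration below uses made-up reward values and the population standard deviation of that list as a stand-in for the running statistic; the centered variant is shown only for contrast and is not something the wrapper does.

```python
import statistics

rewards = [-1.0, 0.5, 2.0, 4.0]  # illustrative values only
std = statistics.pstdev(rewards)
mean = statistics.fmean(rewards)

# Scale only, as the wrapper does: signs and ordering are preserved
scaled = [r / std for r in rewards]

# Hypothetical alternative that also subtracts the mean: 0.5 becomes negative
centered = [(r - mean) / std for r in rewards]

print(scaled[1] > 0, centered[1] > 0)  # True False
```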

The update_running_mean property allows freezing the running statistics during evaluation while still applying the learned normalization.

A vector version exists at gymnasium.wrappers.vector.NormalizeReward.
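This page does not describe the vector wrapper's internals. Assuming it follows the same recipe with one discounted accumulator per sub-environment and a single shared tracker updated with the whole batch each step, the idea can be sketched in plain Python. BatchRunningMeanStd and normalize_batch are hypothetical names, and the real wrapper presumably operates on NumPy arrays rather than lists.

```python
import math


class BatchRunningMeanStd:
    """Hypothetical tracker that merges a whole batch of values per update."""

    def __init__(self, epsilon: float = 1e-4):
        self.mean, self.var, self.count = 0.0, 1.0, epsilon

    def update(self, batch):
        n = len(batch)
        b_mean = sum(batch) / n
        b_var = sum((x - b_mean) ** 2 for x in batch) / n
        delta = b_mean - self.mean
        total = self.count + n
        # Combine old and batch moments (uses the old count)
        m2 = self.var * self.count + b_var * n + delta ** 2 * self.count * n / total
        self.mean += delta * n / total
        self.var = m2 / total
        self.count = total


def normalize_batch(accum, rms, rewards, terminated, gamma=0.99, epsilon=1e-8):
    """Update per-env accumulators in place and return the scaled rewards."""
    for i, (r, done) in enumerate(zip(rewards, terminated)):
        # Each sub-environment resets only its own accumulator
        accum[i] = accum[i] * gamma * (1 - done) + r
    rms.update(accum)
    scale = math.sqrt(rms.var + epsilon)
    # One shared scale is applied to every sub-environment's reward
    return [r / scale for r in rewards]


accum = [0.0, 0.0, 0.0]
rms = BatchRunningMeanStd()
scaled = normalize_batch(accum, rms, [1.0, -1.0, 2.0], [False, True, False])
```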

Usage

Use this wrapper when rewards have high variance or inconsistent scales across episodes, as is often the case in continuous control tasks. During evaluation, freeze the running statistics by setting wrapper.update_running_mean = False; the scaling learned so far is still applied to every reward.
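To make sure the statistics are unfrozen again after evaluation, even if the evaluation loop raises, the toggle can be wrapped in a context manager. frozen_reward_stats is a hypothetical helper, not part of Gymnasium; it only relies on the update_running_mean property described above. The demo uses a SimpleNamespace stand-in so it runs without Gymnasium installed.

```python
from contextlib import contextmanager
from types import SimpleNamespace


@contextmanager
def frozen_reward_stats(wrapper):
    """Temporarily set wrapper.update_running_mean = False, then restore the old value."""
    previous = wrapper.update_running_mean
    wrapper.update_running_mean = False
    try:
        yield wrapper
    finally:
        # Restored even if the body raises
        wrapper.update_running_mean = previous


# With a real NormalizeReward-wrapped env (evaluation loop is hypothetical):
#     with frozen_reward_stats(env):
#         run_evaluation_episode(env)

stub = SimpleNamespace(update_running_mean=True)
with frozen_reward_stats(stub):
    print(stub.update_running_mean)  # False
print(stub.update_running_mean)  # True
```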

Code Reference

Source Location

Repository: Farama_Foundation_Gymnasium
File: gymnasium/wrappers/stateful_reward.py

Signature

class NormalizeReward(gym.Wrapper[ObsType, ActType, ObsType, ActType], gym.utils.RecordConstructorArgs):
    def __init__(
        self,
        env: gym.Env[ObsType, ActType],
        gamma: float = 0.99,
        epsilon: float = 1e-8,
    ): ...

Import

from gymnasium.wrappers import NormalizeReward

I/O Contract

Inputs

Name	Type	Required	Description
env	Env	Yes	The environment to wrap
gamma	float	No	Discount factor of the discounted reward accumulator whose variance is tracked (default 0.99)
epsilon	float	No	Small constant added to the running variance before the square root, for numerical stability (default 1e-8)
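gamma sets how many past rewards the accumulator effectively sums. With a constant reward r and no terminations, the update discounted_reward = discounted_reward * gamma + r converges to the geometric-series limit r / (1 - gamma), so gamma = 0.99 behaves roughly like a 100-step window and gamma = 0.9 like a 10-step window. The sketch below replays the update; discounted_accumulator is a hypothetical helper for illustration only.

```python
def discounted_accumulator(rewards, gamma):
    """Replay the accumulator update (no terminations) and return its trajectory."""
    acc, history = 0.0, []
    for r in rewards:
        acc = acc * gamma + r
        history.append(acc)
    return history


for gamma in (0.9, 0.99):
    hist = discounted_accumulator([1.0] * 2000, gamma)
    # The final value approaches 1 / (1 - gamma)
    print(gamma, hist[-1], 1.0 / (1.0 - gamma))
```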

Outputs

Name	Type	Description
observation	ObsType	Unchanged observation from the environment
reward	float	Normalized reward: the raw reward divided by sqrt(running variance + epsilon), with no mean subtracted
terminated	bool	Whether the episode terminated
truncated	bool	Whether the episode was truncated
info	dict	Additional environment information

Usage Examples

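import numpy as np
import gymnasium as gym
from gymnasium.wrappers import NormalizeReward

env = gym.make("MountainCarContinuous-v0")
env = NormalizeReward(env, gamma=0.99, epsilon=1e-8)
_ = env.reset(seed=123)
_ = env.action_space.seed(123)

# Training-style rollout: the running statistics are updated on every step
episode_rewards = []
terminated, truncated = False, False
while not (terminated or truncated):
    observation, reward, terminated, truncated, info = env.step(env.action_space.sample())
    episode_rewards.append(reward)

# Variance of the normalized per-step rewards collected in this episode
print(np.var(episode_rewards))

# Evaluation: freeze the running statistics but keep applying the learned scaling
env.update_running_mean = False
_ = env.reset(seed=456)
terminated, truncated = False, False
while not (terminated or truncated):
    observation, reward, terminated, truncated, info = env.step(env.action_space.sample())

env.close()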

Related Pages

Environment:Farama_Foundation_Gymnasium_Python_3_10_Runtime
