Implementation:Farama Foundation Gymnasium Vector NormalizeReward

Knowledge Sources	Farama_Foundation_Gymnasium Gymnasium Docs
Domains	Reinforcement_Learning, Wrappers
Last Updated	2026-02-15 03:00 GMT

Overview

A vector reward wrapper that normalizes rewards across all sub-environments so that their exponential moving average has an approximately fixed variance, using a shared RunningMeanStd tracker.

Description

The NormalizeReward vector wrapper scales rewards from a vectorized environment to stabilize training. It maintains:

An accumulated_reward array (one per sub-environment) that tracks the discounted cumulative reward: accumulated_reward = accumulated_reward * gamma * (1 - terminated) + reward
A shared RunningMeanStd tracker updated with the accumulated rewards from all sub-environments.
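
As a worked illustration of the accumulated_reward update above, the following minimal sketch applies the formula to three sub-environments using plain Python lists in place of arrays. The helper name update_accumulated_reward and the sample values are hypothetical and are not part of Gymnasium.

```python
def update_accumulated_reward(accumulated, reward, terminated, gamma=0.99):
    """Apply accumulated * gamma * (1 - terminated) + reward per sub-environment."""
    return [
        acc * gamma * (1.0 - float(done)) + r
        for acc, r, done in zip(accumulated, reward, terminated)
    ]


# Three sub-environments; the second one reports termination on this step,
# so its previous accumulated reward is dropped before the new reward is added.
accumulated = [1.0, 2.0, 0.0]
reward = [1.0, 1.0, -1.0]
terminated = [False, True, False]

accumulated = update_accumulated_reward(accumulated, reward, terminated, gamma=0.5)
print(accumulated)  # [1.5, 1.0, -1.0]
```

The termination mask only affects the sub-environment that terminated; the other entries keep discounting their history.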

The normalization divides the raw reward by the running standard deviation of the accumulated rewards: reward / sqrt(var + epsilon). The mean is not subtracted, so rewards are rescaled but not centered.
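
The scaling step can be sketched without any dependencies as follows. This is an illustrative approximation, not the library's code: the RunningVariance class is a hypothetical stand-in for RunningMeanStd, its initial values (count of zero) are an assumption, and the update flag mimics the effect of freezing the statistics.

```python
import math


class RunningVariance:
    """Hypothetical stand-in for a shared running mean/variance tracker."""

    def __init__(self):
        # Assumed initial state; the real tracker may initialise differently.
        self.mean = 0.0
        self.var = 1.0
        self.count = 0.0

    def update(self, batch):
        """Merge one batch (one value per sub-environment) into the statistics."""
        n = len(batch)
        batch_mean = sum(batch) / n
        batch_var = sum((x - batch_mean) ** 2 for x in batch) / n
        total = self.count + n
        delta = batch_mean - self.mean
        # Parallel merge of two (mean, variance, count) summaries.
        m2 = self.var * self.count + batch_var * n + delta**2 * self.count * n / total
        self.mean += delta * n / total
        self.var = m2 / total
        self.count = total


def normalize(reward, accumulated, tracker, epsilon=1e-8, update=True):
    """Scale rewards by the running std of the accumulated rewards."""
    if update:
        tracker.update(accumulated)
    scale = math.sqrt(tracker.var + epsilon)
    return [r / scale for r in reward]


tracker = RunningVariance()
# Statistics are updated from the accumulated rewards, then the raw rewards are scaled.
scaled = normalize([2.0, 4.0], accumulated=[1.0, 3.0], tracker=tracker)
# With update=False the tracker is left untouched, as during evaluation.
frozen = normalize([2.0, 4.0], accumulated=[50.0, 70.0], tracker=tracker, update=False)
print(scaled == frozen)  # True
```

Because every sub-environment feeds the same tracker, a single step contributes one sample per sub-environment to the variance estimate.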

Key features:

Per-environment tracking -- Each sub-environment has its own accumulated reward that resets on termination.
Shared statistics -- The RunningMeanStd is updated with accumulated rewards from all sub-environments together.
Freeze support -- The update_running_mean property can be set to False to freeze statistics during evaluation.
Separate normalize method -- The normalize(reward) method can be called independently for custom normalization needs.

Usage

Use this wrapper when training with vectorized environments whose rewards have high variance. Because the statistics are pooled across sub-environments, every step contributes one sample per sub-environment, which can give a more stable estimate than the single-environment version.

Code Reference

Source Location

Repository: Farama_Foundation_Gymnasium
File: gymnasium/wrappers/vector/stateful_reward.py

Signature

class NormalizeReward(VectorWrapper, gym.utils.RecordConstructorArgs):
    def __init__(
        self,
        env: VectorEnv,
        gamma: float = 0.99,
        epsilon: float = 1e-8,
    ): ...

Import

from gymnasium.wrappers.vector import NormalizeReward

I/O Contract

Inputs

Name	Type	Required	Description
env	VectorEnv	Yes	The vector environment to wrap
gamma	float	No	Discount factor for the exponential moving average (default 0.99)
epsilon	float	No	Stability parameter for normalization (default 1e-8)

Outputs

Name	Type	Description
observations	ObsType	Unchanged observations from the vector environment
rewards	ArrayType	Normalized rewards (divided by running standard deviation)
terminations	ArrayType	Unchanged termination flags
truncations	ArrayType	Unchanged truncation flags
info	dict	Unchanged info from the vector environment

Usage Examples

import numpy as np
import gymnasium as gym
from gymnasium.wrappers.vector import NormalizeReward

envs = gym.make_vec("MountainCarContinuous-v0", num_envs=3)
envs = NormalizeReward(envs)
_ = envs.reset(seed=123)
_ = envs.action_space.seed(123)

# Collect normalized rewards; each entry holds one value per sub-environment.
episode_rewards = []
for _ in range(100):
    observation, reward, terminated, truncated, info = envs.step(envs.action_space.sample())
    episode_rewards.append(reward)

# Rewards are scaled by the running standard deviation; inspect their variance.
print(np.var(episode_rewards))

# Freeze statistics for evaluation (set this before closing the environment)
envs.update_running_mean = False

envs.close()

Related Pages

Environment:Farama_Foundation_Gymnasium_Python_3_10_Runtime
