Implementation:Farama Foundation Gymnasium Vector NormalizeReward
| Knowledge Sources | |
|---|---|
| Domains | Reinforcement_Learning, Wrappers |
| Last Updated | 2026-02-15 03:00 GMT |
Overview
A vector reward wrapper that normalizes rewards across all sub-environments so that their exponential moving average has an approximately fixed variance, using a shared RunningMeanStd tracker.
Description
The NormalizeReward vector wrapper scales rewards from a vectorized environment to stabilize training. It maintains:
- An accumulated_reward array (one entry per sub-environment) that tracks the discounted cumulative reward: accumulated_reward = accumulated_reward * gamma * (1 - terminated) + reward
- A shared RunningMeanStd tracker updated with the accumulated rewards from all sub-environments.
The normalization divides the raw reward by the running standard deviation: reward / sqrt(var + epsilon), where var is the tracker's running variance of the accumulated rewards.
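As a minimal sketch of the two formulas above, the following NumPy snippet applies one update by hand. The concrete numbers, the illustrative gamma of 0.5, and the fixed running_var stand-in are assumptions chosen for readability; in the wrapper the variance comes from the shared RunningMeanStd tracker.

```python
import numpy as np

gamma = 0.5      # illustrative value; the wrapper's default is 0.99
epsilon = 1e-8   # same default as the wrapper

# State before the step: one accumulated reward per sub-environment.
accumulated_reward = np.array([1.0, 2.0, 3.0])

# Results of one vectorized step (made-up values).
reward = np.array([1.0, 1.0, 1.0])
terminated = np.array([False, True, False])

# The (1 - terminated) factor zeroes the history of the sub-environment
# that just terminated, so it restarts from the new reward alone.
accumulated_reward = accumulated_reward * gamma * (1 - terminated) + reward
# accumulated_reward now holds 1.5, 1.0 and 2.5

# Stand-in for the tracker's running variance (assumption for this sketch).
running_var = 4.0
normalized = reward / np.sqrt(running_var + epsilon)
# each normalized reward is approximately 0.5
```

Note that only the raw reward is divided; the accumulated reward is used solely to estimate the variance.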
Key features:
- Per-environment tracking -- Each sub-environment has its own accumulated reward that resets on termination.
- Shared statistics -- The RunningMeanStd is updated with accumulated rewards from all sub-environments together.
- Freeze support -- The update_running_mean property can be set to False to freeze statistics during evaluation.
- Separate normalize method -- The normalize(reward) method can be called independently for custom normalization needs.
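The features above can be sketched in plain NumPy. The class below is a hypothetical stand-in, not the library's implementation: the name RewardNormalizerSketch, the initial statistics, and the batch-merge variance update are assumptions made for illustration.

```python
import numpy as np


class RewardNormalizerSketch:
    """Illustrative stand-in for the wrapper's bookkeeping (not the library class)."""

    def __init__(self, num_envs, gamma=0.99, epsilon=1e-8):
        self.gamma = gamma
        self.epsilon = epsilon
        # Per-environment tracking: one accumulated reward per sub-environment.
        self.accumulated_reward = np.zeros(num_envs)
        # Shared scalar statistics; these initial values are assumptions.
        self.mean, self.var, self.count = 0.0, 1.0, 1e-4
        self.update_running_mean = True

    def _update_stats(self, batch):
        # Merge the batch moments into the running moments.
        batch_mean, batch_var, batch_count = batch.mean(), batch.var(), batch.size
        delta = batch_mean - self.mean
        total = self.count + batch_count
        m2 = (
            self.var * self.count
            + batch_var * batch_count
            + delta**2 * self.count * batch_count / total
        )
        self.mean += delta * batch_count / total
        self.var = m2 / total
        self.count = total

    def step(self, reward, terminated):
        # The accumulated reward is always updated, even when frozen.
        self.accumulated_reward = (
            self.accumulated_reward * self.gamma * (1 - terminated) + reward
        )
        # Freeze support: skip the statistics update when the flag is False.
        if self.update_running_mean:
            self._update_stats(self.accumulated_reward)
        return reward / np.sqrt(self.var + self.epsilon)


rng = np.random.default_rng(0)
normalizer = RewardNormalizerSketch(num_envs=3)
for _ in range(50):
    normalizer.step(rng.normal(size=3), rng.random(3) < 0.1)

normalizer.update_running_mean = False  # freeze for evaluation
frozen_var = normalizer.var
normalizer.step(np.array([100.0, -100.0, 50.0]), np.zeros(3, dtype=bool))
```

In this sketch, the extreme rewards in the last step leave the variance untouched because the flag is off, which is the behavior one would want during evaluation.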
Usage
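Use this wrapper when training with vectorized environments whose rewards have high variance. Because the statistics pool accumulated rewards from all sub-environments, every step contributes one sample per sub-environment to the running variance, which can give more stable normalization than the single-environment version.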
Code Reference
Source Location
- Repository: Farama_Foundation_Gymnasium
- File: gymnasium/wrappers/vector/stateful_reward.py
Signature
class NormalizeReward(VectorWrapper, gym.utils.RecordConstructorArgs):
    def __init__(
        self,
        env: VectorEnv,
        gamma: float = 0.99,
        epsilon: float = 1e-8,
    ): ...
Import
from gymnasium.wrappers.vector import NormalizeReward
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| env | VectorEnv | Yes | The vector environment to wrap |
| gamma | float | No | Discount factor for the exponential moving average (default 0.99) |
| epsilon | float | No | Stability parameter for normalization (default 1e-8) |
Outputs
| Name | Type | Description |
|---|---|---|
| observations | ObsType | Unchanged observations from the vector environment |
| rewards | ArrayType | Normalized rewards (divided by running standard deviation) |
| terminations | ArrayType | Unchanged termination flags |
| truncations | ArrayType | Unchanged truncation flags |
| info | dict | Unchanged info from the vector environment |
Usage Examples
import numpy as np
import gymnasium as gym
from gymnasium.wrappers.vector import NormalizeReward

# Wrap a vector environment with three sub-environments
envs = gym.make_vec("MountainCarContinuous-v0", num_envs=3)
envs = NormalizeReward(envs)
_ = envs.reset(seed=123)
_ = envs.action_space.seed(123)

episode_rewards = []
for _ in range(100):
    observation, reward, *_ = envs.step(envs.action_space.sample())
    episode_rewards.append(reward)

# Rewards are now normalized to have approximately fixed variance

# Freeze statistics for evaluation
envs.update_running_mean = False

envs.close()