

Implementation:Farama Foundation Gymnasium Vector NormalizeReward

From Leeroopedia
Revision as of 12:38, 16 February 2026 by Admin (talk | contribs) (Auto-imported from implementations/Farama_Foundation_Gymnasium_Vector_NormalizeReward.md)
Knowledge Sources
Domains: Reinforcement_Learning, Wrappers
Last Updated: 2026-02-15 03:00 GMT

Overview

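A vector reward wrapper that scales the rewards of all sub-environments so that their exponential moving average has an approximately fixed variance. Each reward is divided by the running standard deviation of the discounted accumulated rewards, which is tracked by a single RunningMeanStd shared across sub-environments.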

Description

The NormalizeReward vector wrapper scales rewards from a vectorized environment to stabilize training. It maintains:

  • An accumulated_reward array (one entry per sub-environment) that tracks the discounted cumulative reward and restarts when a sub-environment terminates: accumulated_reward = accumulated_reward * gamma * (1 - terminated) + reward
  • A shared RunningMeanStd tracker updated with the accumulated rewards from all sub-environments.

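Normalization divides the raw reward by the running standard deviation of the accumulated rewards, without subtracting the mean: reward / sqrt(var + epsilon), where var is the variance held by the RunningMeanStd tracker.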
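The update rule and the scaling step described above can be sketched in plain Python. This is a minimal illustration, not the Gymnasium source: RunningVariance is a hypothetical stand-in for RunningMeanStd, its initial values (var = 1.0 and a small prior count) are assumptions of the sketch, and normalize_step is an illustrative helper name.

```python
import math


class RunningVariance:
    """Toy batch-updated variance tracker (a stand-in, not Gymnasium's RunningMeanStd)."""

    def __init__(self):
        self.mean = 0.0
        self.var = 1.0
        self.count = 1e-4  # small prior count; an assumption of this sketch

    def update(self, batch):
        # Merge the batch moments into the running moments (parallel variance formula).
        n = len(batch)
        batch_mean = sum(batch) / n
        batch_var = sum((x - batch_mean) ** 2 for x in batch) / n
        delta = batch_mean - self.mean
        total = self.count + n
        m2 = self.var * self.count + batch_var * n + delta**2 * self.count * n / total
        self.mean += delta * n / total
        self.var = m2 / total
        self.count = total


def normalize_step(accumulated, rewards, terminated, tracker, gamma=0.99, epsilon=1e-8):
    """One wrapper-style step: update discounted returns, then scale the raw rewards."""
    accumulated = [
        a * gamma * (1 - int(t)) + r
        for a, r, t in zip(accumulated, rewards, terminated)
    ]
    tracker.update(accumulated)  # one shared tracker sees every sub-environment
    scale = math.sqrt(tracker.var + epsilon)
    return accumulated, [r / scale for r in rewards]


tracker = RunningVariance()
acc = [0.0, 0.0, 0.0]
acc, scaled = normalize_step(acc, [1.0, 2.0, 3.0], [False, False, False], tracker)
# Sub-environment 0 reports termination here, so its previous accumulation is
# dropped and only the current reward remains in its accumulator.
acc, scaled = normalize_step(acc, [1.0, 1.0, 1.0], [True, False, False], tracker)
print(acc)
```

Because every sub-environment is divided by the same scale, equal raw rewards stay equal after scaling, even though the accumulators differ.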

Key features:

  • Per-environment tracking -- Each sub-environment has its own accumulated reward that resets on termination.
  • Shared statistics -- The RunningMeanStd is updated with accumulated rewards from all sub-environments together.
  • Freeze support -- The update_running_mean property can be set to False to freeze statistics during evaluation.
  • Separate normalize method -- The normalize(reward) method can be called independently for custom normalization needs.
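The freeze behaviour listed above can be illustrated with a toy scaler. This is a hedged sketch under stated assumptions: FreezableScaler is a hypothetical class, not the Gymnasium wrapper, and for brevity it updates its statistics from the values it scales rather than from separately accumulated rewards. Only the update_running_mean flag mirrors the name used on this page.

```python
import math


class FreezableScaler:
    """Hypothetical toy scaler showing how a freeze flag can gate statistic updates."""

    def __init__(self, epsilon=1e-8):
        self.epsilon = epsilon
        self.update_running_mean = True  # set to False to freeze the statistics
        self._n = 0
        self._mean = 0.0
        self._m2 = 0.0

    @property
    def var(self):
        # Population variance of everything seen so far; 1.0 before any data.
        return self._m2 / self._n if self._n > 0 else 1.0

    def normalize(self, values):
        if self.update_running_mean:
            for v in values:  # Welford's online update
                self._n += 1
                d = v - self._mean
                self._mean += d / self._n
                self._m2 += d * (v - self._mean)
        scale = math.sqrt(self.var + self.epsilon)
        return [v / scale for v in values]


scaler = FreezableScaler()
scaler.normalize([1.0, 3.0])          # training: statistics are updated
scaler.update_running_mean = False    # evaluation: statistics are frozen
var_before = scaler.var
out = scaler.normalize([100.0, -100.0])
print(scaler.var == var_before)
```

With the flag off, evaluation rewards are still scaled by the statistics learned during training, but they no longer influence those statistics.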

Usage

Use this wrapper when training with vectorized environments and rewards have high variance. The shared statistics across sub-environments provide more stable normalization compared to the single-environment version.

Code Reference

Source Location

Signature

class NormalizeReward(VectorWrapper, gym.utils.RecordConstructorArgs):
    def __init__(
        self,
        env: VectorEnv,
        gamma: float = 0.99,
        epsilon: float = 1e-8,
    ): ...

Import

from gymnasium.wrappers.vector import NormalizeReward

I/O Contract

Inputs

Name    | Type      | Required | Description
env     | VectorEnv | Yes      | The vector environment to wrap
gamma   | float     | No       | Discount factor for the exponential moving average (default 0.99)
epsilon | float     | No       | Small constant added to the variance before the square root, for numerical stability (default 1e-8)
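As a small worked example of the epsilon parameter, the following sketch evaluates reward / sqrt(var + epsilon) for an ordinary variance and for a variance of zero. The helper name scale is illustrative only and is not part of the wrapper's API.

```python
import math


def scale(reward, var, epsilon=1e-8):
    # Same expression as on this page: reward / sqrt(var + epsilon)
    return reward / math.sqrt(var + epsilon)


typical = scale(1.0, 4.0)     # close to 1 / 2
degenerate = scale(1.0, 0.0)  # finite (about 1e4) instead of a division by zero
print(typical, degenerate)
```

With the default epsilon the scaled reward can be at most about 1e4 times the raw reward, which bounds the output when the tracked variance collapses to zero.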

Outputs

Name         | Type      | Description
observations | ObsType   | Unchanged observations from the vector environment
rewards      | ArrayType | Normalized rewards (divided by running standard deviation)
terminations | ArrayType | Unchanged termination flags
truncations  | ArrayType | Unchanged truncation flags
info         | dict      | Unchanged info from the vector environment

Usage Examples

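import numpy as np
import gymnasium as gym
from gymnasium.wrappers.vector import NormalizeReward

envs = gym.make_vec("MountainCarContinuous-v0", num_envs=3)
envs = NormalizeReward(envs)
_ = envs.reset(seed=123)
_ = envs.action_space.seed(123)

# Training phase: the running statistics are updated on every step
episode_rewards = []
for _ in range(100):
    observation, reward, terminated, truncated, info = envs.step(envs.action_space.sample())
    episode_rewards.append(reward)

# Each entry has shape (num_envs,), so stacking gives shape (100, 3)
episode_rewards = np.stack(episode_rewards)
# Rewards are now normalized to have approximately fixed variance

# Freeze statistics for evaluation (do this before closing the environments)
envs.update_running_mean = False
observation, reward, terminated, truncated, info = envs.step(envs.action_space.sample())

envs.close()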

Related Pages
