Implementation:Farama Foundation Gymnasium Stateful Observation Wrappers

Knowledge Sources	Farama_Foundation_Gymnasium Gymnasium Docs
Domains	Reinforcement_Learning, Wrappers
Last Updated	2026-02-15 03:00 GMT

Overview

A collection of observation wrappers that keep internal state across steps and use it to transform the observations returned by Gymnasium environments: DelayObservation, TimeAwareObservation, FrameStackObservation, NormalizeObservation, and MaxAndSkipObservation.

Description

This module provides five observation wrappers that each maintain internal state across steps:

DelayObservation -- Delays the returned observation by a configurable number of timesteps. Before reaching the delay count, a zero-valued observation is returned. Uses an internal deque to buffer observations.

TimeAwareObservation -- Augments observations with the current timestep count within an episode. Supports normalized time (float in [0,1]) or raw integer timesteps. Handles Dict, Tuple, and Box observation spaces, with optional flattening.

FrameStackObservation -- Stacks the last N observations in a rolling manner. Supports configurable padding: "reset" (repeats the reset observation), "zero" (zero-filled), or a custom observation value. Uses a deque internally.

NormalizeObservation -- Normalizes observations to approximately zero mean and unit variance, using statistics tracked by a RunningMeanStd instance. Updates to the running statistics can be frozen (e.g., during evaluation) by setting the update_running_mean property to False. Outputs float32 observations.

MaxAndSkipObservation -- Implements frame skipping: each call to step repeats the same action for skip environment steps and returns the element-wise maximum of the last two frames. The reward is summed over the skipped frames.
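The deque-based buffering described above for DelayObservation and FrameStackObservation can be illustrated with a minimal standard-library sketch. The class names DelayBuffer and FrameStack are hypothetical and written only for illustration; they are not the library's implementation, and they handle plain Python values instead of Gymnasium spaces.

```python
from collections import deque


class DelayBuffer:
    """Hypothetical stand-in for the delay logic: zeros until `delay` items are buffered."""

    def __init__(self, delay, zero_obs):
        self.delay = delay
        self.zero_obs = zero_obs
        self.queue = deque()

    def reset(self):
        # A new episode starts with an empty buffer
        self.queue.clear()

    def push(self, obs):
        self.queue.append(obs)
        if len(self.queue) > self.delay:
            return self.queue.popleft()  # oldest observation, `delay` steps old
        return self.zero_obs


class FrameStack:
    """Hypothetical stand-in for rolling stacking with "reset" or "zero" padding."""

    def __init__(self, stack_size, zero_obs, padding_type="reset"):
        self.zero_obs = zero_obs
        self.padding_type = padding_type
        self.frames = deque(maxlen=stack_size)  # old frames fall off automatically

    def reset(self, first_obs):
        pad = first_obs if self.padding_type == "reset" else self.zero_obs
        self.frames.clear()
        # Fill every slot except the newest one with the padding value
        for _ in range(self.frames.maxlen - 1):
            self.frames.append(pad)
        self.frames.append(first_obs)
        return list(self.frames)

    def push(self, obs):
        self.frames.append(obs)
        return list(self.frames)


delay = DelayBuffer(delay=2, zero_obs=0.0)
delayed = [delay.push(x) for x in [1.0, 2.0, 3.0, 4.0]]
# delayed == [0.0, 0.0, 1.0, 2.0]
```

In this sketch, a delay of 2 means the first two returned values are the zero placeholder, after which each returned value lags the input by two steps.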

Usage

Use these wrappers when you need observation preprocessing that depends on historical state: delayed observations for partial observability experiments, time-aware observations for time-sensitive policies, frame stacking for temporal context (common in Atari), observation normalization for stable training, or frame skipping for computational efficiency.
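To make the normalization use case more concrete, the following sketch shows running-statistics normalization with a freeze switch for evaluation. It is an assumption-laden illustration: RunningNormalizer is a hypothetical class, it works on scalars, and it uses Welford's online algorithm, which is not necessarily how the library's RunningMeanStd updates its statistics. Only the attribute name update_running_mean and the epsilon default mirror the wrapper described here.

```python
import math


class RunningNormalizer:
    """Hypothetical scalar normalizer using Welford's online mean/variance."""

    def __init__(self, epsilon=1e-8):
        self.epsilon = epsilon
        self.update_running_mean = True  # set to False to freeze the statistics
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean

    @property
    def var(self):
        # Population variance of the samples seen so far
        return self.m2 / self.count if self.count > 0 else 1.0

    def __call__(self, x):
        if self.update_running_mean:
            self.count += 1
            delta = x - self.mean
            self.mean += delta / self.count
            self.m2 += delta * (x - self.mean)
        # epsilon keeps the division stable when the variance is near zero
        return (x - self.mean) / math.sqrt(self.var + self.epsilon)


normalizer = RunningNormalizer()
for sample in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]:
    normalizer(sample)  # training phase: statistics are updated

normalizer.update_running_mean = False  # evaluation phase: statistics are frozen
frozen_output = normalizer(9.0)
```

Freezing the statistics during evaluation keeps the observation scaling identical to what the policy saw at the end of training.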

Code Reference

Source Location

Repository: Farama_Foundation_Gymnasium
File: gymnasium/wrappers/stateful_observation.py

Signature

class DelayObservation(gym.ObservationWrapper[ObsType, ActType, ObsType], gym.utils.RecordConstructorArgs):
    def __init__(self, env: gym.Env[ObsType, ActType], delay: int): ...

class TimeAwareObservation(gym.ObservationWrapper[WrapperObsType, ActType, ObsType], gym.utils.RecordConstructorArgs):
    def __init__(self, env: gym.Env[ObsType, ActType], flatten: bool = True, normalize_time: bool = False, *, dict_time_key: str = "time"): ...

class FrameStackObservation(gym.Wrapper[WrapperObsType, ActType, ObsType, ActType], gym.utils.RecordConstructorArgs):
    def __init__(self, env: gym.Env[ObsType, ActType], stack_size: int, *, padding_type: str | ObsType = "reset"): ...

class NormalizeObservation(gym.ObservationWrapper[WrapperObsType, ActType, ObsType], gym.utils.RecordConstructorArgs):
    def __init__(self, env: gym.Env[ObsType, ActType], epsilon: float = 1e-8): ...

class MaxAndSkipObservation(gym.Wrapper[WrapperObsType, ActType, ObsType, ActType], gym.utils.RecordConstructorArgs):
    def __init__(self, env: gym.Env[ObsType, ActType], skip: int = 4): ...

Import

from gymnasium.wrappers import DelayObservation, TimeAwareObservation, FrameStackObservation, NormalizeObservation, MaxAndSkipObservation

I/O Contract

Inputs

Name	Type	Required	Description
env	Env	Yes	The environment to wrap
delay	int	Yes (DelayObservation)	Number of timesteps to delay observations
flatten	bool	No (TimeAwareObservation)	Whether to flatten the observation to a single Box (default True)
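normalize_time	bool	No (TimeAwareObservation)	If True, return time normalized to the [0,1] range; otherwise the raw integer timestep (default False)
dict_time_key	str	No (TimeAwareObservation)	Keyword-only. Key of the time entry when the observation space is a Dict (default "time")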
stack_size	int	Yes (FrameStackObservation)	Number of frames to stack
padding_type	str or ObsType	No (FrameStackObservation)	Padding strategy: "reset", "zero", or custom observation (default "reset")
epsilon	float	No (NormalizeObservation)	Stability parameter for normalization (default 1e-8)
skip	int	No (MaxAndSkipObservation)	Number of consecutive environment steps over which each action is repeated (default 4)

Outputs

Name	Type	Description
observation	varies	Transformed observation (delayed, time-augmented, stacked, normalized, or max-pooled)
reward	float	Reward from the environment (summed across skipped frames for MaxAndSkipObservation)
terminated	bool	Whether the episode has terminated
truncated	bool	Whether the episode has been truncated
info	dict	Additional information from the environment
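The reward summation and max-pooling listed for MaxAndSkipObservation can be sketched without any dependencies. The functions max_and_skip_step and make_toy_step are hypothetical helpers written for this illustration, and observations are plain lists. As a simplification, the sketch pools over the last two frames actually seen, which may differ from the library's buffer handling when an episode ends mid-skip.

```python
def max_and_skip_step(step_fn, action, skip=4):
    """Repeat `action` up to `skip` times; pool the last two frames, sum the rewards."""
    total_reward = 0.0
    last_two = []
    terminated = truncated = False
    info = {}
    for _ in range(skip):
        obs, reward, terminated, truncated, info = step_fn(action)
        total_reward += float(reward)
        last_two = (last_two + [obs])[-2:]  # keep only the two newest frames
        if terminated or truncated:
            break  # stop repeating the action once the episode ends
    # Element-wise maximum across the buffered frames
    max_obs = [max(values) for values in zip(*last_two)]
    return max_obs, total_reward, terminated, truncated, info


def make_toy_step(episode_length=10):
    """Hypothetical environment: observation is [t, -t], reward is 1.0 per step."""
    state = {"t": 0}

    def step(action):
        state["t"] += 1
        t = state["t"]
        return [t, -t], 1.0, t >= episode_length, False, {"t": t}

    return step


toy_step = make_toy_step()
obs, reward, terminated, truncated, info = max_and_skip_step(toy_step, action=0, skip=4)
# obs == [4, -3], reward == 4.0, terminated is False
```

The returned tuple has the same five fields as a regular step call, so downstream code does not need to know that several environment steps were taken.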

Usage Examples

import gymnasium as gym
from gymnasium.wrappers import DelayObservation, FrameStackObservation, NormalizeObservation, MaxAndSkipObservation

# DelayObservation: delay observations by 2 steps
env = gym.make("CartPole-v1")
env = DelayObservation(env, delay=2)
obs, info = env.reset(seed=123)
# obs is zero-valued until delay is reached

# FrameStackObservation: stack 4 frames
env = gym.make("CarRacing-v3")  # needs the Box2D extra: pip install "gymnasium[box2d]"
env = FrameStackObservation(env, stack_size=4)
obs, _ = env.reset()
# obs.shape == (4, 96, 96, 3)

# NormalizeObservation: normalize observations to unit variance
env = gym.make("CartPole-v1")
env = NormalizeObservation(env)
obs, info = env.reset(seed=123)
# obs is float32, scaled by the running mean and variance collected so far;
# the estimates only become meaningful after many steps

# MaxAndSkipObservation: repeat each action for 4 frames
env = gym.make("CartPole-v1")
env = MaxAndSkipObservation(env, skip=4)
obs, info = env.reset(seed=123)  # reset is required before the first step
obs, reward, terminated, truncated, info = env.step(1)
# obs is the element-wise max of the last 2 frames; reward is summed across the 4 steps

Related Pages

Environment:Farama_Foundation_Gymnasium_Python_3_10_Runtime
