
Principle:Farama Foundation Gymnasium Vector Observation Transformation

From Leeroopedia
Knowledge Sources
Domains Reinforcement_Learning, Observation_Processing
Last Updated 2026-02-15 03:00 GMT

Overview

Vectorized observation transformation wrappers apply functions to batched observations from multiple parallel environments, enabling efficient batch-level preprocessing.

Description

Vector observation transformation wrappers extend the single-environment observation transformation pattern to operate on batched data from vectorized environments. When running multiple environment instances in parallel (as is standard practice in modern RL training), observations arrive as batched arrays with a leading environment dimension. Vector observation wrappers process these batched observations efficiently, either by applying a vectorized function to the entire batch at once or by applying a single-observation function to each sub-environment's observation individually.

The wrapper system provides two levels of abstraction. TransformObservation accepts both a batch-level function (operating on the full batched observation array) and a single-observation function (operating on individual environment observations). VectorizeTransformObservation automatically lifts a single-observation wrapper class to operate in the vectorized setting by applying it element-wise across the batch. This design allows users to either write optimized batch-level transformations (for example, using vectorized NumPy operations) or reuse existing single-environment wrappers without modification.
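The two levels of abstraction can be illustrated without Gymnasium itself. The following is a minimal, dependency-free sketch that uses nested Python lists in place of NumPy arrays. The names `scale_batch`, `scale_single`, and `lift` are hypothetical and chosen only for illustration; they are not part of the Gymnasium API.

```python
from typing import Callable, List

Observation = List[float]   # one environment's observation
Batch = List[Observation]   # leading dimension indexes the environments


def scale_batch(batch: Batch) -> Batch:
    """Batch-level function: receives and returns the whole batch."""
    return [[x / 10.0 for x in row] for row in batch]


def scale_single(obs: Observation) -> Observation:
    """Single-observation function: knows nothing about batching."""
    return [x / 10.0 for x in obs]


def lift(single_func: Callable[[Observation], Observation]) -> Callable[[Batch], Batch]:
    """Turn a single-observation function into a batch-level one."""
    def batched(batch: Batch) -> Batch:
        # Apply the function to each sub-environment's observation in turn.
        return [single_func(obs) for obs in batch]
    return batched


batch = [[10.0, 20.0], [30.0, 40.0], [50.0, 60.0]]  # 3 environments, 2 features
lifted_result = lift(scale_single)(batch)

# Both routes produce the same batched output.
assert scale_batch(batch) == lifted_result
```

With real NumPy arrays the batch-level route would usually be a single vectorized operation, while the lifted route loops over environments in Python, which is the trade-off between speed and reuse described above.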

The vectorized observation wrappers support common transformations, including filtering specific keys from Dict observation spaces, reshaping observations, converting data types, and applying arbitrary functions. They also handle autoreset mode, where some environments in the batch may have just reset while others are mid-episode, and they update the observation space metadata to reflect the transformation applied.
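As an illustration of key filtering together with the matching space update, here is a small sketch. It assumes that a batched Dict observation is a dictionary whose values are batched rows, and it models the observation space as a plain mapping from key to batched shape. `filter_observation`, `filter_space`, and the key names are hypothetical, not Gymnasium functions.

```python
from typing import Dict, List, Sequence, Tuple

# A batched Dict observation is modeled as key -> list of per-environment rows.
BatchedDict = Dict[str, List[List[float]]]
# Toy stand-in for a Dict space: key -> batched shape.
ToySpace = Dict[str, Tuple[int, ...]]


def filter_observation(batched_obs: BatchedDict, keep: Sequence[str]) -> BatchedDict:
    """Keep only the requested keys of a batched Dict observation."""
    missing = [key for key in keep if key not in batched_obs]
    if missing:
        raise KeyError(f"keys not present in observation: {missing}")
    return {key: batched_obs[key] for key in keep}


def filter_space(space: ToySpace, keep: Sequence[str]) -> ToySpace:
    """Apply the same filtering to the space description so it stays in sync."""
    return {key: space[key] for key in keep}


batched_obs = {
    "position": [[0.0, 1.0], [2.0, 3.0]],
    "velocity": [[0.5], [0.25]],
    "debug": [[9.0], [9.0]],
}
space = {"position": (2, 2), "velocity": (2, 1), "debug": (2, 1)}
keep = ["position", "velocity"]

filtered_obs = filter_observation(batched_obs, keep)
filtered_space = filter_space(space, keep)
```

Filtering the space alongside the data means downstream code that inspects the space sees the same keys and shapes that the observations actually carry.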

Usage

Use vector observation transformation wrappers when training with multiple parallel environments and you need to preprocess observations before they reach the learning algorithm. Use TransformObservation when you can write an efficient batch-level transformation function. Use VectorizeTransformObservation when you want to reuse an existing single-environment observation wrapper. Use FilterObservation to select specific keys from Dict observation spaces. Use DtypeObservation to convert observation data types (for example, from float64 to float32 for GPU compatibility).
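The effect of a float64 to float32 conversion can be sketched with the standard library alone. This example uses `array("f", ...)` as a stand-in for a NumPy `astype(np.float32)` call; `to_float32` is a hypothetical helper, and the batch values are made up for illustration.

```python
from array import array
from typing import List


def to_float32(batch: List[List[float]]) -> List[array]:
    """Store each environment's observation as 32-bit floats."""
    return [array("f", row) for row in batch]


# Python floats are 64-bit, so this batch starts out in double precision.
batch64 = [[0.1, 0.2], [0.3, 0.4]]
batch32 = to_float32(batch64)

bytes_per_value_64 = array("d").itemsize  # 8 on common platforms
bytes_per_value_32 = batch32[0].itemsize  # 4 on common platforms

# The conversion halves storage but rounds values to single precision.
rounding_error = abs(batch32[0][0] - 0.1)
```

The rounding error is tiny but nonzero, so a dtype conversion should be applied consistently to every observation the learning algorithm sees.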

Theoretical Basis

Vector observation transformation operates on batched observations. Given $N$ parallel environments producing observations $o_1, o_2, \dots, o_N$, the batch observation is:

$$O = \operatorname{stack}(o_1, o_2, \dots, o_N) \in \mathbb{R}^{N \times d}$$

Batch transformation applies a function to the entire batch:

$$\tilde{O} = f_{\text{batch}}(O)$$

Element-wise transformation applies a function to each element:

$$\tilde{o}_i = f_{\text{single}}(o_i), \quad i = 1, \dots, N$$
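As a small worked example, take $N = 2$, $d = 2$, and the single-observation function $f_{\text{single}}(o) = o / 10$. The numbers and the choice of function are illustrative assumptions, not taken from the source.

```latex
O = \begin{bmatrix} 10 & 20 \\ 30 & 40 \end{bmatrix}, \qquad
\tilde{o}_1 = f_{\text{single}}(o_1) = (1, 2), \qquad
\tilde{o}_2 = f_{\text{single}}(o_2) = (3, 4)

\tilde{O} = \operatorname{stack}(\tilde{o}_1, \tilde{o}_2)
          = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}
          = \tfrac{1}{10}\, O
```

Here the batch function $f_{\text{batch}}(O) = O / 10$ gives the same result as the element-wise route. The two coincide whenever $f_{\text{batch}}$ acts on each row independently, that is, when row $i$ of $f_{\text{batch}}(O)$ equals $f_{\text{single}}(o_i)$.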

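Both wrappers can be sketched in simplified pseudocode. The vectorize pattern lifts single-environment wrappers; concatenate and iterate stand for the batching utilities that join and split observations along the environment axis:

class TransformObservation(VectorObservationWrapper):
    def __init__(self, env, func, observation_space=None):
        super().__init__(env)
        self.func = func
        # Override the space only when the transformation changes it
        if observation_space is not None:
            self.observation_space = observation_space

    def vector_observation(self, observation):
        """Apply batch-level transformation."""
        return self.func(observation)

class VectorizeTransformObservation(VectorObservationWrapper):
    """Lifts a single-env wrapper to vector setting."""
    def __init__(self, env, wrapper_class, **kwargs):
        super().__init__(env)
        # Create a single wrapper instance for space computation.
        # single_env is not defined in this sketch: it stands for a
        # placeholder env that exposes env.single_observation_space.
        self.single_wrapper = wrapper_class(single_env, **kwargs)

    def vector_observation(self, observations):
        """Apply single-env transformation to each observation."""
        return concatenate([
            self.single_wrapper.observation(obs)
            for obs in iterate(observations)
        ])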
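The lifting step can be made concrete with a self-contained sketch that runs without Gymnasium. It assumes nested lists instead of arrays and shape tuples instead of spaces. `StubEnv`, `FlattenObservation`, and `VectorizeObservation` are hypothetical toy classes written for this example; they only mimic the roles of the placeholder env, the single-environment wrapper, and the vectorizing wrapper.

```python
from typing import Any, List, Tuple


class StubEnv:
    """Carries only an observation shape; it cannot be stepped."""

    def __init__(self, observation_shape: Tuple[int, ...]):
        self.observation_shape = observation_shape


class FlattenObservation:
    """Toy single-environment wrapper: flattens a 2-D observation to 1-D."""

    def __init__(self, env: StubEnv):
        rows, cols = env.observation_shape
        self.observation_shape = (rows * cols,)

    def observation(self, obs: List[List[float]]) -> List[float]:
        return [x for row in obs for x in row]


class VectorizeObservation:
    """Lifts a single-environment wrapper class to batched observations."""

    def __init__(self, num_envs: int, single_shape: Tuple[int, ...],
                 wrapper_class: type, **kwargs: Any):
        # Instantiate the wrapper once, on a stub, to derive the new shape.
        self.single_wrapper = wrapper_class(StubEnv(single_shape), **kwargs)
        self.single_observation_shape = self.single_wrapper.observation_shape
        self.observation_shape = (num_envs,) + self.single_observation_shape

    def vector_observation(self, observations: List[Any]) -> List[Any]:
        # Iterate over the leading (environment) axis, then re-batch.
        return [self.single_wrapper.observation(obs) for obs in observations]


vec = VectorizeObservation(2, (2, 2), FlattenObservation)
batch = [
    [[1.0, 2.0], [3.0, 4.0]],  # environment 0
    [[5.0, 6.0], [7.0, 8.0]],  # environment 1
]
flattened = vec.vector_observation(batch)
```

Building the single wrapper on a stub is one way to compute the transformed space without creating a real environment; the wrapper is only ever asked to transform observations, never to step.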

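Autoreset handling is a key consideration: when a sub-environment resets, the observation it returns comes from the new episode, so a batch can mix observations from continuing episodes with observations from fresh resets. When the last observation of the finished episode is stored in infos, as in the snippet below, the wrapper must transform it as well; otherwise it would stay in the untransformed format: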

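def step(self, actions):
    obs, rewards, terms, truncs, infos = self.env.step(actions)
    # Transform the batched observation, which may mix mid-episode and reset observations
    obs = self.vector_observation(obs)
    # Also transform the final observation stored in info for autoreset envs
    if "final_observation" in infos:
        for i, final_obs in enumerate(infos["final_observation"]):
            # Entries are None for sub-environments whose episode did not end
            if final_obs is not None:
                # single_observation is assumed to be the per-environment
                # counterpart of vector_observation
                infos["final_observation"][i] = self.single_observation(final_obs)
    return obs, rewards, terms, truncs, infos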
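The same logic can be exercised end to end with a fake vector environment. This is a sketch under stated assumptions: `FakeVectorEnv` and `TransformWithFinalObs` are hypothetical classes, observations are plain lists, and the `"final_observation"` key simply follows the snippet above (the key name may differ between Gymnasium versions).

```python
from typing import Callable, List

Obs = List[float]


class FakeVectorEnv:
    """Two-environment stand-in; environment 1 ends its episode on each step."""

    def step(self, actions):
        obs = [[1.0, 2.0], [0.0, 0.0]]  # env 1 already shows its reset observation
        rewards = [1.0, 1.0]
        terms = [False, True]
        truncs = [False, False]
        # Only the environment that just finished carries a final observation.
        infos = {"final_observation": [None, [9.0, 9.0]]}
        return obs, rewards, terms, truncs, infos


class TransformWithFinalObs:
    """Applies single_func to the batch and to any stored final observations."""

    def __init__(self, env, single_func: Callable[[Obs], Obs]):
        self.env = env
        self.single_func = single_func

    def step(self, actions):
        obs, rewards, terms, truncs, infos = self.env.step(actions)
        obs = [self.single_func(o) for o in obs]
        if "final_observation" in infos:
            infos["final_observation"] = [
                None if final is None else self.single_func(final)
                for final in infos["final_observation"]
            ]
        return obs, rewards, terms, truncs, infos


env = TransformWithFinalObs(FakeVectorEnv(), lambda o: [x * 2 for x in o])
obs, rewards, terms, truncs, infos = env.step([0, 0])
```

After the step, both the batched observation and the stored final observation have passed through the same function, so code that bootstraps from the final observation sees data in the same format as everything else.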
