Principle:Haosulab ManiSkill Gymnasium Wrappers

From Leeroopedia
Knowledge Sources
Domains Robotics, Simulation, Reinforcement_Learning
Last Updated 2026-02-15 08:00 GMT

Overview

Gymnasium wrappers are composable environment modifiers that intercept and transform the step, reset, observation, or action interfaces of a simulation environment without altering the underlying task logic, enabling flexible adaptation to the requirements of different training algorithms.

Description

Reinforcement learning and imitation learning algorithms differ in how they need to interact with environments. Some need frame stacking (a window of recent observations that helps with partial observability). Others benefit from action repetition (executing the same action for several consecutive environment steps to lower the effective decision frequency). Caching reset states can improve training efficiency by amortizing expensive scene initialization. GPU-parallelized environments may also need to be wrapped to present a standard single-environment Gymnasium API.

The Gymnasium Wrappers principle addresses these needs through the Decorator pattern: each wrapper is a thin layer that wraps an existing environment, intercepts specific interface methods (step, reset, observation), and transforms their inputs or outputs before delegating to the wrapped environment. Wrappers can be composed by stacking them: FrameStack(ActionRepeat(CachedReset(env))). Because each wrapper implements the full Gymnasium interface, the wrapped environment is indistinguishable from an unwrapped one to the consuming algorithm.

This design avoids the combinatorial explosion that would result from baking these modifications into the task environments themselves. A single PickCube environment can be used with any combination of wrappers, and new wrappers can be added without modifying existing environments or other wrappers.

Usage

This principle applies whenever:

  • The training algorithm requires a modified observation format (stacked frames, flattened dictionaries) that differs from the environment's native output.
  • Action repetition is needed to reduce decision frequency and improve training sample efficiency.
  • Environment reset cost is high and caching pre-computed reset states can accelerate training throughput.
  • A GPU-parallelized vector environment must be presented as a standard single-environment Gymnasium interface for compatibility with existing training code.
  • Multiple independent modifications must be composed without altering the base environment.

Theoretical Basis

1. Decorator Pattern: Each wrapper inherits from gymnasium.Wrapper (or a subclass like ObservationWrapper) and overrides only the methods it needs to modify. The wrapper holds a reference to the wrapped environment and delegates all unmodified calls to it. Every wrapper therefore exposes the same interface as the environment it wraps, and wrappers can be stacked in any order that their observation and action formats allow.
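The delegation described above can be sketched without any dependencies. The classes below are hypothetical stand-ins, not ManiSkill or Gymnasium code: `CounterEnv` is a toy environment, `Wrapper` mimics the role of gymnasium.Wrapper, and `ScaleReward` and `NegateObservation` are illustrative wrappers that each override only what they change.

```python
class CounterEnv:
    """Toy environment (hypothetical): the observation is a step counter."""

    max_steps = 5

    def reset(self, seed=None, options=None):
        self.t = 0
        return self.t, {}

    def step(self, action):
        self.t += 1
        terminated = self.t >= self.max_steps
        return self.t, 1.0, terminated, False, {}


class Wrapper:
    """Holds a reference to the wrapped env and delegates everything by default."""

    def __init__(self, env):
        self.env = env

    def reset(self, **kwargs):
        return self.env.reset(**kwargs)

    def step(self, action):
        return self.env.step(action)

    def __getattr__(self, name):
        # Only reached when normal lookup fails, so attributes the wrapper
        # does not define fall through to the wrapped environment.
        if name == "env":
            raise AttributeError(name)
        return getattr(self.env, name)


class ScaleReward(Wrapper):
    """Overrides step() only; reset() is inherited and delegated unchanged."""

    def __init__(self, env, scale):
        super().__init__(env)
        self.scale = scale

    def step(self, action):
        obs, reward, terminated, truncated, info = self.env.step(action)
        return obs, reward * self.scale, terminated, truncated, info


class NegateObservation(Wrapper):
    """Transforms the observation returned by both reset() and step()."""

    def reset(self, **kwargs):
        obs, info = self.env.reset(**kwargs)
        return -obs, info

    def step(self, action):
        obs, reward, terminated, truncated, info = self.env.step(action)
        return -obs, reward, terminated, truncated, info


# Wrappers compose by nesting; the consumer sees the same reset/step interface.
env = NegateObservation(ScaleReward(CounterEnv(), scale=0.5))
first_obs, _ = env.reset()
obs, reward, terminated, truncated, info = env.step(action=None)
```

Neither wrapper knows about the other, and `env.max_steps` still resolves through both layers to the base environment.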

2. Action Repetition: The wrapper intercepts step() and calls the underlying environment's step() multiple times with the same action. Rewards are summed across the repeated steps. If any repeated step reports termination or truncation, the repetition stops and the flags are returned immediately. This reduces the agent's decision frequency by a factor equal to the repeat count, which can stabilize training for high-frequency control tasks and reduce the number of policy evaluations per episode.
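A minimal sketch of this loop, assuming a single (non-batched) environment that follows the Gymnasium five-tuple step() convention. `CounterEnv` and the `repeat` parameter name are hypothetical and chosen for illustration only.

```python
class CounterEnv:
    """Toy environment (hypothetical): reward 1.0 per step, ends after max_steps."""

    def __init__(self, max_steps=5):
        self.max_steps = max_steps

    def reset(self, seed=None, options=None):
        self.t = 0
        return self.t, {}

    def step(self, action):
        self.t += 1
        return self.t, 1.0, self.t >= self.max_steps, False, {}


class ActionRepeat:
    """Repeats each action `repeat` times and sums the rewards."""

    def __init__(self, env, repeat):
        if repeat < 1:
            raise ValueError("repeat must be at least 1")
        self.env = env
        self.repeat = repeat

    def reset(self, **kwargs):
        return self.env.reset(**kwargs)

    def step(self, action):
        total_reward = 0.0
        for _ in range(self.repeat):
            obs, reward, terminated, truncated, info = self.env.step(action)
            total_reward += reward
            if terminated or truncated:
                # Stop early so the episode end is reported right away.
                break
        return obs, total_reward, terminated, truncated, info


env = ActionRepeat(CounterEnv(max_steps=5), repeat=3)
env.reset()
first = env.step(action=0)   # inner steps 1, 2, 3
second = env.step(action=0)  # inner steps 4, 5, then the episode terminates
```

In a batched setting the flags are per-environment, so an early `break` would not be enough on its own; that case is not covered by this sketch.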

3. Cached Reset: Instead of performing a full scene reset on every reset() call, the wrapper maintains a cache of pre-computed reset states. On the first call and periodically thereafter, it performs a full reset and stores the result. On subsequent resets, it retrieves a cached state, optionally with minor perturbations. This is particularly valuable when scene construction or object placement is computationally expensive. The cache respects a configurable maximum size and replacement policy.
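One way to picture the caching logic is the sketch below. It rests on several assumptions that are not taken from the source: the environment exposes hypothetical `get_state()` and `set_state()` methods, the observation equals the state, the replacement policy is first-in-first-out, and a full reset happens every `refresh_every` calls. Perturbation of cached states is omitted.

```python
import random
from collections import deque


class SlowResetEnv:
    """Toy environment (hypothetical) whose full reset is the expensive part."""

    def __init__(self):
        self.full_resets = 0
        self.state = None

    def reset(self, seed=None, options=None):
        self.full_resets += 1  # stands in for costly scene construction
        rng = random.Random(seed)
        self.state = [rng.random() for _ in range(3)]
        return list(self.state), {}

    def get_state(self):
        return list(self.state)

    def set_state(self, state):
        self.state = list(state)

    def step(self, action):
        self.state = [x + action for x in self.state]
        return list(self.state), 0.0, False, False, {}


class CachedReset:
    """Serves most resets from a bounded cache of previously computed states."""

    def __init__(self, env, max_size=4, refresh_every=10, seed=0):
        self.env = env
        self.cache = deque(maxlen=max_size)  # FIFO: the oldest state drops out
        self.refresh_every = refresh_every
        self.calls = 0
        self.rng = random.Random(seed)

    def reset(self, **kwargs):
        do_full = not self.cache or self.calls % self.refresh_every == 0
        self.calls += 1
        if do_full:
            obs, info = self.env.reset(**kwargs)
            self.cache.append(self.env.get_state())
            return obs, info
        # Cheap path: restore a stored state (reset kwargs are ignored here).
        state = self.rng.choice(self.cache)
        self.env.set_state(state)
        return list(state), {}

    def step(self, action):
        return self.env.step(action)


inner = SlowResetEnv()
env = CachedReset(inner, max_size=4, refresh_every=10)
for _ in range(25):
    env.reset()
    env.step(1.0)
```

With these settings, 25 resets trigger only three full resets (calls 0, 10, and 20); the rest restore a cached state.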

4. Frame Stacking: The wrapper maintains a sliding window buffer of the most recent N observations. At each step, the new observation is appended to the buffer and the oldest is discarded. The stacked observations are returned as the observation, with the observation space expanded to reflect the stacking dimension. This provides temporal context to policies operating on individual frames, which is essential for velocity estimation from position-only observations and for handling partial observability.
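The sliding window can be sketched with a bounded deque. This is an illustration under stated assumptions: `PositionEnv` is a hypothetical toy environment, the buffer is padded with the first observation on reset (zero padding is another common convention), and the matching change to the observation space is not shown.

```python
from collections import deque


class PositionEnv:
    """Toy environment (hypothetical): position-only observation, velocity 2.0."""

    def reset(self, seed=None, options=None):
        self.x = 0.0
        return self.x, {}

    def step(self, action):
        self.x += 2.0
        return self.x, 0.0, False, False, {}


class FrameStack:
    """Returns the most recent `num_stack` observations, oldest first."""

    def __init__(self, env, num_stack):
        self.env = env
        self.frames = deque(maxlen=num_stack)

    def reset(self, **kwargs):
        obs, info = self.env.reset(**kwargs)
        # Fill the whole window with the first observation.
        for _ in range(self.frames.maxlen):
            self.frames.append(obs)
        return tuple(self.frames), info

    def step(self, action):
        obs, reward, terminated, truncated, info = self.env.step(action)
        self.frames.append(obs)  # the oldest frame is discarded automatically
        return tuple(self.frames), reward, terminated, truncated, info


env = FrameStack(PositionEnv(), num_stack=3)
initial, _ = env.reset()
for _ in range(4):
    stacked, *_ = env.step(None)

# Two consecutive positions are enough to recover the per-step velocity.
velocity = stacked[-1] - stacked[-2]
```

A single frame here carries only position; the stacked window is what makes the velocity recoverable.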

5. CPU Gymnasium Compatibility: GPU-parallelized environments return batched tensors and use a vectorized API. The CPU Gym wrapper adapts this to the standard single-environment Gymnasium interface by selecting a single environment from the batch, converting tensors to NumPy arrays, and handling the observation space transformation. This enables using GPU-accelerated environments with training code that expects the standard Gymnasium API.
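The adaptation step can be sketched as follows. To stay dependency-free, nested Python lists stand in for batched tensors, so the tensor-to-NumPy conversion is replaced here by plain indexing and `float`/`bool` casts. `BatchedEnv`, `SingleEnvAdapter`, and `unbatch` are hypothetical names, and the adapter assumes a batch of exactly one environment.

```python
def unbatch(data, index=0):
    """Select one environment's slice from batched data, recursing into dicts."""
    if isinstance(data, dict):
        return {key: unbatch(value, index) for key, value in data.items()}
    return data[index]


class BatchedEnv:
    """Toy stand-in (hypothetical) for an env whose outputs have a batch axis."""

    def __init__(self, num_envs):
        self.num_envs = num_envs

    def reset(self, seed=None, options=None):
        self.t = 0
        return self._obs(), {}

    def step(self, actions):
        self.t += 1
        n = self.num_envs
        rewards = [float(a) for a in actions]
        return self._obs(), rewards, [False] * n, [self.t >= 2] * n, {}

    def _obs(self):
        n = self.num_envs
        return {
            "agent": {"qpos": [[float(self.t)] * 2 for _ in range(n)]},
            "step": [self.t] * n,
        }


class SingleEnvAdapter:
    """Presents a one-environment batch through an unbatched reset/step API."""

    def __init__(self, env):
        if env.num_envs != 1:
            raise ValueError("this sketch expects exactly one environment")
        self.env = env

    def reset(self, **kwargs):
        obs, info = self.env.reset(**kwargs)
        return unbatch(obs), unbatch(info)

    def step(self, action):
        # Add the batch axis on the way in, strip it on the way out.
        obs, reward, terminated, truncated, info = self.env.step([action])
        return (
            unbatch(obs),
            float(reward[0]),
            bool(terminated[0]),
            bool(truncated[0]),
            unbatch(info),
        )


env = SingleEnvAdapter(BatchedEnv(num_envs=1))
obs0, _ = env.reset()
obs1, reward1, terminated1, truncated1, _ = env.step(0.5)
_, _, _, truncated2, _ = env.step(0.5)
```

The recursive `unbatch` matters because observations are often nested dictionaries, and every leaf carries its own batch axis.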
