Principle:Haosulab ManiSkill Vectorized Environment Wrapping

From Leeroopedia
Field Value
principle_name Haosulab_ManiSkill_Vectorized_Environment_Wrapping
overview Wrapping simulation environments with auto-reset, action flattening, and recording for RL training compatibility
domains Simulation, Reinforcement_Learning
last_updated 2026-02-15
related_pages Implementation:Haosulab_ManiSkill_ManiSkillVectorEnv

Overview

Description

After a base simulation environment is created, it must be wrapped with additional functionality before it can be used by standard RL algorithms. Vectorized environment wrapping is a composable pattern where each wrapper adds a specific capability while preserving the Gymnasium interface contract. This principle covers three essential wrapping operations:

1. Vectorized Environment Wrapping (Auto-Reset)

The most critical wrapper converts a ManiSkill BaseEnv into a Gymnasium VectorEnv that supports automatic resetting. In standard RL training, when an environment episode terminates (due to success, failure, or time limit), the environment must be immediately reset so training can continue without interruption. The vectorized wrapper handles this transparently:

  • When any sub-environment signals terminated=True or truncated=True, the wrapper automatically calls reset() on just those sub-environments (partial reset)
  • The wrapper stores the final observation from the completed episode in info["final_observation"] so the RL algorithm can compute correct bootstrapped returns
  • Episode metrics (return, length, success rate) are tracked and reported through info["final_info"]
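The auto-reset contract above can be sketched with a toy vectorized environment. This is plain Python with no ManiSkill dependency; `ToyVecEnv`, `max_steps`, and the counter dynamics are illustrative assumptions, not the real implementation:

```python
class ToyVecEnv:
    """Toy 2-env vectorized environment illustrating auto-reset semantics.

    Each sub-environment counts steps and "terminates" once its counter
    reaches max_steps. On termination, the true final observation moves
    into info["final_observation"] and step() returns the post-reset
    observation, matching the contract described above.
    """

    def __init__(self, num_envs=2, max_steps=3):
        self.num_envs = num_envs
        self.max_steps = max_steps
        self.counters = [0] * num_envs

    def reset(self):
        self.counters = [0] * self.num_envs
        return [0] * self.num_envs, {}

    def step(self, actions):
        obs, terminated, info = [], [], {}
        for i in range(self.num_envs):
            self.counters[i] += 1
            done = self.counters[i] >= self.max_steps
            terminated.append(done)
            if done:
                # Store the final obs for bootstrapping, then partial-reset
                # only this sub-environment.
                info.setdefault("final_observation", {})[i] = self.counters[i]
                self.counters[i] = 0
            obs.append(self.counters[i])
        rewards = [0.0] * self.num_envs
        truncated = [False] * self.num_envs
        return obs, rewards, terminated, truncated, info
```

The key point is that the RL algorithm never sees a "dead" environment: the observation returned at a termination step is already the first observation of the next episode.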

2. Action Space Flattening

Some ManiSkill environments use Dict action spaces (e.g., separate entries for arm and gripper control). Most RL algorithms expect a single Box action space. The action flattening wrapper concatenates all action components into a single continuous vector, mapping between the flat representation and the structured dictionary representation transparently.
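The flattening logic can be sketched in plain Python; the helper names and the arm/gripper split below are assumptions for illustration, while the real wrapper operates on Gymnasium spaces:

```python
def flatten_action(action_dict, key_order):
    """Concatenate Dict action components into one flat vector."""
    flat = []
    for key in key_order:
        flat.extend(action_dict[key])
    return flat


def unflatten_action(flat, key_order, sizes):
    """Split a flat vector back into the original Dict layout."""
    out, start = {}, 0
    for key in key_order:
        out[key] = flat[start:start + sizes[key]]
        start += sizes[key]
    return out
```

For example, a 7-DoF arm entry and a 1-DoF gripper entry flatten into a single 8-dimensional vector, which is the Box-shaped input a standard policy network samples.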

3. Episode Recording

For evaluation and debugging, a recording wrapper can capture video frames and save trajectory data. This is typically applied to evaluation environments and optionally to training environments at a lower frequency.
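The gating idea (record only every Nth episode) can be sketched as follows; `EpisodeRecorder` and `record_every` are hypothetical names, and the real RecordEpisode wrapper additionally saves trajectory data:

```python
class EpisodeRecorder:
    """Buffer rendered frames, but only for every Nth episode."""

    def __init__(self, render_fn, record_every=10):
        self.render_fn = render_fn      # callable returning the current frame
        self.record_every = record_every
        self.episode_idx = 0
        self.frames = []

    def on_step(self):
        # Only pay the rendering cost on recorded episodes.
        if self.episode_idx % self.record_every == 0:
            self.frames.append(self.render_fn())

    def on_episode_end(self):
        saved = list(self.frames)       # hand off to a video writer here
        self.frames = []
        self.episode_idx += 1
        return saved
```

Applying this at a low frequency to training environments keeps the rendering overhead negligible while still producing periodic diagnostic videos.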

Usage

Apply vectorized environment wrappers after creating the base environment with gym.make() and before passing the environment to the RL algorithm. The standard wrapping order is:

  1. Create base environment via gym.make()
  2. Apply FlattenActionSpaceWrapper if the action space is a Dict
  3. Apply RecordEpisode for video/trajectory recording (optional)
  4. Apply ManiSkillVectorEnv for auto-reset and metric tracking

This order matters because ManiSkillVectorEnv should be the outermost wrapper (the one the RL algorithm interacts with directly), while inner wrappers modify the environment's spaces or data before vectorization.

Theoretical Basis

Gymnasium VectorEnv API: The Gymnasium library defines a formal VectorEnv interface for batched environments. Unlike SubprocVecEnv (which uses multiprocessing), ManiSkill's vector wrapper operates on a single-process GPU-parallel environment. The VectorEnv API specifies:

  • single_observation_space / single_action_space: Spaces for individual environments
  • observation_space / action_space: Batched spaces across all environments
  • Auto-reset semantics: When an environment terminates, it is automatically reset, and the final observation is stored in the info dict
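The relationship between the single and batched spaces reduces to prepending the environment count to each shape (a sketch; real Gymnasium spaces also batch bounds and dtypes):

```python
def batched_shape(single_shape, num_envs):
    """Batched observation/action shape: (num_envs, *single_shape)."""
    return (num_envs,) + tuple(single_shape)
```

With 512 parallel environments and an 8-dimensional action, single_action_space has shape (8,) while action_space has shape (512, 8).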

Wrapper Composition Pattern: Following the Decorator design pattern, each wrapper inherits from gym.Wrapper (or gym.ActionWrapper, gym.ObservationWrapper) and delegates to the inner environment. This enables flexible composition:

env = gym.make(...)                        # BaseEnv
env = FlattenActionSpaceWrapper(env)       # Dict -> Box actions
env = RecordEpisode(env, ...)              # video recording
env = ManiSkillVectorEnv(env, ...)         # auto-reset + metrics
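The delegation that makes this composition work can be shown with a stripped-down wrapper base class; the names here are illustrative, while real wrappers subclass gym.Wrapper:

```python
class MiniWrapper:
    """Decorator pattern: hold an inner env and delegate by default."""

    def __init__(self, env):
        self.env = env

    def reset(self):
        return self.env.reset()

    def step(self, action):
        return self.env.step(action)


class ScaleRewardWrapper(MiniWrapper):
    """Example wrapper (hypothetical): override one method, delegate the rest."""

    def __init__(self, env, scale=2.0):
        super().__init__(env)
        self.scale = scale

    def step(self, action):
        obs, reward, terminated, truncated, info = self.env.step(action)
        return obs, self.scale * reward, terminated, truncated, info
```

Because every wrapper exposes the same interface as the environment it wraps, any wrapper can sit at any depth in the stack, which is what makes the ordering in the Usage section a convention rather than a hard constraint of the API.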

Partial Reset for Continuous Training: In vectorized environments, different sub-environments finish episodes at different times. Rather than resetting all environments when any finishes (which wastes simulation steps), partial reset only resets the completed sub-environments. This is essential for efficient GPU-parallel training because:

  • It avoids synchronization barriers across environments
  • It maximizes GPU utilization by keeping all environments active
  • It correctly handles the boundary between episodes for return computation
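The episode-boundary handling matters for bootstrapping: the value at an auto-reset step must be computed from the stored final observation rather than the post-reset observation, and the bootstrap term is cut only at true terminations. A per-environment sketch (function name and values are illustrative):

```python
def one_step_targets(rewards, next_values, terminated, gamma=0.99):
    """TD(0) targets for a batch of sub-environments.

    For sub-environments that were auto-reset this step, next_values
    must be evaluated on info["final_observation"], not on the
    post-reset observation returned by step().
    """
    return [r + gamma * v * (0.0 if t else 1.0)
            for r, v, t in zip(rewards, next_values, terminated)]
```

Note that truncation (time limit) does not zero the bootstrap term: only genuine terminations do, which is exactly why the final observation must be preserved across the reset.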

Ignore Terminations (Infinite Horizon): Some RL formulations treat tasks as infinite horizon, where the episode ends only due to a time limit (truncation), never due to task success/failure (termination). The ignore_terminations flag supports this by overriding terminated to always be False, while still recording success/failure metrics for monitoring.
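A sketch of the flag's effect (the raw_terminated key is an assumption for illustration; ManiSkillVectorEnv reports success separately through its recorded metrics):

```python
def apply_ignore_terminations(terminated, info, ignore=True):
    """Infinite-horizon view: zero out terminations, keep them for metrics."""
    if not ignore:
        return terminated, info
    info = dict(info)
    info["raw_terminated"] = list(terminated)  # preserved for monitoring
    return [False] * len(terminated), info
```

The algorithm then only ever sees truncations at the time limit, while dashboards can still plot how often the task was actually solved.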

Wrapper Composition for Different Use Cases
  • Standard PPO Training: FlattenAction if Dict action space; RecordEpisode optional (low frequency); VectorEnv config: auto_reset=True, ignore_terminations=False, record_metrics=True
  • Infinite Horizon Training: FlattenAction if Dict action space; RecordEpisode optional; VectorEnv config: auto_reset=True, ignore_terminations=True, record_metrics=True
  • Evaluation: FlattenAction if Dict action space; RecordEpisode always; VectorEnv config: auto_reset=True, ignore_terminations=False, record_metrics=True
  • Trajectory Collection: FlattenAction if Dict action space; RecordEpisode with save_trajectory=True; VectorEnv config: auto_reset=True, record_metrics=True

Related Pages

  • Implementation:Haosulab_ManiSkill_ManiSkillVectorEnv
