Principle:Haosulab ManiSkill Vectorized Environment Wrapping
| Field | Value |
|---|---|
| principle_name | Haosulab_ManiSkill_Vectorized_Environment_Wrapping |
| overview | Wrapping simulation environments with auto-reset, action flattening, and recording for RL training compatibility |
| domains | Simulation, Reinforcement_Learning |
| last_updated | 2026-02-15 |
| related_pages | Implementation:Haosulab_ManiSkill_ManiSkillVectorEnv |
Overview
Description
After a base simulation environment is created, it must be wrapped with additional functionality before it can be used by standard RL algorithms. Vectorized environment wrapping is a composable pattern where each wrapper adds a specific capability while preserving the Gymnasium interface contract. This principle covers three essential wrapping operations:
1. Vectorized Environment Wrapping (Auto-Reset)
The most critical wrapper converts a ManiSkill BaseEnv into a Gymnasium VectorEnv that supports automatic resetting. In standard RL training, when an environment episode terminates (due to success, failure, or time limit), the environment must be immediately reset so training can continue without interruption. The vectorized wrapper handles this transparently:
- When any sub-environment signals `terminated=True` or `truncated=True`, the wrapper automatically calls `reset()` on just those sub-environments (partial reset)
- The wrapper stores the final observation from the completed episode in `info["final_observation"]` so the RL algorithm can compute correct bootstrapped returns
- Episode metrics (return, length, success rate) are tracked and reported through `info["final_info"]`
2. Action Space Flattening
Some ManiSkill environments use Dict action spaces (e.g., separate entries for arm and gripper control). Most RL algorithms expect a single Box action space. The action flattening wrapper concatenates all action components into a single continuous vector, mapping between the flat representation and the structured dictionary representation transparently.
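ManiSkill ships `FlattenActionSpaceWrapper` for this; the sketch below merely illustrates the underlying mechanism using Gymnasium's generic space utilities (the class name `DictToBoxAction` is hypothetical, not the library's wrapper).

```python
import gymnasium as gym
from gymnasium import spaces

class DictToBoxAction(gym.ActionWrapper):
    """Illustrative Dict -> Box flattener; a sketch, not ManiSkill's own wrapper."""

    def __init__(self, env):
        super().__init__(env)
        assert isinstance(env.action_space, spaces.Dict)
        self._dict_space = env.action_space
        # flatten_space concatenates all components into a single Box
        self.action_space = spaces.flatten_space(self._dict_space)

    def action(self, action):
        # map the flat vector back to the structured dict the env expects
        return spaces.unflatten(self._dict_space, action)
```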
3. Episode Recording
For evaluation and debugging, a recording wrapper can capture video frames and save trajectory data. This is typically applied to evaluation environments and optionally to training environments at a lower frequency.
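A hedged sketch of applying recording to an evaluation environment follows. Only `save_trajectory` is confirmed by the configuration table at the end of this page; `output_dir` and `save_video` are assumed parameter names for ManiSkill's `RecordEpisode`.

```python
import gymnasium as gym
import mani_skill.envs  # registers tasks (ManiSkill 3 path)
from mani_skill.utils.wrappers import RecordEpisode

eval_env = gym.make("PickCube-v1", num_envs=1)
# output_dir / save_video are assumed parameter names; save_trajectory
# appears in the configuration table at the end of this page
eval_env = RecordEpisode(eval_env, output_dir="videos/eval",
                         save_trajectory=True, save_video=True)
```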
Usage
Apply vectorized environment wrappers after creating the base environment with gym.make() and before passing the environment to the RL algorithm. The standard wrapping order is:
1. Create the base environment via `gym.make()`
2. Apply `FlattenActionSpaceWrapper` if the action space is a `Dict`
3. Apply `RecordEpisode` for video/trajectory recording (optional)
4. Apply `ManiSkillVectorEnv` for auto-reset and metric tracking
This order matters because ManiSkillVectorEnv should be the outermost wrapper (the one the RL algorithm interacts with directly), while inner wrappers modify the environment's spaces or data before vectorization.
Theoretical Basis
Gymnasium VectorEnv API: The Gymnasium library defines a formal VectorEnv interface for batched environments. Unlike SubprocVecEnv (which uses multiprocessing), ManiSkill's vector wrapper operates on a single-process GPU-parallel environment. The VectorEnv API specifies:
- `single_observation_space` / `single_action_space`: spaces for an individual sub-environment
- `observation_space` / `action_space`: batched spaces across all sub-environments
- Auto-reset semantics: when an environment terminates, it is automatically reset, and the final observation is stored in the info dict
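As a quick illustration of the single vs. batched distinction, reusing `envs` from the rollout sketch above (there `num_envs=4`); the 8-dimensional action size is hypothetical:

```python
# hypothetical shapes for a 4-env batch with 8-dimensional actions
print(envs.num_envs)                   # 4
print(envs.single_action_space.shape)  # (8,)   one sub-environment
print(envs.action_space.shape)         # (4, 8) batched across sub-environments
```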
Wrapper Composition Pattern: Following the Decorator design pattern, each wrapper inherits from `gym.Wrapper` (or `gym.ActionWrapper`, `gym.ObservationWrapper`) and delegates to the inner environment. This enables flexible composition:
```python
import gymnasium as gym
from mani_skill.utils.wrappers import FlattenActionSpaceWrapper, RecordEpisode
from mani_skill.vector.wrappers.gymnasium import ManiSkillVectorEnv  # ManiSkill 3 paths

env = gym.make(...)                   # BaseEnv
env = FlattenActionSpaceWrapper(env)  # Dict -> Box actions
env = RecordEpisode(env, ...)         # video recording
env = ManiSkillVectorEnv(env, ...)    # auto-reset + metrics
```
Partial Reset for Continuous Training: In vectorized environments, different sub-environments finish episodes at different times. Rather than resetting all environments when any finishes (which wastes simulation steps), partial reset only resets the completed sub-environments. This is essential for efficient GPU-parallel training because:
- It avoids synchronization barriers across environments
- It maximizes GPU utilization by keeping all environments active
- It correctly handles the boundary between episodes for return computation
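The pattern looks roughly like the sketch below. ManiSkill exposes partial reset through the reset `options` dict; the `env_idx` option name and the `step_with_partial_reset` helper are assumptions here, so check your version's API.

```python
import torch

def step_with_partial_reset(env, action):
    """Sketch: step, then reset only the finished sub-environments."""
    obs, reward, terminated, truncated, info = env.step(action)
    done = terminated | truncated
    if done.any():
        env_idx = torch.nonzero(done).squeeze(-1)
        # reset(options=dict(env_idx=...)) is assumed to be the partial-reset
        # hook; untouched sub-environments keep running without interruption
        obs, _ = env.reset(options=dict(env_idx=env_idx))
    return obs, reward, terminated, truncated, info
```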
Ignore Terminations (Infinite Horizon): Some RL formulations treat tasks as infinite horizon, where the episode ends only due to a time limit (truncation), never due to task success/failure (termination). The `ignore_terminations` flag supports this by overriding `terminated` to always be `False`, while still recording success/failure metrics for monitoring.
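Conceptually, the flag behaves like the sketch below (not the wrapper's actual code; the helper name `masked_step` is made up):

```python
import torch

def masked_step(env, action, ignore_terminations: bool):
    """Conceptual sketch of the ignore_terminations flag."""
    obs, reward, terminated, truncated, info = env.step(action)
    if ignore_terminations:
        # success/failure stays visible in `info` for monitoring, but the
        # episode itself only ends on truncation (time limit)
        terminated = torch.zeros_like(terminated)
    return obs, reward, terminated, truncated, info
```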
| Use Case | FlattenAction | RecordEpisode | VectorEnv Config |
|---|---|---|---|
| Standard PPO Training | If Dict action space | Optional (low freq) | `auto_reset=True, ignore_terminations=False, record_metrics=True` |
| Infinite Horizon Training | If Dict action space | Optional | `auto_reset=True, ignore_terminations=True, record_metrics=True` |
| Evaluation | If Dict action space | Yes (always) | `auto_reset=True, ignore_terminations=False, record_metrics=True` |
| Trajectory Collection | If Dict action space | Yes (`save_trajectory=True`) | `auto_reset=True, record_metrics=True` |
Related Pages
- Implementation:Haosulab_ManiSkill_ManiSkillVectorEnv -- Concrete implementation of the vectorized wrapping
- Principle:Haosulab_ManiSkill_Environment_Configuration -- The preceding step: creating the base environment
- Principle:Haosulab_ManiSkill_GPU_Parallelized_Rollout -- How wrapped environments are used for data collection