Principle:Haosulab ManiSkill Vectorized Environment Wrapping
| Field | Value |
|---|---|
| principle_name | Haosulab_ManiSkill_Vectorized_Environment_Wrapping |
| overview | Wrapping simulation environments with auto-reset, action flattening, and recording for RL training compatibility |
| domains | Simulation, Reinforcement_Learning |
| last_updated | 2026-02-15 |
| related_pages | Implementation:Haosulab_ManiSkill_ManiSkillVectorEnv |
Overview
Description
After a base simulation environment is created, it must be wrapped with additional functionality before it can be used by standard RL algorithms. Vectorized environment wrapping is a composable pattern where each wrapper adds a specific capability while preserving the Gymnasium interface contract. This principle covers three essential wrapping operations:
1. Vectorized Environment Wrapping (Auto-Reset)
The most critical wrapper converts a ManiSkill BaseEnv into a Gymnasium VectorEnv that supports automatic resetting. In standard RL training, when an environment episode terminates (due to success, failure, or time limit), the environment must be immediately reset so training can continue without interruption. The vectorized wrapper handles this transparently:
- When any sub-environment signals `terminated=True` or `truncated=True`, the wrapper automatically calls `reset()` on just those sub-environments (partial reset)
- The wrapper stores the final observation from the completed episode in `info["final_observation"]` so the RL algorithm can compute correct bootstrapped returns
- Episode metrics (return, length, success rate) are tracked and reported through `info["final_info"]`
2. Action Space Flattening
Some ManiSkill environments use Dict action spaces (e.g., separate entries for arm and gripper control). Most RL algorithms expect a single Box action space. The action flattening wrapper concatenates all action components into a single continuous vector, mapping between the flat representation and the structured dictionary representation transparently.
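ManiSkill ships `FlattenActionSpaceWrapper` for this; the sketch below merely illustrates the underlying mechanism using Gymnasium's generic space utilities (the class name `DictToBoxAction` is hypothetical, not the library's wrapper).

```python
import gymnasium as gym
from gymnasium import spaces

class DictToBoxAction(gym.ActionWrapper):
    """Illustrative Dict -> Box flattener; a sketch, not ManiSkill's own wrapper."""

    def __init__(self, env):
        super().__init__(env)
        assert isinstance(env.action_space, spaces.Dict)
        self._dict_space = env.action_space
        # flatten_space concatenates all components into a single Box
        self.action_space = spaces.flatten_space(self._dict_space)

    def action(self, action):
        # map the flat vector back to the structured dict the env expects
        return spaces.unflatten(self._dict_space, action)
```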
3. Episode Recording
For evaluation and debugging, a recording wrapper can capture video frames and save trajectory data. This is typically applied to evaluation environments and optionally to training environments at a lower frequency.
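A hedged sketch of applying recording to an evaluation environment follows. Only `save_trajectory` is confirmed by the configuration table at the end of this page; `output_dir` and `save_video` are assumed parameter names for ManiSkill's `RecordEpisode`.

```python
import gymnasium as gym
import mani_skill.envs  # registers tasks (ManiSkill 3 path)
from mani_skill.utils.wrappers import RecordEpisode

eval_env = gym.make("PickCube-v1", num_envs=1)
# output_dir / save_video are assumed parameter names; save_trajectory
# appears in the configuration table at the end of this page
eval_env = RecordEpisode(eval_env, output_dir="videos/eval",
                         save_trajectory=True, save_video=True)
```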
Usage
Apply vectorized environment wrappers after creating the base environment with gym.make() and before passing the environment to the RL algorithm. The standard wrapping order is:
1. Create the base environment via `gym.make()`
2. Apply `FlattenActionSpaceWrapper` if the action space is a `Dict`
3. Apply `RecordEpisode` for video/trajectory recording (optional)
4. Apply `ManiSkillVectorEnv` for auto-reset and metric tracking
This order matters because ManiSkillVectorEnv should be the outermost wrapper (the one the RL algorithm interacts with directly), while inner wrappers modify the environment's spaces or data before vectorization.
Theoretical Basis
Gymnasium VectorEnv API: The Gymnasium library defines a formal VectorEnv interface for batched environments. Unlike SubprocVecEnv (which uses multiprocessing), ManiSkill's vector wrapper operates on a single-process GPU-parallel environment. The VectorEnv API specifies:
- `single_observation_space` / `single_action_space`: spaces for an individual sub-environment
- `observation_space` / `action_space`: batched spaces across all sub-environments
- Auto-reset semantics: when an environment terminates, it is automatically reset, and the final observation is stored in the info dict
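As a quick illustration of the single vs. batched distinction, reusing `envs` from the rollout sketch above (there `num_envs=4`); the 8-dimensional action size is hypothetical:

```python
# hypothetical shapes for a 4-env batch with 8-dimensional actions
print(envs.num_envs)                   # 4
print(envs.single_action_space.shape)  # (8,)   one sub-environment
print(envs.action_space.shape)         # (4, 8) batched across sub-environments
```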
Wrapper Composition Pattern: Following the Decorator design pattern, each wrapper inherits from `gym.Wrapper` (or `gym.ActionWrapper`, `gym.ObservationWrapper`) and delegates to the inner environment. This enables flexible composition:
```python
import gymnasium as gym
from mani_skill.utils.wrappers import FlattenActionSpaceWrapper, RecordEpisode
from mani_skill.vector.wrappers.gymnasium import ManiSkillVectorEnv  # ManiSkill 3 paths

env = gym.make(...)                   # BaseEnv
env = FlattenActionSpaceWrapper(env)  # Dict -> Box actions
env = RecordEpisode(env, ...)         # video recording
env = ManiSkillVectorEnv(env, ...)    # auto-reset + metrics
```
Partial Reset for Continuous Training: In vectorized environments, different sub-environments finish episodes at different times. Rather than resetting all environments when any finishes (which wastes simulation steps), partial reset only resets the completed sub-environments. This is essential for efficient GPU-parallel training because:
- It avoids synchronization barriers across environments
- It maximizes GPU utilization by keeping all environments active
- It correctly handles the boundary between episodes for return computation
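The pattern looks roughly like the sketch below. ManiSkill exposes partial reset through the reset `options` dict; the `env_idx` option name and the `step_with_partial_reset` helper are assumptions here, so check your version's API.

```python
import torch

def step_with_partial_reset(env, action):
    """Sketch: step, then reset only the finished sub-environments."""
    obs, reward, terminated, truncated, info = env.step(action)
    done = terminated | truncated
    if done.any():
        env_idx = torch.nonzero(done).squeeze(-1)
        # reset(options=dict(env_idx=...)) is assumed to be the partial-reset
        # hook; untouched sub-environments keep running without interruption
        obs, _ = env.reset(options=dict(env_idx=env_idx))
    return obs, reward, terminated, truncated, info
```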
Ignore Terminations (Infinite Horizon): Some RL formulations treat tasks as infinite horizon, where the episode ends only due to a time limit (truncation), never due to task success/failure (termination). The `ignore_terminations` flag supports this by overriding `terminated` to always be `False`, while still recording success/failure metrics for monitoring.
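Conceptually, the flag behaves like the sketch below (not the wrapper's actual code; the helper name `masked_step` is made up):

```python
import torch

def masked_step(env, action, ignore_terminations: bool):
    """Conceptual sketch of the ignore_terminations flag."""
    obs, reward, terminated, truncated, info = env.step(action)
    if ignore_terminations:
        # success/failure stays visible in `info` for monitoring, but the
        # episode itself only ends on truncation (time limit)
        terminated = torch.zeros_like(terminated)
    return obs, reward, terminated, truncated, info
```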
| Use Case | FlattenAction | RecordEpisode | VectorEnv Config |
|---|---|---|---|
| Standard PPO Training | If Dict action space | Optional (low freq) | `auto_reset=True, ignore_terminations=False, record_metrics=True` |
| Infinite Horizon Training | If Dict action space | Optional | `auto_reset=True, ignore_terminations=True, record_metrics=True` |
| Evaluation | If Dict action space | Yes (always) | `auto_reset=True, ignore_terminations=False, record_metrics=True` |
| Trajectory Collection | If Dict action space | Yes (`save_trajectory=True`) | `auto_reset=True, record_metrics=True` |
Related Pages
- Implementation:Haosulab_ManiSkill_ManiSkillVectorEnv -- Concrete implementation of the vectorized wrapping
- Principle:Haosulab_ManiSkill_Environment_Configuration -- The preceding step: creating the base environment
- Principle:Haosulab_ManiSkill_GPU_Parallelized_Rollout -- How wrapped environments are used for data collection