Principle:Haosulab ManiSkill Environment Configuration
| Field | Value |
|---|---|
| principle_name | Haosulab_ManiSkill_Environment_Configuration |
| overview | Configuring GPU-parallelized physics simulation environments for reinforcement learning with observation modes, action spaces, and sim backend selection |
| domains | Simulation, Reinforcement_Learning, Robotics |
| last_updated | 2026-02-15 |
| related_pages | Implementation:Haosulab_ManiSkill_Gym_Make_BaseEnv |
Overview
Description
Environment configuration is the foundational step of any reinforcement learning training pipeline. In GPU-parallelized robotics simulation, environment configuration determines how the physics simulator is instantiated, what observations the agent perceives, how actions are parameterized, and how many parallel instances execute simultaneously.
The core principle is that a single API call should create a fully configured environment that conforms to the Gymnasium standard interface (reset, step, observation_space, action_space), while abstracting away the complexity of GPU physics acceleration, sensor rendering, and robot controller setup.
Key configuration dimensions include:
- Observation mode: Determines the structure and content of observations returned by the environment. Common modes include:
  - `state`: Low-dimensional proprioceptive and task state vectors (fastest, suitable for MLP policies)
  - `rgbd`: RGB-D camera images (requires a rendering pipeline, suitable for CNN/ViT policies)
  - `pointcloud`: 3D point cloud data (suitable for point-based architectures)
  - `sensor_data`: Raw sensor output without post-processing
- Control mode: Defines the action parameterization for the robot:
  - `pd_joint_delta_pos`: Delta joint position commands via PD controllers (most common for RL)
  - `pd_joint_pos`: Absolute joint position targets
  - `pd_ee_delta_pos`: End-effector Cartesian delta position control
  - `pd_ee_delta_pose`: End-effector delta pose (position + orientation) control
- Simulation backend: Selects the physics engine backend:
  - `auto`: Automatically selects `physx_cuda` when `num_envs > 1`, otherwise `physx_cpu`
  - `physx_cuda`: GPU-accelerated PhysX for massively parallel simulation
  - `physx_cpu`: CPU-based PhysX for single-environment debugging
- Number of environments: The degree of parallelism. When `num_envs > 1`, GPU simulation is triggered, enabling thousands of environments to execute in parallel on a single GPU.
- Reconfiguration frequency: Controls how often environment assets (objects, scenes) are randomized. A value of `0` means no reconfiguration after initial setup; higher values trigger periodic asset randomization.
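The configuration dimensions above can be collected into a single config object. The sketch below is illustrative, not ManiSkill's actual implementation; `EnvConfig` and `resolved_backend` are hypothetical names, but the `auto` resolution rule follows the description above.

```python
from dataclasses import dataclass


@dataclass
class EnvConfig:
    """Hypothetical bundle of the configuration dimensions described above."""
    env_id: str
    num_envs: int = 1
    obs_mode: str = "state"                   # "state", "rgbd", "pointcloud", "sensor_data"
    control_mode: str = "pd_joint_delta_pos"  # action parameterization
    sim_backend: str = "auto"                 # "auto", "physx_cuda", "physx_cpu"
    reconfiguration_freq: int = 0             # 0 = never re-randomize assets

    def resolved_backend(self) -> str:
        # "auto" picks the GPU backend only when more than one env is requested.
        if self.sim_backend != "auto":
            return self.sim_backend
        return "physx_cuda" if self.num_envs > 1 else "physx_cpu"


debug_cfg = EnvConfig("PickCube-v1", num_envs=1)
train_cfg = EnvConfig("PickCube-v1", num_envs=2048)
print(debug_cfg.resolved_backend())  # physx_cpu
print(train_cfg.resolved_backend())  # physx_cuda
```

Keeping the backend resolution in one place makes the debug-to-training switch a one-field change (`num_envs`) rather than an edit to several call sites.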
Usage
Use environment configuration when:
- Setting up a new RL training run and needing to specify the task, observation format, and degree of parallelism
- Switching between CPU debugging (single env) and GPU training (hundreds/thousands of envs)
- Choosing between state-based observations for fast MLP training versus visual observations for vision-based policies
- Selecting the appropriate robot control interface for the task at hand
The configuration step always precedes environment wrapping (vectorization, action flattening) and policy instantiation in the RL pipeline.
Theoretical Basis
The environment configuration principle builds on several foundational concepts:
Gymnasium API Standard (formerly OpenAI Gym): The de facto standard interface for RL environments, defining `reset() -> (obs, info)` and `step(action) -> (obs, reward, terminated, truncated, info)`. All ManiSkill environments conform to this interface, ensuring compatibility with any RL library that supports Gymnasium.
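To make the contract concrete, here is a minimal toy environment implementing those two signatures. It is a stdlib-only sketch (not a real ManiSkill task): `ToyEnv` and its fixed horizon are invented for illustration.

```python
import random


class ToyEnv:
    """Toy env: random scalar observations, reward 1.0 per step, fixed horizon."""

    def __init__(self, horizon=5, seed=None):
        self.horizon = horizon
        self.rng = random.Random(seed)
        self.t = 0

    def reset(self, seed=None):
        if seed is not None:
            self.rng.seed(seed)
        self.t = 0
        obs = [self.rng.random()]  # stand-in for a low-dimensional state vector
        return obs, {}             # Gymnasium contract: (obs, info)

    def step(self, action):
        self.t += 1
        obs = [self.rng.random()]
        reward = 1.0
        terminated = False                   # task success/failure signal
        truncated = self.t >= self.horizon   # time-limit signal, kept separate
        return obs, reward, terminated, truncated, {}


env = ToyEnv(horizon=3, seed=0)
obs, info = env.reset()
done, total = False, 0.0
while not done:
    obs, reward, terminated, truncated, info = env.step(0)
    total += reward
    done = terminated or truncated
print(total)  # 3.0
```

The `terminated`/`truncated` split matters for RL: bootstrapping the value function is valid after a time-limit truncation but not after true termination.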
Vectorized Environment Design: Rather than running multiple independent processes (as in traditional SubprocVecEnv approaches), GPU-parallelized environments run all instances in a single process on the GPU. This eliminates inter-process communication overhead and enables order-of-magnitude speedups. The key insight is that physics simulation operations (collision detection, constraint solving, integration) are naturally parallelizable across independent environment instances.
GPU Physics Acceleration: Modern physics engines like NVIDIA PhysX support GPU-accelerated simulation where the entire physics pipeline (broadphase, narrowphase, solver) runs on the GPU. This is distinct from CPU simulation and requires that all environment state (poses, velocities, forces) reside in GPU memory as PyTorch tensors.
Observation-Action Space Contracts: The Gymnasium spaces API (Box, Dict, Discrete) formally describes the shape, dtype, and bounds of observations and actions. This contract enables RL algorithms to automatically infer network input/output dimensions and apply appropriate normalization.
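The dimension-inference idea can be shown with a small stand-in for `gymnasium.spaces.Box`; `BoxSpec` and `mlp_dims` below are hypothetical names, and the 42-dimensional observation / 8-dimensional action shapes are illustrative.

```python
import math
from dataclasses import dataclass


@dataclass
class BoxSpec:
    """Stand-in for gymnasium.spaces.Box: shape plus bounds."""
    shape: tuple
    low: float
    high: float


def mlp_dims(observation_space, action_space):
    # Flatten each space's shape to get the policy network's layer widths.
    in_dim = math.prod(observation_space.shape)
    out_dim = math.prod(action_space.shape)
    return in_dim, out_dim


obs_space = BoxSpec(shape=(42,), low=-math.inf, high=math.inf)  # e.g. a state vector
act_space = BoxSpec(shape=(8,), low=-1.0, high=1.0)             # e.g. an 8-DoF arm
print(mlp_dims(obs_space, act_space))  # (42, 8)
```

Bounded action spaces like `[-1, 1]` also tell the algorithm where to place a squashing nonlinearity (e.g. tanh) on the policy output.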
| Pattern | num_envs | obs_mode | sim_backend | Use Case |
|---|---|---|---|---|
| Fast MLP Training | 512-4096 | state | physx_cuda | Standard RL benchmarking |
| Visual Policy Training | 64-256 | rgbd | physx_cuda | Vision-based manipulation |
| Single-Env Debugging | 1 | state | physx_cpu | Development and debugging |
| Evaluation | 8-16 | state | physx_cuda | Policy evaluation with recording |
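The patterns in the table above can be captured as a preset lookup. This is a hedged sketch: the preset names, the `make_env_kwargs` helper, and the concrete values (picked from within the table's ranges) are assumptions, not part of the ManiSkill API.

```python
# Illustrative presets mirroring the table's rows; values sit inside its ranges.
PRESETS = {
    "fast_mlp_training": dict(num_envs=2048, obs_mode="state", sim_backend="physx_cuda"),
    "visual_policy":     dict(num_envs=128,  obs_mode="rgbd",  sim_backend="physx_cuda"),
    "debug":             dict(num_envs=1,    obs_mode="state", sim_backend="physx_cpu"),
    "evaluation":        dict(num_envs=16,   obs_mode="state", sim_backend="physx_cuda"),
}


def make_env_kwargs(pattern, **overrides):
    """Return a copy of a preset, with optional per-run overrides applied."""
    cfg = dict(PRESETS[pattern])
    cfg.update(overrides)  # e.g. shrink num_envs on a smaller GPU
    return cfg


print(make_env_kwargs("debug"))
print(make_env_kwargs("fast_mlp_training", num_envs=512))
```

Centralizing presets keeps experiment scripts declarative: a run names a pattern and overrides only what differs.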
Related Pages
- Implementation:Haosulab_ManiSkill_Gym_Make_BaseEnv -- Concrete implementation using `gym.make()` and the `BaseEnv.__init__` constructor
- Principle:Haosulab_ManiSkill_Vectorized_Environment_Wrapping -- The next step after environment creation: wrapping for RL training
- Principle:Haosulab_ManiSkill_GPU_Parallelized_Rollout -- How configured environments are used for batched data collection
- Heuristic:Haosulab_ManiSkill_Num_Envs_Backend_Selection