Principle:Haosulab ManiSkill Environment Configuration
| Field | Value |
|---|---|
| principle_name | Haosulab_ManiSkill_Environment_Configuration |
| overview | Configuring GPU-parallelized physics simulation environments for reinforcement learning with observation modes, action spaces, and sim backend selection |
| domains | Simulation, Reinforcement_Learning, Robotics |
| last_updated | 2026-02-15 |
| related_pages | Implementation:Haosulab_ManiSkill_Gym_Make_BaseEnv |
Overview
Description
Environment configuration is the foundational step of any reinforcement learning training pipeline. In GPU-parallelized robotics simulation, environment configuration determines how the physics simulator is instantiated, what observations the agent perceives, how actions are parameterized, and how many parallel instances execute simultaneously.
The core principle is that a single API call should create a fully configured environment that conforms to the Gymnasium standard interface (reset, step, observation_space, action_space), while abstracting away the complexity of GPU physics acceleration, sensor rendering, and robot controller setup.
Key configuration dimensions include:
- Observation mode: Determines the structure and content of observations returned by the environment. Common modes include:
  - `state`: Low-dimensional proprioceptive and task state vectors (fastest, suitable for MLP policies)
  - `rgbd`: RGB-D camera images (requires a rendering pipeline, suitable for CNN/ViT policies)
  - `pointcloud`: 3D point cloud data (suitable for point-based architectures)
  - `sensor_data`: Raw sensor output without post-processing
- Control mode: Defines the action parameterization for the robot:
  - `pd_joint_delta_pos`: Delta joint position commands via PD controllers (most common for RL)
  - `pd_joint_pos`: Absolute joint position targets
  - `pd_ee_delta_pos`: End-effector Cartesian delta position control
  - `pd_ee_delta_pose`: End-effector delta pose (position + orientation) control
- Simulation backend: Selects the physics engine backend:
  - `auto`: Automatically selects `physx_cuda` when `num_envs > 1`, otherwise `physx_cpu`
  - `physx_cuda`: GPU-accelerated PhysX for massively parallel simulation
  - `physx_cpu`: CPU-based PhysX for single-environment debugging
- Number of environments: The degree of parallelism. When `num_envs > 1`, GPU simulation is triggered, enabling thousands of environments to execute in parallel on a single GPU.
- Reconfiguration frequency: Controls how often environment assets (objects, scenes) are randomized. A value of `0` means no reconfiguration after initial setup; higher values trigger periodic asset randomization.
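The configuration dimensions above can be collected into a single config object. The sketch below is illustrative, not ManiSkill's actual implementation; `EnvConfig` and `resolved_backend` are hypothetical names, but the `auto` resolution rule follows the description above.

```python
from dataclasses import dataclass


@dataclass
class EnvConfig:
    """Hypothetical bundle of the configuration dimensions described above."""
    env_id: str
    num_envs: int = 1
    obs_mode: str = "state"                   # "state", "rgbd", "pointcloud", "sensor_data"
    control_mode: str = "pd_joint_delta_pos"  # action parameterization
    sim_backend: str = "auto"                 # "auto", "physx_cuda", "physx_cpu"
    reconfiguration_freq: int = 0             # 0 = never re-randomize assets

    def resolved_backend(self) -> str:
        # "auto" picks the GPU backend only when more than one env is requested.
        if self.sim_backend != "auto":
            return self.sim_backend
        return "physx_cuda" if self.num_envs > 1 else "physx_cpu"


debug_cfg = EnvConfig("PickCube-v1", num_envs=1)
train_cfg = EnvConfig("PickCube-v1", num_envs=2048)
print(debug_cfg.resolved_backend())  # physx_cpu
print(train_cfg.resolved_backend())  # physx_cuda
```

Keeping the backend resolution in one place makes the debug-to-training switch a one-field change (`num_envs`) rather than an edit to several call sites.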
Usage
Use environment configuration when:
- Setting up a new RL training run and needing to specify the task, observation format, and degree of parallelism
- Switching between CPU debugging (single env) and GPU training (hundreds/thousands of envs)
- Choosing between state-based observations for fast MLP training versus visual observations for vision-based policies
- Selecting the appropriate robot control interface for the task at hand
The configuration step always precedes environment wrapping (vectorization, action flattening) and policy instantiation in the RL pipeline.
Theoretical Basis
The environment configuration principle builds on several foundational concepts:
Gymnasium API Standard (formerly OpenAI Gym): The de facto standard interface for RL environments, defining `reset() -> (obs, info)` and `step(action) -> (obs, reward, terminated, truncated, info)`. All ManiSkill environments conform to this interface, ensuring compatibility with any RL library that supports Gymnasium.
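To make the contract concrete, here is a minimal toy environment implementing those two signatures. It is a stdlib-only sketch (not a real ManiSkill task): `ToyEnv` and its fixed horizon are invented for illustration.

```python
import random


class ToyEnv:
    """Toy env: random scalar observations, reward 1.0 per step, fixed horizon."""

    def __init__(self, horizon=5, seed=None):
        self.horizon = horizon
        self.rng = random.Random(seed)
        self.t = 0

    def reset(self, seed=None):
        if seed is not None:
            self.rng.seed(seed)
        self.t = 0
        obs = [self.rng.random()]  # stand-in for a low-dimensional state vector
        return obs, {}             # Gymnasium contract: (obs, info)

    def step(self, action):
        self.t += 1
        obs = [self.rng.random()]
        reward = 1.0
        terminated = False                   # task success/failure signal
        truncated = self.t >= self.horizon   # time-limit signal, kept separate
        return obs, reward, terminated, truncated, {}


env = ToyEnv(horizon=3, seed=0)
obs, info = env.reset()
done, total = False, 0.0
while not done:
    obs, reward, terminated, truncated, info = env.step(0)
    total += reward
    done = terminated or truncated
print(total)  # 3.0
```

The `terminated`/`truncated` split matters for RL: bootstrapping the value function is valid after a time-limit truncation but not after true termination.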
Vectorized Environment Design: Rather than running multiple independent processes (as in traditional SubprocVecEnv approaches), GPU-parallelized environments run all instances in a single process on the GPU. This eliminates inter-process communication overhead and enables order-of-magnitude speedups. The key insight is that physics simulation operations (collision detection, constraint solving, integration) are naturally parallelizable across independent environment instances.
GPU Physics Acceleration: Modern physics engines like NVIDIA PhysX support GPU-accelerated simulation where the entire physics pipeline (broadphase, narrowphase, solver) runs on the GPU. This is distinct from CPU simulation and requires that all environment state (poses, velocities, forces) reside in GPU memory as PyTorch tensors.
Observation-Action Space Contracts: The Gymnasium spaces API (Box, Dict, Discrete) formally describes the shape, dtype, and bounds of observations and actions. This contract enables RL algorithms to automatically infer network input/output dimensions and apply appropriate normalization.
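The dimension-inference idea can be shown with a small stand-in for `gymnasium.spaces.Box`; `BoxSpec` and `mlp_dims` below are hypothetical names, and the 42-dimensional observation / 8-dimensional action shapes are illustrative.

```python
import math
from dataclasses import dataclass


@dataclass
class BoxSpec:
    """Stand-in for gymnasium.spaces.Box: shape plus bounds."""
    shape: tuple
    low: float
    high: float


def mlp_dims(observation_space, action_space):
    # Flatten each space's shape to get the policy network's layer widths.
    in_dim = math.prod(observation_space.shape)
    out_dim = math.prod(action_space.shape)
    return in_dim, out_dim


obs_space = BoxSpec(shape=(42,), low=-math.inf, high=math.inf)  # e.g. a state vector
act_space = BoxSpec(shape=(8,), low=-1.0, high=1.0)             # e.g. an 8-DoF arm
print(mlp_dims(obs_space, act_space))  # (42, 8)
```

Bounded action spaces like `[-1, 1]` also tell the algorithm where to place a squashing nonlinearity (e.g. tanh) on the policy output.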
| Pattern | num_envs | obs_mode | sim_backend | Use Case |
|---|---|---|---|---|
| Fast MLP Training | 512-4096 | state | physx_cuda | Standard RL benchmarking |
| Visual Policy Training | 64-256 | rgbd | physx_cuda | Vision-based manipulation |
| Single-Env Debugging | 1 | state | physx_cpu | Development and debugging |
| Evaluation | 8-16 | state | physx_cuda | Policy evaluation with recording |
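The patterns in the table above can be captured as a preset lookup. This is a hedged sketch: the preset names, the `make_env_kwargs` helper, and the concrete values (picked from within the table's ranges) are assumptions, not part of the ManiSkill API.

```python
# Illustrative presets mirroring the table's rows; values sit inside its ranges.
PRESETS = {
    "fast_mlp_training": dict(num_envs=2048, obs_mode="state", sim_backend="physx_cuda"),
    "visual_policy":     dict(num_envs=128,  obs_mode="rgbd",  sim_backend="physx_cuda"),
    "debug":             dict(num_envs=1,    obs_mode="state", sim_backend="physx_cpu"),
    "evaluation":        dict(num_envs=16,   obs_mode="state", sim_backend="physx_cuda"),
}


def make_env_kwargs(pattern, **overrides):
    """Return a copy of a preset, with optional per-run overrides applied."""
    cfg = dict(PRESETS[pattern])
    cfg.update(overrides)  # e.g. shrink num_envs on a smaller GPU
    return cfg


print(make_env_kwargs("debug"))
print(make_env_kwargs("fast_mlp_training", num_envs=512))
```

Centralizing presets keeps experiment scripts declarative: a run names a pattern and overrides only what differs.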
Related Pages
- Implementation:Haosulab_ManiSkill_Gym_Make_BaseEnv -- Concrete implementation using `gym.make()` and the `BaseEnv.__init__` constructor
- Principle:Haosulab_ManiSkill_Vectorized_Environment_Wrapping -- The next step after environment creation: wrapping for RL training
- Principle:Haosulab_ManiSkill_GPU_Parallelized_Rollout -- How configured environments are used for batched data collection
- Heuristic:Haosulab_ManiSkill_Num_Envs_Backend_Selection