
Principle:Haosulab ManiSkill Environment Configuration

From Leeroopedia
Field | Value
principle_name | Haosulab_ManiSkill_Environment_Configuration
overview | Configuring GPU-parallelized physics simulation environments for reinforcement learning with observation modes, action spaces, and sim backend selection
domains | Simulation, Reinforcement_Learning, Robotics
last_updated | 2026-02-15
related_pages | Implementation:Haosulab_ManiSkill_Gym_Make_BaseEnv

Overview

Description

Environment configuration is the foundational step of any reinforcement learning training pipeline. In GPU-parallelized robotics simulation, environment configuration determines how the physics simulator is instantiated, what observations the agent perceives, how actions are parameterized, and how many parallel instances execute simultaneously.

The core principle is that a single API call should create a fully configured environment that conforms to the Gymnasium standard interface (reset, step, observation_space, action_space), while abstracting away the complexity of GPU physics acceleration, sensor rendering, and robot controller setup.

Key configuration dimensions include:

  • Observation mode: Determines the structure and content of observations returned by the environment. Common modes include:
    • state: Low-dimensional proprioceptive and task state vectors (fastest, suitable for MLP policies)
    • rgbd: RGB-D camera images (requires rendering pipeline, suitable for CNN/ViT policies)
    • pointcloud: 3D point cloud data (suitable for point-based architectures)
    • sensor_data: Raw sensor output without post-processing
  • Control mode: Defines the action parameterization for the robot:
    • pd_joint_delta_pos: Delta joint position commands via PD controllers (most common for RL)
    • pd_joint_pos: Absolute joint position targets
    • pd_ee_delta_pos: End-effector Cartesian delta position control
    • pd_ee_delta_pose: End-effector delta pose (position + orientation) control
  • Simulation backend: Selects the physics engine backend:
    • auto: Automatically selects physx_cuda when num_envs > 1, otherwise physx_cpu
    • physx_cuda: GPU-accelerated PhysX for massively parallel simulation
    • physx_cpu: CPU-based PhysX for single-environment debugging
  • Number of environments: The degree of parallelism. When num_envs > 1, GPU simulation is triggered, enabling thousands of environments to execute in parallel on a single GPU.
  • Reconfiguration frequency: Controls how often environment assets (objects, scenes) are randomized. A value of 0 means no reconfiguration after initial setup; higher values trigger periodic asset randomization.

Usage

Use environment configuration when:

  • Setting up a new RL training run and needing to specify the task, observation format, and degree of parallelism
  • Switching between CPU debugging (single env) and GPU training (hundreds/thousands of envs)
  • Choosing between state-based observations for fast MLP training versus visual observations for vision-based policies
  • Selecting the appropriate robot control interface for the task at hand

The configuration step always precedes environment wrapping (vectorization, action flattening) and policy instantiation in the RL pipeline.
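The ordering constraint above can be sketched as a three-stage builder. The factory callables here are hypothetical placeholders standing in for real constructors, not ManiSkill or Gymnasium APIs:

```python
def build_training_setup(make_env, wrap, make_policy):
    """Hypothetical sketch of the pipeline order: configure -> wrap -> policy."""
    env = make_env()           # 1. environment configuration (this page)
    env = wrap(env)            # 2. wrappers: vectorization, action flattening
    policy = make_policy(env)  # 3. policy sized from the wrapped env's spaces
    return env, policy
```

The policy is built last because its network dimensions are read off the wrapped environment's observation and action spaces.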

Theoretical Basis

The environment configuration principle builds on several foundational concepts:

Gymnasium API Standard (formerly OpenAI Gym): The de facto standard interface for RL environments, defining reset() -> (obs, info) and step(action) -> (obs, reward, terminated, truncated, info). All ManiSkill environments conform to this interface, ensuring compatibility with any RL library that supports Gymnasium.
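The interface contract can be illustrated with a toy environment. This is a self-contained stand-in conforming to the reset/step signatures above, not a ManiSkill environment; the dynamics and reward are arbitrary.

```python
import random

class ToyEnv:
    """Toy environment conforming to the Gymnasium reset/step signatures."""

    def __init__(self, horizon: int = 10):
        self.horizon = horizon
        self.t = 0
        self.state = 0.0

    def reset(self, seed=None):
        if seed is not None:
            random.seed(seed)
        self.t = 0
        self.state = random.uniform(-1.0, 1.0)
        return self.state, {}  # (obs, info)

    def step(self, action: float):
        self.t += 1
        self.state += action
        reward = -abs(self.state)            # drive the state toward zero
        terminated = abs(self.state) < 1e-3  # task success condition
        truncated = self.t >= self.horizon   # time-limit condition
        return self.state, reward, terminated, truncated, {}  # 5-tuple
```

Any RL library that speaks this interface can train against the environment without knowing what simulator sits behind it.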

Vectorized Environment Design: Rather than running multiple independent processes (as in traditional SubprocVecEnv approaches), GPU-parallelized environments run all instances in a single process on the GPU. This eliminates inter-process communication overhead and enables order-of-magnitude speedups. The key insight is that physics simulation operations (collision detection, constraint solving, integration) are naturally parallelizable across independent environment instances.
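The data-parallel structure can be seen in a toy batched integration step. Real GPU simulation performs analogous per-environment updates as tensor operations; this pure-Python sketch only illustrates the independence that makes those updates parallelizable.

```python
def batched_euler_step(positions, velocities, dt):
    """One integration update applied to every env instance at once.

    Each index i is an independent environment: the update for env i never
    reads state from env j, which is why the step parallelizes trivially
    on a GPU (one thread or SIMD lane per environment).
    """
    return [p + v * dt for p, v in zip(positions, velocities)]

# Four "environments" advanced in a single batched call:
pos = batched_euler_step([0.0, 1.0, 2.0, 3.0], [1.0, 1.0, -1.0, 0.5], dt=0.1)
# pos is approximately [0.1, 1.1, 1.9, 3.05]
```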

GPU Physics Acceleration: Modern physics engines like NVIDIA PhysX support GPU-accelerated simulation where the entire physics pipeline (broadphase, narrowphase, solver) runs on the GPU. This is distinct from CPU simulation and requires that all environment state (poses, velocities, forces) reside in GPU memory as PyTorch tensors.

Observation-Action Space Contracts: The Gymnasium spaces API (Box, Dict, Discrete) formally describes the shape, dtype, and bounds of observations and actions. This contract enables RL algorithms to automatically infer network input/output dimensions and apply appropriate normalization.
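The contract can be sketched with a simplified stand-in for `gymnasium.spaces.Box` (the dataclass below is an illustration, not the gymnasium class, and the space shapes are assumed for a hypothetical state-mode environment):

```python
import math
from dataclasses import dataclass

@dataclass
class Box:
    """Simplified stand-in for gymnasium.spaces.Box (bounds, shape, dtype)."""
    low: float
    high: float
    shape: tuple
    dtype: str = "float32"

# Spaces advertised by a hypothetical state-mode environment:
observation_space = Box(low=-float("inf"), high=float("inf"), shape=(42,))
action_space = Box(low=-1.0, high=1.0, shape=(8,))

# An RL algorithm can size its policy network directly from the contract:
policy_in_dim = observation_space.shape[0]   # 42 input features
policy_out_dim = action_space.shape[0]       # 8 action dimensions

# Bounded action spaces also tell the algorithm how to squash raw network
# outputs into valid actions, e.g. via tanh into [low, high]:
def squash(x: float, space: Box) -> float:
    mid = (space.high + space.low) / 2
    half = (space.high - space.low) / 2
    return mid + half * math.tanh(x)
```

Because the bounds and shapes are machine-readable, no per-task hand-wiring of network dimensions or action scaling is needed.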

Common Configuration Patterns for RL Training
Pattern | num_envs | obs_mode | sim_backend | Use Case
Fast MLP Training | 512-4096 | state | physx_cuda | Standard RL benchmarking
Visual Policy Training | 64-256 | rgbd | physx_cuda | Vision-based manipulation
Single-Env Debugging | 1 | state | physx_cpu | Development and debugging
Evaluation | 8-16 | state | physx_cuda | Policy evaluation with recording

Related Pages

Page Connections

  • Implementation:Haosulab_ManiSkill_Gym_Make_BaseEnv