Principle:Haosulab_ManiSkill_Environment_Testing
| Field | Value |
|---|---|
| Page Type | Principle |
| Title | ManiSkill Environment Testing |
| Domain | Simulation, Robotics, Environment_Design, Quality_Assurance |
| Related Implementation | Implementation:Haosulab_ManiSkill_Demo_Random_Action_CLI |
| Date | 2026-02-15 |
| Repository | Haosulab/ManiSkill |
Overview
Description
Environment testing in ManiSkill validates that a custom task environment functions correctly before it is used for training or evaluation. Testing covers several aspects:
- Basic functionality: The environment can be created, reset, and stepped through without runtime errors. Observation spaces, action spaces, and reward values are well-formed.
- Visual verification: The scene looks correct when rendered -- objects are in expected positions, the robot is visible, and cameras are properly configured.
- GPU parallelization correctness: When running with multiple environments on the GPU, each environment behaves independently and partial resets work correctly.
- Reward signal validation: Dense rewards produce reasonable values (not NaN, not constant zero, progressing toward expected patterns).
- Termination conditions: Episodes terminate when success or failure conditions are met, and do not terminate spuriously.
ManiSkill provides a built-in CLI tool (demo_random_action.py) that serves as the primary testing mechanism. It creates the environment, takes random actions, and optionally renders the scene in a GUI window or records video. This tool catches common implementation errors such as:
- Missing or incorrect observation tensor shapes.
- Reward functions that raise exceptions or return wrong shapes.
- Scenes that are not properly initialized (objects at the origin, overlapping actors).
- GPU simulation crashes due to physics instabilities.
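The random-action loop at the heart of this tool can be sketched generically. The MiniEnv stub below is a hypothetical stand-in for a real ManiSkill environment (the actual tool is demo_random_action.py); the class, attribute, and check names are illustrative, not ManiSkill API.

```python
import math
import random

class MiniEnv:
    """Hypothetical stand-in for a simulation environment (illustration only)."""
    OBS_DIM, ACT_DIM = 4, 2

    def reset(self, seed=None):
        random.seed(seed)
        self.state = [0.0] * self.OBS_DIM
        return list(self.state)

    def step(self, action):
        assert len(action) == self.ACT_DIM
        for i in range(self.ACT_DIM):
            self.state[i] += action[i]
        reward = -sum(abs(s) for s in self.state)  # dense distance-style reward
        terminated = all(abs(s) < 0.05 for s in self.state)
        return list(self.state), reward, terminated

def smoke_test(env, steps=100):
    """Step random actions and flag common implementation errors."""
    obs = env.reset(seed=0)
    for _ in range(steps):
        action = [random.uniform(-1.0, 1.0) for _ in range(env.ACT_DIM)]
        obs, reward, terminated = env.step(action)
        # Well-formed observation: expected shape, no NaN/inf values.
        assert len(obs) == env.OBS_DIM
        assert all(math.isfinite(x) for x in obs), "NaN/inf in observation"
        # Well-formed reward: a finite scalar.
        assert math.isfinite(reward), "NaN/inf in reward"
        if terminated:
            obs = env.reset()
    return True

print(smoke_test(MiniEnv()))  # → True
```

A real smoke test would additionally sample actions from the environment's action space and let the wrapper handle batched resets; the structure of the loop is the same.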
Usage
Environment testing should be performed after implementing all task methods (registration, scene loading, episode initialization, observations, rewards, and evaluation) and before using the environment for training. The recommended testing workflow is:
- CPU single-environment test: Run with --num-envs 1 and --render-mode human to visually inspect the scene and verify basic functionality.
- CPU reward/observation check: Run without rendering and inspect printed reward values and termination signals.
- GPU multi-environment test: Run with --num-envs 4 or more to verify GPU parallelization. Use --render-mode human to see all environments rendered together in a grid.
- Partial reset verification: Let some environments terminate naturally and confirm that they reset independently while others continue running.
- Recording test: Record a video with --record-dir to verify that human render cameras produce usable output.
The CLI tool acts as a smoke test. For more rigorous testing (e.g., verifying specific reward values for known configurations, testing edge cases), developers should write custom test scripts that use the environment's set_state_dict() and evaluate() methods directly.
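A custom test of that kind typically sets a known configuration and asserts the evaluation outcome. The toy environment below mirrors the set_state_dict()/evaluate() pattern; the class and its success criterion are invented for illustration and are not ManiSkill API.

```python
class ToyCubeEnv:
    """Toy 2D pick-style task: success when the cube is within a small
    radius of the goal. Invented for illustration only."""
    SUCCESS_RADIUS = 0.02

    def __init__(self):
        self.cube = (0.0, 0.0)
        self.goal = (0.0, 0.0)

    def set_state_dict(self, state):
        # Restore an exact, known configuration for a deterministic test.
        self.cube = state["cube"]
        self.goal = state["goal"]

    def evaluate(self):
        dist = ((self.cube[0] - self.goal[0]) ** 2 +
                (self.cube[1] - self.goal[1]) ** 2) ** 0.5
        return {"success": dist < self.SUCCESS_RADIUS, "distance": dist}

env = ToyCubeEnv()
# Known-success configuration: cube exactly at the goal.
env.set_state_dict({"cube": (0.1, 0.1), "goal": (0.1, 0.1)})
assert env.evaluate()["success"]
# Known-failure configuration: cube far from the goal.
env.set_state_dict({"cube": (0.5, 0.0), "goal": (0.0, 0.0)})
assert not env.evaluate()["success"]
```

Because the state is set explicitly rather than sampled, these checks are repeatable and can cover edge cases (boundary distances, degenerate poses) that random exploration may never reach.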
Theoretical Basis
Environment testing for reinforcement learning simulation draws on software testing principles adapted to the unique challenges of physics simulation and parallel GPU computation:
Smoke Testing Through Random Exploration
The random action approach is a form of fuzz testing applied to simulation environments. By sending random actions to the environment, the test exercises a wide range of states and transitions that a deterministic test might miss. This is particularly effective for catching:
- Physics instabilities (objects flying off, NaN poses) that only occur in certain configurations.
- Tensor shape mismatches that depend on the number of active contacts or observations.
- Edge cases in reward computation (division by zero when objects are at the same position).
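The division-by-zero edge case is worth a concrete sketch. The reward function below is illustrative (not taken from ManiSkill): it normalizes a direction vector by a distance, which yields NaN/inf exactly when two points coincide, the kind of configuration random exploration can stumble into.

```python
import math

def reaching_reward(tcp, obj, eps=1e-8):
    """Illustrative dense reward: 1 / (1 + distance), with a guard so the
    direction-normalization term stays finite when tcp == obj."""
    dx, dy, dz = (tcp[i] - obj[i] for i in range(3))
    dist = math.sqrt(dx * dx + dy * dy + dz * dz)
    # Naive code divides by dist to get a unit direction vector; when the
    # two points coincide, dist == 0 and the division yields NaN/inf.
    direction = [d / max(dist, eps) for d in (dx, dy, dz)]
    assert all(math.isfinite(c) for c in direction)
    return 1.0 / (1.0 + dist)

# Coincident positions: the edge case random exploration can hit.
print(reaching_reward((0.0, 0.0, 0.0), (0.0, 0.0, 0.0)))  # → 1.0
```

Without the max(dist, eps) guard, the coincident-position call would raise a ZeroDivisionError (or silently produce NaN in vectorized tensor code), which is precisely what the random-action smoke test is designed to surface.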
Visual Debugging
Physics simulation is inherently spatial, making visual inspection an irreplaceable debugging tool. The GUI viewer allows the developer to:
- Verify that objects are placed where expected.
- Observe robot motion to confirm that control modes work.
- Identify collision issues (objects passing through each other, unstable stacking).
- Check that goal markers and visual indicators are correctly positioned.
This follows the principle of observability in testing -- making the internal state of the system visible to aid in diagnosis.
GPU Simulation Correctness
GPU-parallel simulation introduces classes of bugs not present in CPU simulation:
- Cross-environment contamination: State from one parallel environment leaking into another due to incorrect indexing.
- Partial reset errors: Resetting some environments while others continue running requires careful masking of state updates.
- Numerical divergence: GPU floating-point arithmetic may produce different results than CPU, leading to environments that work on CPU but fail on GPU.
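The partial-reset masking discipline can be sketched in pure Python (standing in for batched GPU tensors); the class and method names are illustrative, not ManiSkill API. The key invariant is that a reset writes only to the indices of terminated environments.

```python
import random

class BatchedEnvs:
    """Pure-Python stand-in for a batch of GPU-parallel environments."""
    def __init__(self, num_envs, seed=0):
        self.rng = random.Random(seed)
        self.steps = [0] * num_envs                          # per-env counters
        self.states = [self._init_state() for _ in range(num_envs)]

    def _init_state(self):
        return self.rng.uniform(-1.0, 1.0)

    def step(self):
        done = []
        for i in range(len(self.states)):
            self.states[i] += 0.1
            self.steps[i] += 1
            done.append(self.states[i] > 1.0)                # per-env termination
        # Partial reset: touch ONLY the done environments. Writing to any
        # other index is exactly the cross-environment contamination bug.
        for i, d in enumerate(done):
            if d:
                self.states[i] = self._init_state()
                self.steps[i] = 0
        return done

envs = BatchedEnvs(num_envs=4)
for _ in range(30):
    envs.step()
# After repeated partial resets, every env state is back within bounds and
# episode counters remain independent per environment.
assert all(-1.0 <= s <= 1.0 for s in envs.states)
```

In real GPU code the same invariant is enforced with a boolean mask over a state tensor rather than a Python loop, and an off-by-one in that mask is what testing with several parallel environments is meant to catch.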
Testing with multiple GPU environments and verifying that each behaves independently is essential for production use.
Incremental Validation
The recommended testing workflow follows the test pyramid principle -- start with the simplest, fastest tests (single CPU environment) and progressively add complexity (multi-environment GPU, visual recording). This approach catches the most common errors quickly while still covering edge cases.
Related Pages
- Implementation:Haosulab_ManiSkill_Demo_Random_Action_CLI -- The concrete CLI testing tool
- Principle:Haosulab_ManiSkill_Environment_Registration -- The environment must be registered before testing
- Principle:Haosulab_ManiSkill_Reward_Success_Design -- Testing validates reward correctness
- Principle:Haosulab_ManiSkill_Observation_Definition -- Testing validates observation structure