Heuristic:Haosulab ManiSkill GPU Memory Buffer Tuning
| Knowledge Sources | |
|---|---|
| Domains | GPU_Simulation, Debugging |
| Last Updated | 2026-02-15 08:00 GMT |
Overview
GPU memory buffer tuning guide for resolving PhysX buffer overflow errors (contact, patch, collision stack) when running large-scale parallel GPU simulation.
Description
When running GPU-parallelized environments with SAPIEN PhysX, the physics engine pre-allocates fixed-size GPU memory buffers for contacts, patches, and collision stacks. With many parallel environments or complex scenes, these buffers can overflow and cause runtime errors that crash training. The `GPUMemoryConfig` dataclass exposes the tunable buffer sizes, and each of the common overflow errors maps directly to the specific parameter that must be increased.
Usage
Use this heuristic when you encounter buffer overflow runtime errors during GPU simulation, particularly:
- `PxgPinnedHostLinearMemoryAllocator: overflowing initial allocation size`
- `Contact buffer overflow detected`
- `Patch buffer overflow detected`
- `Collision stack overflow detected`
These errors are most common when scaling up `num_envs`, using complex meshes with many collision contacts, or running dexterous manipulation tasks.
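Because each message names the buffer that overflowed, the diagnosis step is a simple lookup. The following is a minimal sketch; `OVERFLOW_TO_PARAM` and `param_for_error` are hypothetical helpers, not part of ManiSkill, and the message substrings are taken from the `GPUMemoryConfig` docstrings quoted on this page.

```python
from typing import Optional

# Hypothetical lookup table (not a ManiSkill API):
# PhysX error-message substring -> GPUMemoryConfig field to enlarge.
OVERFLOW_TO_PARAM = {
    "PxgPinnedHostLinearMemoryAllocator": "temp_buffer_capacity",
    "Contact buffer overflow": "max_rigid_contact_count",
    "Patch buffer overflow": "max_rigid_patch_count",
    "Collision stack overflow": "collision_stack_size",
}


def param_for_error(message: str) -> Optional[str]:
    """Return the GPUMemoryConfig field to increase, or None if the message is not a known overflow."""
    for substring, param in OVERFLOW_TO_PARAM.items():
        if substring in message:
            return param
    return None


print(param_for_error("Contact buffer overflow detected"))  # max_rigid_contact_count
```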
The Insight (Rule of Thumb)
- Action: Override `GPUMemoryConfig` parameters in your environment's `sim_config` to increase the overflowing buffer.
- Value: Start by doubling the default value of the specific buffer that overflowed, and double it again if the error persists. Default values are:
  - `temp_buffer_capacity` = 2^24 (16MB)
  - `max_rigid_contact_count` = 2^19 (524,288)
  - `max_rigid_patch_count` = 2^18 (262,144)
  - `collision_stack_size` = 64 * 64 * 1024 (4MB)
  - `heap_capacity` = 2^26 (64MB)
  - `found_lost_pairs_capacity` = 2^25 (33,554,432)
  - `found_lost_aggregate_pairs_capacity` = 2^10 (1,024)
  - `total_aggregate_pairs_capacity` = 2^10 (1,024)
- Trade-off: Larger buffers consume more GPU VRAM. Over-allocating reduces the memory available for environments and observations.
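Since `GPUMemoryConfig` is a plain dataclass, the standard library's `dataclasses.replace` can produce a copy with exactly one buffer doubled while leaving the others at their current values. The sketch below defines a local mirror of the defaults listed above so it runs without ManiSkill installed; `double_field` is a hypothetical helper, and in real code you would import `GPUMemoryConfig` from `mani_skill.utils.structs.types` instead of redefining it.

```python
from dataclasses import asdict, dataclass, replace


# Local mirror of the GPUMemoryConfig defaults so this sketch is self-contained.
# In real code: from mani_skill.utils.structs.types import GPUMemoryConfig
@dataclass
class GPUMemoryConfig:
    temp_buffer_capacity: int = 2**24
    max_rigid_contact_count: int = 2**19
    max_rigid_patch_count: int = 2**18
    heap_capacity: int = 2**26
    found_lost_pairs_capacity: int = 2**25
    found_lost_aggregate_pairs_capacity: int = 2**10
    total_aggregate_pairs_capacity: int = 2**10
    collision_stack_size: int = 64 * 64 * 1024


def double_field(config: GPUMemoryConfig, field_name: str) -> GPUMemoryConfig:
    """Hypothetical helper: return a copy of `config` with one buffer size doubled."""
    return replace(config, **{field_name: getattr(config, field_name) * 2})


bigger = double_field(GPUMemoryConfig(), "max_rigid_contact_count")
print(bigger.max_rigid_contact_count)  # 1048576 (2**20)
print(asdict(bigger)["max_rigid_patch_count"])  # 262144, unchanged
```

Doubling one field at a time keeps the extra VRAM cost limited to the buffer that actually overflowed.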
Reasoning
PhysX GPU simulation pre-allocates fixed-size buffers at scene creation. Unlike CPU simulation, which can resize dynamically, GPU buffers have hard limits that are set before simulation begins, so a larger size only takes effect when the environment is created again. The PhysX error messages state which buffer overflowed, which makes diagnosis straightforward. The defaults in ManiSkill are chosen to work for most tasks at moderate scale (fewer than 256 envs), but tasks with dense contacts (grasping, dexterous manipulation) or a very high `num_envs` need larger allocations.
```python
# Example: override GPU memory for a high-contact task
import gymnasium as gym
import mani_skill.envs  # registers ManiSkill environments such as PickCube-v1
from mani_skill.utils.structs.types import GPUMemoryConfig

gpu_config = GPUMemoryConfig(
    max_rigid_contact_count=2**21,  # 4x default (2**19)
    max_rigid_patch_count=2**20,  # 4x default (2**18)
    collision_stack_size=128 * 64 * 1024,  # 2x default (64 * 64 * 1024)
)
env = gym.make("PickCube-v1", num_envs=1024,
               sim_config=dict(gpu_memory_config=gpu_config.dict()))
```
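Because the buffers are sized once at scene creation, finding a working size means rebuilding the environment after each increase. The following is a minimal sketch of that loop under stated assumptions: `tune_gpu_memory`, `OVERFLOW_TO_PARAM`, and the `smoke_test` callback are hypothetical, not ManiSkill APIs. The caller's `smoke_test` is assumed to build a fresh environment from the given overrides (for example by passing `GPUMemoryConfig(**overrides).dict()` as `gpu_memory_config`, as in the example above), step it long enough to reach peak contact load, and return the captured PhysX error text or `None`. How that text is captured (exception message or log output) is left to the caller.

```python
from typing import Callable, Dict, Optional

# Hypothetical mapping (not a ManiSkill API): error substring -> GPUMemoryConfig field.
OVERFLOW_TO_PARAM = {
    "PxgPinnedHostLinearMemoryAllocator": "temp_buffer_capacity",
    "Contact buffer overflow": "max_rigid_contact_count",
    "Patch buffer overflow": "max_rigid_patch_count",
    "Collision stack overflow": "collision_stack_size",
}

# GPUMemoryConfig defaults for the fields above.
DEFAULTS = {
    "temp_buffer_capacity": 2**24,
    "max_rigid_contact_count": 2**19,
    "max_rigid_patch_count": 2**18,
    "collision_stack_size": 64 * 64 * 1024,
}


def tune_gpu_memory(
    smoke_test: Callable[[Dict[str, int]], Optional[str]],
    max_attempts: int = 6,
) -> Dict[str, int]:
    """Double whichever buffer overflows until `smoke_test` returns None."""
    overrides: Dict[str, int] = {}
    for _ in range(max_attempts):
        error = smoke_test(dict(overrides))  # builds and runs a fresh env
        if error is None:
            return overrides
        param = next((p for s, p in OVERFLOW_TO_PARAM.items() if s in error), None)
        if param is None:
            raise RuntimeError(f"Not a known buffer overflow: {error}")
        # Buffers are fixed at scene creation, so the next attempt rebuilds the env.
        overrides[param] = overrides.get(param, DEFAULTS[param]) * 2
    raise RuntimeError(f"Still overflowing after {max_attempts} attempts: {overrides}")


# Fake smoke test standing in for a real env: it needs 4x the default contact buffer.
def fake_smoke_test(overrides: Dict[str, int]) -> Optional[str]:
    if overrides.get("max_rigid_contact_count", 2**19) < 2**21:
        return "Contact buffer overflow detected"
    return None


print(tune_gpu_memory(fake_smoke_test))  # {'max_rigid_contact_count': 2097152}
```

Capping the number of attempts keeps a runaway scene from doubling a buffer until it exhausts VRAM.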
Code evidence from `mani_skill/utils/structs/types.py:12-31`:
```python
@dataclass
class GPUMemoryConfig:
    temp_buffer_capacity: int = 2**24
    """Increase this if you get 'PxgPinnedHostLinearMemoryAllocator: overflowing...'"""
    max_rigid_contact_count: int = 2**19
    """Increase this if you get 'Contact buffer overflow detected'"""
    max_rigid_patch_count: int = 2**18
    """Increase this if you get 'Patch buffer overflow detected'"""
    heap_capacity: int = 2**26
    found_lost_pairs_capacity: int = 2**25
    found_lost_aggregate_pairs_capacity: int = 2**10
    total_aggregate_pairs_capacity: int = 2**10
    collision_stack_size: int = 64 * 64 * 1024
    """Increase this if you get 'Collision stack overflow detected'"""
```