Heuristic: Haosulab ManiSkill Rendering Memory Optimization
| Knowledge Sources | |
|---|---|
| Domains | Optimization, GPU_Simulation |
| Last Updated | 2026-02-15 08:00 GMT |
Overview
Memory optimization strategies for visual observation rendering, including shader selection, resolution sizing, and buffer management to prevent GPU OOM errors.
Description
Visual observations (RGB, depth, RGBD, pointcloud) consume significant GPU memory in ManiSkill. A single 128x128x3 RGB buffer for 100,000 replay transitions uses approximately 4.7GB, and 32 parallel environments with rendering use approximately 2.2GB. Understanding shader tiers, resolution trade-offs, and recording optimization (gzip vs. PNG) is essential for managing GPU memory during training with visual inputs.
Usage
Use this heuristic when training with visual observations (obs_mode is `rgb`, `rgbd`, `depth`, `sensor_data`, or `pointcloud`) and encountering CUDA OOM errors, or when optimizing GPU memory usage for large-scale visual RL training.
The Insight (Rule of Thumb)
- Shader selection for speed/memory:
- `"minimal"` = fastest rendering, lowest GPU memory (state-based RL)
- `"default"` = medium quality, good balance for visual RL
- `"rt-fast"` = reduced-quality ray tracing; photorealistic output at a lower cost than `"rt"`, but still far more expensive than the rasterized shaders
- `"rt"` = full ray-tracing, highest quality but slowest
- Action: Use `shader_dir="minimal"` for state-based RL, `"default"` for visual RL.
- Resolution sizing:
- Action: Start with 64x64 or 128x128 for training; use higher resolutions only for evaluation.
- Value: Memory scales linearly with pixel count. 128x128 = 4x memory of 64x64.
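The linear scaling is easy to sanity-check with arithmetic. The helper below is illustrative (not a ManiSkill function); it reproduces the quadrupling from 64x64 to 128x128 and lands near the ~4.7 GB replay-buffer figure quoted in `sac_rgbd.py`.

```python
def buffer_bytes(h: int, w: int, c: int, n: int, itemsize: int = 1) -> int:
    """Raw bytes needed to store n frames of shape (h, w, c) with the given dtype size.

    itemsize=1 corresponds to uint8 RGB data.
    """
    return h * w * c * n * itemsize

# 128x128x3 uint8 frames, 100,000 replay transitions:
gib = buffer_bytes(128, 128, 3, 100_000) / 1024**3  # ~4.6 GiB, close to the ~4.7GB estimate

# Memory scales with pixel count: 128x128 costs exactly 4x a 64x64 buffer.
ratio = buffer_bytes(128, 128, 3, 100_000) / buffer_bytes(64, 64, 3, 100_000)  # 4.0
```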
- Trajectory recording format:
- Action: Use gzip compression for image sequences instead of individual PNGs.
- Value: gzip compresses a sequence of similar frames more efficiently than storing each frame as a separate PNG.
- Trade-off: Slightly slower decompression but much smaller file sizes.
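A self-contained sketch of the gzip approach using only the standard library. The length-prefixed framing is an illustrative choice, not ManiSkill's on-disk format; it just shows one way to pack a frame sequence into a single gzip stream.

```python
import gzip
import io
import struct

def compress_frames(frames: list) -> bytes:
    """Pack a sequence of raw frame byte strings into one gzip blob.

    Each frame is written with a 4-byte little-endian length prefix so the
    stream can be split back into frames on decompression.
    """
    buf = io.BytesIO()
    with gzip.GzipFile(fileobj=buf, mode="wb") as gz:
        for frame in frames:
            gz.write(struct.pack("<I", len(frame)))
            gz.write(frame)
    return buf.getvalue()

def decompress_frames(blob: bytes) -> list:
    """Invert compress_frames, recovering the original frame list."""
    frames = []
    with gzip.GzipFile(fileobj=io.BytesIO(blob), mode="rb") as gz:
        while True:
            header = gz.read(4)
            if not header:
                break
            (n,) = struct.unpack("<I", header)
            frames.append(gz.read(n))
    return frames
```

Because consecutive rendered frames are highly similar, the shared gzip dictionary typically beats per-frame PNG compression for storage, at the cost of sequential (non-random-access) decompression.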
- Depth image format:
- Action: Use default uint16 depth format (not float32) for 2x memory savings.
- Value: ManiSkill defaults to uint16 for numpy depth images.
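A hedged sketch of the uint16 saving: converting float32 depth (meters) to uint16 (millimeters) halves the per-pixel footprint. The `depth_to_uint16_mm` helper is illustrative, not a ManiSkill function; millimeter quantization is a common convention for uint16 depth.

```python
import numpy as np

def depth_to_uint16_mm(depth_m: np.ndarray) -> np.ndarray:
    """Quantize float32 depth in meters to uint16 millimeters.

    uint16 (2 bytes/px) vs. float32 (4 bytes/px) gives a 2x memory saving,
    with a range of 0-65.535 m at 1 mm resolution.
    """
    mm = np.clip(np.round(depth_m * 1000.0), 0, np.iinfo(np.uint16).max)
    return mm.astype(np.uint16)

depth = np.array([[0.5, 1.234]], dtype=np.float32)
packed = depth_to_uint16_mm(depth)  # dtype uint16, values in millimeters
```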
- Garbage collection:
- Action: Call `gc.collect()` after major operations (reconfiguration, environment reset) to release GPU memory.
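A minimal sketch of the garbage-collection pattern. The wrapper function is illustrative; the one source-backed element is the `gc.collect()` call itself, which ManiSkill's `sapien_env.py` uses after reconfiguration so that freed Python objects release the GPU buffers they hold.

```python
import gc

def release_after_heavy_op(cleanup=None) -> int:
    """Drop references created by a heavy operation, then force a GC pass.

    `cleanup` is a hypothetical callback that clears caches or deletes large
    objects; gc.collect() then reclaims unreachable objects (and, for objects
    that own GPU memory, their device-side allocations).
    Returns the number of unreachable objects collected.
    """
    if cleanup is not None:
        cleanup()
    return gc.collect()
```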
Reasoning
Memory estimates from `examples/baselines/sac/sac_rgbd.py:203-204`:
# 128x128x3 RGB data with replay buffer size 100,000 = ~4.7GB GPU memory
# 32 parallel envs with rendering = ~2.2GB GPU memory
Recording optimization from `mani_skill/utils/wrappers/record.py:586`:
# NOTE(jigu): It is more efficient to use gzip than png for a sequence of images.
Depth format optimization from `mani_skill/utils/wrappers/record.py:595`:
# NOTE (stao): By default now cameras in ManiSkill return depth values of type uint16
# for numpy
GPU memory release from `mani_skill/envs/sapien_env.py:1243`:
gc.collect() # force gc to collect which releases most GPU memory
Drawing task rendering warning from `mani_skill/envs/tasks/drawing/draw.py:38`:
# NOTE that on GPU simulation it is not recommended to have a very high value for this
# as it can slow down rendering
GPU synchronization requirement from `mani_skill/envs/sapien_env.py:623-624`:
if self.backend.render_device.is_cuda():
    torch.cuda.synchronize()
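The same guard can be written as a portable sketch. This helper is an assumption of mine, not ManiSkill code: it checks torch availability at call time so it degrades to a no-op on CPU-only machines, while preserving the rule that the CUDA device must be synchronized before rendered buffers are read on the host.

```python
def maybe_cuda_synchronize() -> bool:
    """Synchronize the CUDA device before reading freshly rendered buffers.

    Returns True only if a synchronize actually ran; returns False when torch
    is missing or no CUDA device is available.
    """
    try:
        import torch
    except ImportError:
        return False
    if torch.cuda.is_available():
        torch.cuda.synchronize()
        return True
    return False
```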