Principle:Haosulab ManiSkill Parallel Trajectory Generation
| Field | Value |
|---|---|
| Principle Name | Parallel Trajectory Generation |
| Domain | Motion_Planning |
| Overview | Parallelizing trajectory generation across CPU processes |
| Date | 2026-02-15 |
| Repository | Haosulab/ManiSkill |
Overview
The Parallel Trajectory Generation principle describes how ManiSkill accelerates the production of demonstration trajectories by distributing the workload across multiple CPU processes. Motion planning solvers are computationally expensive, and generating hundreds or thousands of trajectories sequentially can be prohibitively slow. When generation is spread across independent processes, total wall-clock time falls roughly in inverse proportion to the number of processes.
Description
The parallelization strategy follows a simple data-parallel design:
- Work partitioning: The total number of requested trajectories (num_traj) is evenly divided among num_procs processes. Each process is assigned a unique process ID and a non-overlapping seed range to ensure reproducibility and diversity.
- Independent execution: Each process runs its own instance of the environment (using the CPU simulation backend), its own RecordEpisode wrapper, and its own motion planning solver. There is no inter-process communication during generation.
- Per-process output files: Each process writes to a separate HDF5 file, named with a process-ID suffix (e.g., demo.0.h5, demo.1.h5). This avoids file locking conflicts and allows each process to operate completely independently.
- Post-generation merge: After all processes complete, the per-process HDF5 files are merged into a single consolidated file using the trajectory merging utility (see Principle:Haosulab_ManiSkill_Trajectory_Merging). The temporary per-process files are then deleted.
- Process spawning: Python's multiprocessing module with the spawn start method is used (rather than fork) to avoid issues with CUDA context inheritance and shared file descriptors.
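The partitioning and spawning steps above can be sketched as follows. This is an illustrative outline, not ManiSkill's actual code: `build_jobs`, `generate_shard`, and `run_parallel` are hypothetical names, the worker only reports which file and seeds it would handle instead of running an environment and solver, and spreading the remainder over the first few processes is an assumption about how an uneven split might be handled.

```python
import multiprocessing as mp


def build_jobs(num_traj, num_procs, record_dir):
    """Split num_traj into one job per process with contiguous, disjoint seeds."""
    base, extra = divmod(num_traj, num_procs)
    jobs = []
    next_seed = 0
    for proc_id in range(num_procs):
        # Assumption: the first `extra` processes take one additional trajectory.
        count = base + (1 if proc_id < extra else 0)
        jobs.append((proc_id, next_seed, count, record_dir))
        next_seed += count
    return jobs


def generate_shard(proc_id, start_seed, count, record_dir):
    """Hypothetical worker. A real worker would create its own environment,
    recording wrapper, and solver, then record one episode per seed."""
    seeds = list(range(start_seed, start_seed + count))
    output_path = f"{record_dir}/demo.{proc_id}.h5"  # per-process file name
    return output_path, seeds


def run_parallel(num_traj, num_procs, record_dir):
    """Run all jobs in spawned processes and return their (path, seeds) results."""
    ctx = mp.get_context("spawn")  # fresh interpreters rather than fork
    with ctx.Pool(processes=num_procs) as pool:
        # starmap blocks until every worker has finished.
        return pool.starmap(generate_shard, build_jobs(num_traj, num_procs, record_dir))
```

Because the spawn start method re-imports the main module in each child, a call such as `run_parallel(100, 8, "demos")` should sit under an `if __name__ == "__main__":` guard. The returned paths are what a merge step would then consume and delete.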
Usage
Parallel generation is enabled by passing --num-procs greater than 1:
python -m mani_skill.examples.motionplanning.panda.run \
-e PickCube-v1 -n 100 --num-procs 8 --record-dir demos
This spawns 8 processes, each generating approximately 12-13 trajectories (100 / 8), and merges the results into a single output file.
Constraints:
- num_traj must be greater than or equal to num_procs.
- Currently only works with the CPU simulation backend (-b cpu, or -b auto when no GPU is available).
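As a rough illustration of these two constraints, a pre-flight check might look like the sketch below. The function name `check_parallel_args` and the `gpu_available` flag are hypothetical, introduced only for this example; how the real script detects a GPU or reports errors is not specified here.

```python
def check_parallel_args(num_traj, num_procs, backend, gpu_available):
    """Raise ValueError if the arguments violate the parallel-generation constraints."""
    if num_procs < 1:
        raise ValueError("num_procs must be at least 1")
    if num_procs == 1:
        return  # sequential generation: the constraints below do not apply
    if num_traj < num_procs:
        raise ValueError("num_traj must be greater than or equal to num_procs")
    # "auto" resolves to the CPU backend only when no GPU is available.
    uses_cpu = backend == "cpu" or (backend == "auto" and not gpu_available)
    if not uses_cpu:
        raise ValueError("parallel generation requires the CPU simulation backend")
```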
Theoretical Basis
- Embarrassingly parallel workloads: Trajectory generation is an embarrassingly parallel problem because each trajectory is independent -- it depends only on its initial seed and the solver logic, not on the results of other trajectories. This makes it an ideal candidate for data parallelism.
- Process-level isolation: Using separate processes (rather than threads) avoids Python's Global Interpreter Lock (GIL) and provides full memory isolation, which is important because the SAPIEN physics simulator and mplib planner maintain internal state that is not thread-safe.
- Seed management: By assigning non-overlapping seed ranges to each process, the system guarantees that no two processes generate the same environment configuration, ensuring dataset diversity.
- Amdahl's Law: The speedup is limited by the serial portions of the pipeline (process startup, merge step), but because trajectory generation dominates the runtime, near-linear speedup is achievable for moderate numbers of processes.
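To make the Amdahl's Law point concrete, the bound can be computed directly: if a fraction p of the runtime parallelizes perfectly across n processes, the speedup is 1 / ((1 - p) + p / n). The sketch below evaluates that formula; the helper name `amdahl_speedup` and the fractions used in the usage note are illustrative assumptions, not measurements from ManiSkill.

```python
def amdahl_speedup(parallel_fraction, num_procs):
    """Upper bound on speedup when parallel_fraction of the work scales with num_procs."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if num_procs < 1:
        raise ValueError("num_procs must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / num_procs)
```

For example, with a hypothetical 95% parallel fraction, 8 processes give a bound of about 5.9x rather than 8x, and raising the fraction to 99% lifts it to about 7.5x. The smaller the startup and merge overhead relative to generation, the closer the result is to linear.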