Principle:Haosulab ManiSkill Parallel Trajectory Generation

Field	Value
Principle Name	Parallel Trajectory Generation
Domain	Motion_Planning
Overview	Parallelizing trajectory generation across CPU processes
Date	2026-02-15
Repository	Haosulab/ManiSkill

Overview

The Parallel Trajectory Generation principle describes how ManiSkill accelerates the production of demonstration trajectories by distributing the workload across multiple CPU processes. Motion planning solvers are computationally expensive, and generating hundreds or thousands of trajectories sequentially can be prohibitively slow. Because the processes run independently of one another, total wall-clock time falls roughly in inverse proportion to the number of processes.

Description

The parallelization strategy follows a simple data-parallel design:

Work partitioning: The total number of requested trajectories (num_traj) is evenly divided among num_procs processes. Each process is assigned a unique process ID and a non-overlapping seed range to ensure reproducibility and diversity.
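The work-partitioning step can be sketched in Python as follows. This is a minimal illustration, not the ManiSkill source. The function name partition_work, the base_seed parameter, and the choice to give any remainder to the first processes are assumptions. The sketch also enforces the num_traj >= num_procs constraint listed under Usage.

```python
from typing import List, Tuple


def partition_work(num_traj: int, num_procs: int, base_seed: int = 0) -> List[Tuple[int, int, int]]:
    """Hypothetical helper: split num_traj trajectories across num_procs workers.

    Returns one (proc_id, start_seed, count) tuple per worker. The seed ranges
    [start_seed, start_seed + count) are contiguous and never overlap.
    """
    if num_procs < 1:
        raise ValueError("num_procs must be at least 1")
    if num_traj < num_procs:
        raise ValueError("num_traj must be greater than or equal to num_procs")
    base, extra = divmod(num_traj, num_procs)
    plan = []
    next_seed = base_seed
    for proc_id in range(num_procs):
        # Assumption: the first `extra` workers each take one extra trajectory.
        count = base + (1 if proc_id < extra else 0)
        plan.append((proc_id, next_seed, count))
        next_seed += count
    return plan


for proc_id, start_seed, count in partition_work(100, 8):
    print(proc_id, start_seed, count)
```

Under this scheme, for num_traj=100 and num_procs=8 the first four processes receive 13 trajectories and the remaining four receive 12. Together they cover seeds 0 to 99 exactly once.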

Independent execution: Each process runs its own instance of the environment (using the CPU simulation backend), its own RecordEpisode wrapper, and its own motion planning solver. There is no inter-process communication during generation.

Per-process output files: Each process writes to a separate HDF5 file, named with a process-ID suffix (e.g., demo.0.h5, demo.1.h5). This avoids file locking conflicts and allows each process to operate completely independently.
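A hypothetical helper that derives per-process file names from the requested output path, following the demo.0.h5 / demo.1.h5 pattern. The name per_process_path is illustrative, and the actual script's naming code may differ.

```python
import os


def per_process_path(output_path: str, proc_id: int) -> str:
    """Hypothetical helper: insert the process ID before the extension (demo.h5 -> demo.<id>.h5)."""
    root, ext = os.path.splitext(output_path)
    return f"{root}.{proc_id}{ext}"


# One distinct file per worker, so no two processes ever write to the same file.
paths = [per_process_path("demos/demo.h5", i) for i in range(3)]
print(paths)
```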

Post-generation merge: After all processes complete, the per-process HDF5 files are merged into a single consolidated file using the trajectory merging utility (see Principle:Haosulab_ManiSkill_Trajectory_Merging). The temporary per-process files are then deleted.
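The core of the merge step is concatenation with renumbering. The sketch below models each per-process file as an in-memory dict keyed traj_0, traj_1, and so on. It assumes, without confirmation here, that every process numbers its trajectories from 0 independently, so copying keys verbatim would cause collisions. merge_episode_tables is a hypothetical name. It is not the ManiSkill merging utility, which works on the HDF5 files themselves.

```python
from typing import Any, Dict, List


def merge_episode_tables(per_proc: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Concatenate per-process trajectory tables into one, renumbering keys.

    Inputs are merged in process order, so the output order is deterministic.
    """
    merged: Dict[str, Any] = {}
    for table in per_proc:
        # Sort by numeric suffix so that traj_10 comes after traj_9, not after traj_1.
        for key in sorted(table, key=lambda k: int(k.split("_")[1])):
            merged[f"traj_{len(merged)}"] = table[key]
    return merged


per_proc = [
    {"traj_0": "proc0-ep0", "traj_1": "proc0-ep1"},
    {"traj_0": "proc1-ep0"},
]
print(merge_episode_tables(per_proc))
```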

Process spawning: Python's multiprocessing module with the spawn start method is used (rather than fork) to avoid issues with CUDA context inheritance and shared file descriptors.
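The spawn-based process pool can be sketched with the standard multiprocessing API. generate_chunk is a hypothetical stand-in that only returns the seeds it would solve. A real worker would construct its own CPU-backend environment, RecordEpisode wrapper, and planner inside the child process. Calling mp.get_context("spawn") selects the start method for this pool only, without changing the global default.

```python
import multiprocessing as mp
from typing import List, Tuple


def generate_chunk(proc_id: int, start_seed: int, count: int) -> Tuple[int, List[int]]:
    """Hypothetical worker: a real one would build its env, wrapper and planner here."""
    seeds = list(range(start_seed, start_seed + count))
    return proc_id, seeds


def run_generation(plan: List[Tuple[int, int, int]]) -> List[Tuple[int, List[int]]]:
    """Run one worker per (proc_id, start_seed, count) tuple in plan."""
    if len(plan) == 1:
        # A single process needs no pool at all.
        return [generate_chunk(*plan[0])]
    # "spawn" starts fresh interpreters instead of forking the parent, so children
    # do not inherit the parent's CUDA context or open file descriptors.
    ctx = mp.get_context("spawn")
    with ctx.Pool(processes=len(plan)) as pool:
        # starmap blocks until every worker finishes and preserves plan order.
        return pool.starmap(generate_chunk, plan)
```

Because spawn re-imports the main module in every child, call run_generation from under an `if __name__ == "__main__":` guard. Returning results in plan order also keeps the later merge deterministic.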

Usage

Parallel generation is enabled by passing a --num-procs value greater than 1:

python -m mani_skill.examples.motionplanning.panda.run \
    -e PickCube-v1 -n 100 --num-procs 8 --record-dir demos

This spawns 8 processes, each generating approximately 12-13 trajectories (100 / 8), and merges the results into a single output file.

Constraints:

num_traj must be greater than or equal to num_procs.
Currently works only with the CPU simulation backend (-b cpu, or -b auto when no GPU is available).

Theoretical Basis

Embarrassingly parallel workloads: Trajectory generation is an embarrassingly parallel problem because each trajectory is independent; it depends only on its initial seed and the solver logic, not on the results of other trajectories. This makes it an ideal candidate for data parallelism.
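The independence property can be checked directly. If each trajectory depends only on its seed, then splitting the seeds into chunks and processing the chunks in any order reproduces the sequential result. plan_trajectory is a hypothetical stand-in for resetting the environment with a seed and running the solver.

```python
import random
from typing import Dict, List


def plan_trajectory(seed: int) -> List[float]:
    """Hypothetical stand-in for 'reset env with this seed, run the solver'.

    The result depends only on the seed, never on other trajectories.
    """
    rng = random.Random(seed)
    return [rng.uniform(-1.0, 1.0) for _ in range(3)]


def generate(seeds: List[int]) -> Dict[int, List[float]]:
    """Generate trajectories one after another, keyed by seed."""
    return {seed: plan_trajectory(seed) for seed in seeds}


def chunked_generate(chunks: List[List[int]]) -> Dict[int, List[float]]:
    """Generate each chunk independently, as separate workers would, then combine."""
    combined: Dict[int, List[float]] = {}
    for chunk in chunks:
        combined.update(generate(chunk))
    return combined


seeds = list(range(10))
# Chunks processed out of order still reproduce the sequential result.
assert chunked_generate([seeds[7:], seeds[:4], seeds[4:7]]) == generate(seeds)
```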

Process-level isolation: Using separate processes (rather than threads) avoids Python's Global Interpreter Lock (GIL) and provides full memory isolation, which is important because the SAPIEN physics simulator and mplib planner maintain internal state that is not thread-safe.

Seed management: By assigning each process a non-overlapping seed range, the system guarantees that no seed is reused across processes, so no two episodes start from the same seeded environment configuration. This ensures dataset diversity.

Amdahl's Law: The speedup is limited by the serial portions of the pipeline (process startup, merge step), but because trajectory generation dominates the runtime, near-linear speedup is achievable for moderate numbers of processes.
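Amdahl's Law gives the ideal speedup S(N) = 1 / ((1 - p) + p / N), where p is the fraction of runtime that parallelizes and N is the number of processes. The value p = 0.98 below is an illustrative assumption, not a measurement of ManiSkill.

```python
def amdahl_speedup(parallel_fraction: float, num_procs: int) -> float:
    """Amdahl's law: S(N) = 1 / ((1 - p) + p / N)."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be in [0, 1]")
    if num_procs < 1:
        raise ValueError("num_procs must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / num_procs)


# Assumed parallel fraction of 0.98 (illustrative only).
for n in (1, 2, 4, 8, 16, 64):
    print(n, round(amdahl_speedup(0.98, n), 2))
```

With p = 0.98, eight processes give about 7.0x, and no number of processes can exceed 1 / (1 - p) = 50x.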
