Implementation:Isaac sim IsaacGymEnvs PBT Process Backend

From Leeroopedia
Knowledge Sources
Domains Multi_GPU_Computing, Process_Management
Last Updated 2026-02-15 11:00 GMT

Overview

PBT_Process_Backend provides the local OS process backend for the PBT experiment launcher, managing multi-GPU process allocation and parallel experiment execution on a single machine.

Description

This module implements local multi-process experiment execution through two main functions. The add_os_parallelism_args() function extends the argument parser with three arguments: --num_gpus (number of local GPUs to utilize), --max_parallel (maximum number of simultaneous experiments), and --experiments_per_gpu (how many experiments can share a single GPU, with -1 meaning no GPU pinning).
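As a rough illustration of the three arguments described above, the following sketch registers them on a standard argparse parser. The function name add_parallelism_args_sketch and the help strings are hypothetical, and the defaults are taken from the I/O Contract on this page; the real add_os_parallelism_args() may differ in detail.

```python
import argparse


def add_parallelism_args_sketch(parser: argparse.ArgumentParser) -> argparse.ArgumentParser:
    """Hypothetical stand-in for add_os_parallelism_args(); not the library's code."""
    parser.add_argument("--num_gpus", type=int, default=1,
                        help="Number of local GPUs to use")
    parser.add_argument("--max_parallel", type=int, default=4,
                        help="Maximum number of simultaneous experiments")
    parser.add_argument("--experiments_per_gpu", type=int, default=-1,
                        help="Experiments sharing one GPU; -1 disables GPU pinning")
    return parser


# Parse a sample command line; unspecified flags fall back to their defaults.
parser = add_parallelism_args_sketch(argparse.ArgumentParser())
args = parser.parse_args(["--num_gpus", "4", "--experiments_per_gpu", "2"])
```

Returning the parser mirrors the documented signature and lets the launcher chain several argument-registration helpers.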

The run() function manages the full lifecycle of parallel experiment execution. It generates all experiment configurations from the RunDescription and launches them as subprocesses, respecting the configured parallelism constraints. When experiments_per_gpu is positive, the function implements GPU-aware scheduling: it maintains a per-GPU process count, finds the least busy GPU for each new experiment, and sets CUDA_VISIBLE_DEVICES accordingly. The Python executable is automatically resolved to the current virtual environment's interpreter.
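The GPU-aware scheduling step can be sketched as follows. This is a minimal illustration, assuming the per-GPU process count is kept in a plain dictionary; the helper bodies and the build_env name are assumptions for this example, not the module's actual code.

```python
import os


def find_least_busy_gpu(processes_per_gpu):
    """Return the GPU id with the fewest tracked processes (ties go to the lowest id)."""
    return min(processes_per_gpu, key=lambda gpu_id: (processes_per_gpu[gpu_id], gpu_id))


def build_env(gpu_id, extra_env=None):
    """Copy the current environment and pin the child process to one GPU.

    build_env is a hypothetical helper; extra_env plays the role of per-experiment
    environment variables.
    """
    env = os.environ.copy()
    env["CUDA_VISIBLE_DEVICES"] = str(gpu_id)
    if extra_env:
        env.update(extra_env)
    return env


# Simulate placing six experiments on four GPUs.
processes_per_gpu = {gpu_id: 0 for gpu_id in range(4)}
assigned = []
for _ in range(6):
    gpu_id = find_least_busy_gpu(processes_per_gpu)
    processes_per_gpu[gpu_id] += 1
    assigned.append(gpu_id)

child_env = build_env(assigned[-1], extra_env={"MY_FLAG": "1"})
```

Because each child sees only one device through CUDA_VISIBLE_DEVICES, the training code inside the experiment can keep addressing its GPU as device 0.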

The main loop continuously polls running processes, launches new ones when capacity is available, and tracks failures. It enforces both the global max_parallel limit and the per-GPU experiment limit through the helper functions can_squeeze_another_process() and find_least_busy_gpu(). Completed processes are removed from tracking, and their return codes are checked for failures. Every 3 seconds, a periodic log message reports any failed processes. Custom environment variables can be set per experiment through the Experiment.env_vars attribute.
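The launch-and-poll pattern described above can be sketched with the standard subprocess module. This is a simplified illustration under stated assumptions: run_commands is a hypothetical name, only the global max_parallel limit is enforced, and the per-GPU limit and periodic failure logging are omitted.

```python
import subprocess
import sys
import time


def run_commands(commands, max_parallel, poll_interval=0.05):
    """Run commands with at most max_parallel alive at once.

    Returns a list of (pid, return_code) tuples for processes that failed.
    """
    pending = list(commands)
    running = []
    failed = []
    while pending or running:
        # Launch new processes while there is spare capacity.
        while pending and len(running) < max_parallel:
            running.append(subprocess.Popen(pending.pop(0)))
        # Poll every tracked process; drop the finished ones and record failures.
        still_running = []
        for proc in running:
            return_code = proc.poll()
            if return_code is None:
                still_running.append(proc)
            elif return_code != 0:
                failed.append((proc.pid, return_code))
        running = still_running
        time.sleep(poll_interval)
    return failed


# Three short-lived child interpreters; the second exits with code 3.
commands = [
    [sys.executable, "-c", "import sys; sys.exit(0)"],
    [sys.executable, "-c", "import sys; sys.exit(3)"],
    [sys.executable, "-c", "import sys; sys.exit(0)"],
]
failures = run_commands(commands, max_parallel=2)
```

Polling with Popen.poll() instead of blocking on wait() keeps the loop free to start the next experiment as soon as any slot opens up.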

Usage

Use this backend when running PBT experiments locally on a multi-GPU workstation. It is the default backend for the PBT launcher. Configure --num_gpus to match your hardware, --max_parallel to control resource usage, and --experiments_per_gpu to enable GPU sharing when experiments are small enough.
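When choosing these values, it can help to estimate how many experiments will actually run at once. The sketch below assumes the two limits combine as a simple minimum whenever GPU pinning is enabled; effective_parallelism is a hypothetical helper for back-of-the-envelope planning, not part of the launcher.

```python
def effective_parallelism(num_gpus, max_parallel, experiments_per_gpu):
    """Estimate the number of concurrently running experiments (assumed model)."""
    if experiments_per_gpu > 0:
        # With pinning, total GPU slots cap concurrency alongside max_parallel.
        return min(max_parallel, num_gpus * experiments_per_gpu)
    # Without pinning (-1), only the global limit applies.
    return max_parallel


full_machine = effective_parallelism(num_gpus=4, max_parallel=8, experiments_per_gpu=2)
gpu_bound = effective_parallelism(num_gpus=2, max_parallel=8, experiments_per_gpu=2)
unpinned = effective_parallelism(num_gpus=1, max_parallel=4, experiments_per_gpu=-1)
```

Under this assumption, setting --max_parallel above num_gpus times experiments_per_gpu would bring no additional concurrency.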

Code Reference

Source Location

Signature

def add_os_parallelism_args(parser: argparse.ArgumentParser) -> argparse.ArgumentParser:
def ensure_dir_exists(path) -> str:
def run(run_description, args):

Import

from isaacgymenvs.pbt.launcher.run_processes import run, add_os_parallelism_args

I/O Contract

Inputs

Name | Type | Required | Description
run_description | RunDescription | Yes | The experiment run description containing all experiments and parameter combinations
args | argparse.Namespace | Yes | Parsed arguments including num_gpus, max_parallel, experiments_per_gpu, train_dir, and pause_between
--num_gpus | int | No | Number of local GPUs to use (default: 1)
--max_parallel | int | No | Maximum number of simultaneous experiment processes (default: 4)
--experiments_per_gpu | int | No | Number of experiments per GPU; -1 disables GPU pinning (default: -1)

Outputs

Name | Type | Description
return code | int | 0 on successful completion of all experiments
stdout | str | Process launch and completion logs, including any failure warnings with PIDs and return codes

Usage Examples

# Run PBT experiments on 4 GPUs with up to 8 parallel processes:
# python -m isaacgymenvs.pbt.launcher.run \
#     --run isaacgymenvs.pbt.experiments.my_experiment \
#     --backend processes \
#     --num_gpus 4 \
#     --max_parallel 8 \
#     --experiments_per_gpu 2 \
#     --pause_between 5

# Run without GPU pinning (all experiments see all GPUs):
# python -m isaacgymenvs.pbt.launcher.run \
#     --run isaacgymenvs.pbt.experiments.my_experiment \
#     --backend processes \
#     --num_gpus 1 \
#     --max_parallel 4

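# Programmatic usage:
from isaacgymenvs.pbt.launcher.run_processes import run, add_os_parallelism_args

# run_description (a RunDescription) and args (the parsed launcher arguments,
# an argparse.Namespace) must already be defined by the caller.
# The run function manages the process lifecycle and returns 0 on success.
exit_code = run(run_description, args)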
