Implementation:Isaac sim IsaacGymEnvs PBT Slurm Backend
| Knowledge Sources | |
|---|---|
| Domains | Cluster_Computing, Job_Scheduling |
| Last Updated | 2026-02-15 11:00 GMT |
Overview
PBT_Slurm_Backend is the SLURM cluster backend for the Population-Based Training (PBT) experiment launcher. It generates sbatch scripts and submits them to the SLURM job scheduler.
Description
This module implements SLURM-based distributed experiment execution through two main functions and a helper. The add_slurm_args() function extends the argument parser with SLURM-specific options: --slurm_gpus_per_job (GPUs per job), --slurm_cpus_per_gpu (CPU cores per GPU), --slurm_print_only (dry-run mode), --slurm_workdir (working directory for logs and scripts), --slurm_partition (SLURM partition name), --slurm_sbatch_template (path to a custom sbatch template), and --slurm_timeout (job time limit).
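The option registration described above could look roughly like the following. This is a minimal sketch reconstructed from the I/O Contract table on this page, not a copy of the module: the names `add_slurm_args_sketch` and `str2bool_sketch`, the accepted boolean spellings, and marking `--slurm_workdir` as required are assumptions.

```python
import argparse


def str2bool_sketch(v):
    """Parse a command-line boolean; the accepted spellings are an assumption."""
    if isinstance(v, bool):
        return v
    if v.lower() in ("true", "yes", "1"):
        return True
    if v.lower() in ("false", "no", "0"):
        return False
    raise argparse.ArgumentTypeError(f"Boolean value expected, got {v!r}")


def add_slurm_args_sketch(parser):
    """Register the SLURM options with the defaults listed in the I/O Contract table."""
    parser.add_argument("--slurm_gpus_per_job", default=1, type=int)
    parser.add_argument("--slurm_cpus_per_gpu", default=16, type=int)
    parser.add_argument("--slurm_print_only", default=False, type=str2bool_sketch)
    # Assumption: enforced here because the table marks the option as required.
    parser.add_argument("--slurm_workdir", required=True, type=str)
    parser.add_argument("--slurm_partition", default=None, type=str)
    parser.add_argument("--slurm_sbatch_template", default=None, type=str)
    parser.add_argument("--slurm_timeout", default="0", type=str)
    return parser


parser = add_slurm_args_sketch(argparse.ArgumentParser())
args = parser.parse_args(["--slurm_workdir", "./slurm_logs", "--slurm_print_only", "true"])
```

A dedicated boolean parser is useful here because argparse's `type=bool` treats any non-empty string, including "false", as True.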
The run_slurm() function handles the full SLURM submission workflow. It first creates the working directory, then loads the sbatch template: the file given by --slurm_sbatch_template if one is provided, otherwise the built-in SBATCH_TEMPLATE_DEFAULT. The default template activates a conda environment and changes to the project directory; its conda_env_name and ~/project values are placeholders. For each experiment generated by the RunDescription, the function writes an sbatch script file by substituting the template variables ($CMD, $FILENAME, $PARTITION, $GPU, $CPU, $TIMEOUT) using Python's string.Template.
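The substitution step can be illustrated with a small, self-contained sketch. The placeholder names are the ones listed above; the template text, the helper `render_sbatch_script`, and the sample values are hypothetical, and whether the module calls `substitute` or `safe_substitute` is not stated on this page.

```python
from string import Template

# Hypothetical template text using the placeholder names listed above.
template_text = (
    "#!/bin/bash\n"
    "#SBATCH --job-name=$FILENAME\n"
    "#SBATCH --gres=gpu:$GPU\n"
    "#SBATCH --cpus-per-task=$CPU\n"
    "#SBATCH --time=$TIMEOUT\n"
    "$CMD\n"
)


def render_sbatch_script(template_text, cmd, filename, partition, gpu, cpu, timeout):
    """Fill the placeholders; keys the template does not use are simply ignored."""
    return Template(template_text).substitute(
        CMD=cmd,
        FILENAME=filename,
        PARTITION=partition,
        GPU=gpu,
        CPU=cpu,
        TIMEOUT=timeout,
    )


script = render_sbatch_script(
    template_text,
    cmd="python train.py seed=0",  # illustrative command, not from the source
    filename="exp_00.sh",
    partition="gpu",
    gpu=1,
    cpu=16,
    timeout="24:00:00",
)
```

With `substitute`, any other `$name` in the template (for example a shell variable such as `$HOME`) raises a KeyError unless it is escaped as `$$HOME`; `safe_substitute` would leave it untouched instead.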
After writing all sbatch scripts, the function submits them sequentially using sbatch with the configured GPU count, CPU count, partition, and output file path. Job IDs are collected from sbatch's parsable output. Upon completion, the function prints monitoring instructions (tail -f command for log files) and writes a scancel.sh script to the working directory for convenient job cancellation. The str2bool() helper function handles boolean argument parsing for the print-only flag.
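The submission and cancellation steps can be sketched as below. This is a hedged illustration, not the module's code: the helper names, the exact sbatch flags, and the log file naming are assumptions. The example feeds canned sbatch output to the parser, so nothing is submitted.

```python
import os
import tempfile


def build_sbatch_command(script_path, workdir, gpus, cpus, partition=None):
    """Assemble an sbatch argument list (the flag choice is an assumption)."""
    cmd = ["sbatch", "--parsable", f"--gres=gpu:{gpus}", "-c", str(cpus)]
    if partition is not None:
        cmd += ["-p", partition]
    # SLURM expands %j in the output file name to the job ID.
    log_name = os.path.basename(script_path) + "-slurm-%j.out"  # hypothetical naming
    cmd += ["--output", os.path.join(workdir, log_name), script_path]
    return cmd


def parse_job_id(parsable_output):
    """With --parsable, sbatch prints 'jobid' or 'jobid;cluster'."""
    return parsable_output.strip().split(";")[0]


def write_scancel_script(workdir, job_ids):
    """Write a script that cancels every collected job with one scancel call."""
    path = os.path.join(workdir, "scancel.sh")
    with open(path, "w") as f:
        f.write("#!/bin/bash\n")
        f.write("scancel " + " ".join(job_ids) + "\n")
    return path


with tempfile.TemporaryDirectory() as workdir:
    command = build_sbatch_command(
        os.path.join(workdir, "exp_00.sh"), workdir, gpus=1, cpus=16, partition="gpu"
    )
    # Canned outputs standing in for two real sbatch invocations.
    job_ids = [parse_job_id(out) for out in ("1234\n", "1235;cluster_a\n")]
    with open(write_scancel_script(workdir, job_ids)) as f:
        scancel_text = f.read()
```

In a real run, each command list would be passed to something like `subprocess.run(command, capture_output=True, text=True)` and the captured stdout handed to `parse_job_id`.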
Usage
Use this backend when deploying PBT experiments to a SLURM-managed computing cluster. Prepare an sbatch template that uses the $CMD, $GPU, $CPU, and $TIMEOUT variables and pass it with --slurm_sbatch_template (or rely on the built-in default template), then invoke the launcher with --backend slurm --slurm_workdir ./slurm_logs. Use --slurm_print_only true to preview the generated submission commands without executing them.
Code Reference
Source Location
- Repository: IsaacGymEnvs
- File: isaacgymenvs/pbt/launcher/run_slurm.py
- Lines: 1-151
Signature
SBATCH_TEMPLATE_DEFAULT = (
"#!/bin/bash\n"
"conda activate conda_env_name\n"
"cd ~/project\n"
)
def str2bool(v):
def add_slurm_args(parser):
def run_slurm(run_description, args):
Import
from isaacgymenvs.pbt.launcher.run_slurm import run_slurm, add_slurm_args
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| run_description | RunDescription | Yes | The experiment run description containing all experiments and parameter combinations |
| args | argparse.Namespace | Yes | Parsed arguments including all SLURM-specific options |
| --slurm_gpus_per_job | int | No | Number of GPUs per SLURM job (default: 1) |
| --slurm_cpus_per_gpu | int | No | CPU cores allocated per GPU (default: 16) |
| --slurm_workdir | str | Yes | Working directory for sbatch scripts and log files |
| --slurm_partition | str | No | SLURM partition to submit to (default: None) |
| --slurm_sbatch_template | str | No | Path to custom sbatch template file (default: uses built-in template) |
| --slurm_timeout | str | No | Job timeout value; 0 means no timeout (default: "0") |
| --slurm_print_only | bool | No | If True, only print sbatch commands without executing (default: False) |
Outputs
| Name | Type | Description |
|---|---|---|
| return code | int | 0 on successful submission of all jobs |
| sbatch scripts | files | Generated sbatch shell scripts in the working directory, one per experiment |
| scancel.sh | file | Script to cancel all submitted jobs, written to the working directory |
| job_ids | list[str] | SLURM job IDs printed to stdout and used in the cancellation script |
Usage Examples
# Submit PBT experiments to SLURM cluster:
# python -m isaacgymenvs.pbt.launcher.run \
# --run isaacgymenvs.pbt.experiments.my_experiment \
# --backend slurm \
# --slurm_workdir ./slurm_logs \
# --slurm_gpus_per_job 1 \
# --slurm_cpus_per_gpu 8 \
# --slurm_partition gpu \
# --slurm_timeout 24:00:00
# Using a custom sbatch template:
# python -m isaacgymenvs.pbt.launcher.run \
# --run isaacgymenvs.pbt.experiments.my_experiment \
# --backend slurm \
# --slurm_workdir ./slurm_logs \
# --slurm_sbatch_template my_sbatch_template.sh
# Preview without submitting:
# python -m isaacgymenvs.pbt.launcher.run \
# --run isaacgymenvs.pbt.experiments.my_experiment \
# --backend slurm \
# --slurm_workdir ./slurm_logs \
# --slurm_print_only true
# Custom sbatch template (my_sbatch_template.sh):
# #!/bin/bash
# #SBATCH --job-name=$FILENAME
# #SBATCH --gres=gpu:$GPU
# #SBATCH --cpus-per-task=$CPU
# #SBATCH --time=$TIMEOUT
# source activate myenv
# $CMD
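# Programmatic usage:
# run_description is a RunDescription; args is the argparse.Namespace from a parser extended with add_slurm_args().
from isaacgymenvs.pbt.launcher.run_slurm import run_slurm
exit_code = run_slurm(run_description, args)  # 0 when all jobs were submitted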
Related Pages
- Isaac_sim_IsaacGymEnvs_PBT_Launcher - The CLI entry point that dispatches to this backend
- Isaac_sim_IsaacGymEnvs_RunDescription - Defines the experiment configurations submitted to SLURM
- Isaac_sim_IsaacGymEnvs_PBT_Process_Backend - Alternative backend for local OS process execution
- Isaac_sim_IsaacGymEnvs_PBT_NGC_Backend - Alternative backend for NGC cloud execution