Implementation:Isaac sim IsaacGymEnvs PBT Slurm Backend

From Leeroopedia
Knowledge Sources
Domains Cluster_Computing, Job_Scheduling
Last Updated 2026-02-15 11:00 GMT

Overview

PBT_Slurm_Backend provides the SLURM cluster backend for the Population-Based Training (PBT) experiment launcher: it generates sbatch scripts and submits them to the SLURM job scheduler.

Description

This module implements SLURM-based distributed experiment execution through two main functions and a helper. The add_slurm_args() function extends the argument parser with SLURM-specific options: --slurm_gpus_per_job (GPUs per job), --slurm_cpus_per_gpu (CPU cores per GPU), --slurm_print_only (dry-run mode), --slurm_workdir (working directory for logs and scripts), --slurm_partition (SLURM partition name), --slurm_sbatch_template (path to a custom sbatch template), and --slurm_timeout (job time limit).
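The option list above can be sketched with argparse as follows. This is a minimal illustration, not the module's source: the function names carry a _sketch suffix to mark them as hypothetical, the defaults are taken from the I/O Contract table on this page, and the set of accepted boolean spellings is an assumption.

```python
import argparse


def str2bool_sketch(v):
    """Turn textual flags such as 'true' or 'no' into a bool for argparse."""
    if isinstance(v, bool):
        return v
    text = str(v).strip().lower()
    # Accepted spellings are an assumption, not taken from the source module.
    if text in ("yes", "true", "t", "y", "1"):
        return True
    if text in ("no", "false", "f", "n", "0"):
        return False
    raise argparse.ArgumentTypeError(f"Boolean value expected, got {v!r}")


def add_slurm_args_sketch(parser):
    """Register the SLURM options; defaults follow the I/O Contract table."""
    parser.add_argument("--slurm_gpus_per_job", type=int, default=1)
    parser.add_argument("--slurm_cpus_per_gpu", type=int, default=16)
    parser.add_argument("--slurm_print_only", type=str2bool_sketch, default=False)
    parser.add_argument("--slurm_workdir", type=str, required=True)
    parser.add_argument("--slurm_partition", type=str, default=None)
    parser.add_argument("--slurm_sbatch_template", type=str, default=None)
    parser.add_argument("--slurm_timeout", type=str, default="0")
    return parser


parser = add_slurm_args_sketch(argparse.ArgumentParser())
args = parser.parse_args(
    ["--slurm_workdir", "./slurm_logs", "--slurm_print_only", "true"]
)
print(args.slurm_print_only, args.slurm_cpus_per_gpu)
```

Passing the converter as type= lets the flag accept an explicit value (--slurm_print_only true), which matches the command-line form used in the usage examples.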

The run_slurm() function handles the full SLURM submission workflow. It first creates the working directory, then loads the sbatch template: the file given by --slurm_sbatch_template if one is provided, otherwise the built-in SBATCH_TEMPLATE_DEFAULT, which activates a conda environment and changes into the project directory. For each experiment generated by the RunDescription, it writes an sbatch script file, substituting the template variables ($CMD, $FILENAME, $PARTITION, $GPU, $CPU, $TIMEOUT) with Python's string.Template.
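The substitution step can be illustrated with string.Template directly. The sketch below is hypothetical: EXAMPLE_TEMPLATE and render_sbatch() are names invented for this example, and the template body is an assumed layout rather than the module's default. Only the six variable names come from the description above.

```python
from string import Template

# Hypothetical template text; only the $VARIABLE names match the description.
EXAMPLE_TEMPLATE = (
    "#!/bin/bash\n"
    "#SBATCH --partition=$PARTITION\n"
    "#SBATCH --gres=gpu:$GPU\n"
    "#SBATCH --cpus-per-task=$CPU\n"
    "#SBATCH --time=$TIMEOUT\n"
    "# generated script: $FILENAME\n"
    "$CMD\n"
)


def render_sbatch(template_text, cmd, filename, partition, gpus, cpus, timeout):
    """Fill in the six template variables and return the script text."""
    # substitute() raises KeyError if the template uses an unknown variable,
    # which surfaces typos in a custom template early.
    return Template(template_text).substitute(
        CMD=cmd,
        FILENAME=filename,
        PARTITION=partition,
        GPU=gpus,
        CPU=cpus,
        TIMEOUT=timeout,
    )


script = render_sbatch(
    EXAMPLE_TEMPLATE,
    cmd="python train.py",
    filename="sbatch_exp_00.sh",
    partition="gpu",
    gpus=2,
    cpus=16,
    timeout="24:00:00",
)
print(script)
```

A template does not have to use every variable: substitute() ignores supplied keys that the template never mentions.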

After writing all sbatch scripts, the function submits them sequentially using sbatch with the configured GPU count, CPU count, partition, and output file path. Job IDs are collected from sbatch's parsable output. Upon completion, the function prints monitoring instructions (tail -f command for log files) and writes a scancel.sh script to the working directory for convenient job cancellation. The str2bool() helper function handles boolean argument parsing for the print-only flag.
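The submission and cancellation steps can be sketched as below. All function names are hypothetical, and the exact sbatch flags, log file naming, and scancel.sh contents are assumptions; the source only states that GPU count, CPU count, partition, and output path are passed and that job IDs come from parsable output. The sketch relies on documented sbatch behavior: --parsable prints the job ID, optionally followed by a semicolon and the cluster name, and %j in the output path expands to the job ID.

```python
import os
import subprocess


def build_sbatch_command(script_path, gpus, cpus, partition, output_path):
    """Assemble an sbatch invocation as an argument list (assumed flag set)."""
    cmd = [
        "sbatch",
        "--parsable",
        f"--gres=gpu:{gpus}",
        f"--cpus-per-task={cpus}",
        f"--output={output_path}",
    ]
    if partition:
        cmd.append(f"--partition={partition}")
    cmd.append(script_path)
    return cmd


def parse_job_id(parsable_output):
    """Extract the job ID from 'jobid' or 'jobid;cluster' output."""
    return parsable_output.strip().split(";")[0]


def write_scancel_script(workdir, job_ids):
    """Write a script that cancels every submitted job in one call."""
    path = os.path.join(workdir, "scancel.sh")
    with open(path, "w") as f:
        f.write("#!/bin/bash\n")
        f.write("scancel " + " ".join(job_ids) + "\n")
    return path


def submit_all(script_paths, workdir, gpus=1, cpus=16, partition=None,
               print_only=False, run=subprocess.run):
    """Submit scripts one by one; in print-only mode just show the commands."""
    job_ids = []
    for script in script_paths:
        log = os.path.join(workdir, os.path.basename(script) + "-slurm-%j.out")
        cmd = build_sbatch_command(script, gpus, cpus, partition, log)
        print(" ".join(cmd))
        if print_only:
            continue
        result = run(cmd, capture_output=True, text=True, check=True)
        job_ids.append(parse_job_id(result.stdout))
    return job_ids


# Dry run: prints the command without needing SLURM installed.
submit_all(["sbatch_exp_00.sh"], "./slurm_logs", partition="gpu", print_only=True)
```

The run parameter exists only so the sketch can be exercised without a scheduler; on a real cluster the default subprocess.run is used and the returned IDs can be passed to write_scancel_script().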

Usage

Use this backend when deploying PBT experiments to a SLURM-managed computing cluster. Invoke the launcher with --backend slurm --slurm_workdir ./slurm_logs. To customize the job script, pass --slurm_sbatch_template with a template that uses the $CMD, $GPU, $CPU, and $TIMEOUT variables as needed. Use --slurm_print_only true to preview the generated submission commands without executing them.

Code Reference

Source Location

Signature

SBATCH_TEMPLATE_DEFAULT = (
    "#!/bin/bash\n"
    "conda activate conda_env_name\n"
    "cd ~/project\n"
)

def str2bool(v):
def add_slurm_args(parser):
def run_slurm(run_description, args):

Import

from isaacgymenvs.pbt.launcher.run_slurm import run_slurm, add_slurm_args

I/O Contract

Inputs

Name | Type | Required | Description
run_description | RunDescription | Yes | The experiment run description containing all experiments and parameter combinations
args | argparse.Namespace | Yes | Parsed arguments including all SLURM-specific options
--slurm_gpus_per_job | int | No | Number of GPUs per SLURM job (default: 1)
--slurm_cpus_per_gpu | int | No | CPU cores allocated per GPU (default: 16)
--slurm_workdir | str | Yes | Working directory for sbatch scripts and log files
--slurm_partition | str | No | SLURM partition to submit to (default: None)
--slurm_sbatch_template | str | No | Path to custom sbatch template file (default: uses built-in template)
--slurm_timeout | str | No | Job timeout value; 0 means no timeout (default: "0")
--slurm_print_only | bool | No | If True, only print sbatch commands without executing (default: False)

Outputs

Name | Type | Description
return code | int | 0 on successful submission of all jobs
sbatch scripts | files | Generated sbatch shell scripts in the working directory, one per experiment
scancel.sh | file | Script to cancel all submitted jobs, written to the working directory
job_ids | list[str] | SLURM job IDs printed to stdout and used in the cancellation script

Usage Examples

# Submit PBT experiments to SLURM cluster:
# python -m isaacgymenvs.pbt.launcher.run \
#     --run isaacgymenvs.pbt.experiments.my_experiment \
#     --backend slurm \
#     --slurm_workdir ./slurm_logs \
#     --slurm_gpus_per_job 1 \
#     --slurm_cpus_per_gpu 8 \
#     --slurm_partition gpu \
#     --slurm_timeout 24:00:00

# Using a custom sbatch template:
# python -m isaacgymenvs.pbt.launcher.run \
#     --run isaacgymenvs.pbt.experiments.my_experiment \
#     --backend slurm \
#     --slurm_workdir ./slurm_logs \
#     --slurm_sbatch_template my_sbatch_template.sh

# Preview without submitting:
# python -m isaacgymenvs.pbt.launcher.run \
#     --run isaacgymenvs.pbt.experiments.my_experiment \
#     --backend slurm \
#     --slurm_workdir ./slurm_logs \
#     --slurm_print_only true

# Custom sbatch template (my_sbatch_template.sh):
# #!/bin/bash
# #SBATCH --job-name=pbt_experiment
# #SBATCH --gres=gpu:$GPU
# #SBATCH --cpus-per-task=$CPU
# #SBATCH --time=$TIMEOUT
# source activate myenv
# $CMD

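# Programmatic usage (run_description is a RunDescription; args is the
# argparse.Namespace parsed from a parser extended with add_slurm_args()):
from isaacgymenvs.pbt.launcher.run_slurm import run_slurm
exit_code = run_slurm(run_description, args)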
