Implementation:FMInference FlexLLMGen DeepSpeed Runner
| Field | Value |
|---|---|
| Sources | Repo: FlexLLMGen |
| Domains | Distributed_Training, Job_Orchestration |
| Last Updated | 2026-02-09 00:00 GMT |
Overview
Vendored copy of the DeepSpeed runner, the main front end for launching single-node and multi-node distributed training jobs. It handles hostfile parsing, resource filtering, multi-node backend selection (PDSH, OpenMPI, MVAPICH, Slurm), autotuning integration, and environment configuration.
Description
runner.py is the top-level entry point for the deepspeed command-line tool. It orchestrates the entire job launch process from a single command invocation, including resource discovery, validation, and delegation to the appropriate multi-node runner backend.
Key features:
- Argument parsing -- Accepts hostfile, include/exclude resource filters, num_nodes, num_gpus, master_addr/port, launcher backend selection, module/no_python flags, autotuning mode, elastic training options, and the user script with arguments.
- Hostfile parsing -- Reads MPI-style hostfiles (hostname slots=N) into an OrderedDict. Falls back to localhost with torch.cuda.device_count() GPUs if no hostfile is found.
- Resource filtering -- Supports --include and --exclude flags with the syntax NODE[:SLOT,SLOT]@NODE[:SLOT,SLOT] to select or reject specific nodes and GPU slots. Also supports --num_nodes and --num_gpus for simpler subsetting. Respects CUDA_VISIBLE_DEVICES when running on a single node.
- SSH validation -- For multi-node jobs, validates passwordless SSH to the first host before proceeding.
- World info encoding -- Encodes the active resource map as base64 JSON (world_info) for passing to the per-node launcher.
- Single-node path -- For single-node jobs, directly invokes deepspeed.launcher.launch as a subprocess with the encoded world info.
- Multi-node backends -- For multi-node jobs, delegates to one of four runner backends:
  - PDSHRunner -- Uses pdsh for parallel SSH-based launching.
  - OpenMPIRunner -- Uses mpirun for OpenMPI-based launching.
  - MVAPICHRunner -- Uses MVAPICH's mpirun_rsh.
  - SlurmRunner -- Uses srun for Slurm-based launching.
- Environment export -- Collects environment variables matching EXPORT_ENVS prefixes (MLFLOW, NCCL, PYTHON, MV2, UCX) and exports them to remote nodes. Also reads .deepspeed_env files for additional exports.
- Autotuning integration -- When --autotuning is specified, delegates to run_autotuning, which runs the Autotuner to discover optimal configurations either before running the job ("run") or instead of running it ("tune").
- Signal handling -- For PDSH launcher, installs signal handlers that send SIGINT/SIGTERM to the main process and invoke a kill command on remote nodes.
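The hostfile format named in the list above (hostname slots=N) can be illustrated with a minimal sketch. This is not the vendored fetch_hostfile: the helper name parse_hostfile_lines, the comment-skipping rule, and the error messages are assumptions made for illustration.

```python
import re
from collections import OrderedDict


def parse_hostfile_lines(lines):
    """Parse 'hostname slots=N' lines into an ordered host -> slot-count map.

    Hypothetical helper for illustration; not the vendored implementation.
    """
    resource_pool = OrderedDict()
    pattern = re.compile(r"^(\S+)\s+slots=(\d+)$")
    for raw in lines:
        line = raw.strip()
        # Assumption of this sketch: blank lines and '#' comments are ignored.
        if not line or line.startswith("#"):
            continue
        match = pattern.match(line)
        if match is None:
            raise ValueError(f"malformed hostfile line: {line!r}")
        host, slots = match.group(1), int(match.group(2))
        if host in resource_pool:
            raise ValueError(f"duplicate host in hostfile: {host}")
        resource_pool[host] = slots
    return resource_pool


# Hosts keep the order in which they appear in the file.
pool = parse_hostfile_lines(["worker-0 slots=4", "", "worker-1 slots=8"])
print(dict(pool))
```

Keeping the hosts in an ordered mapping matters because the first host in the hostfile is the one whose SSH access is validated and --num_nodes selects hosts from the top of the file.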
This is AUTO_KEEP vendored code from DeepSpeed.
Code Reference
| Field | Value |
|---|---|
| Repository | FlexLLMGen |
| File | benchmark/third_party/DeepSpeed/deepspeed/launcher/runner.py |
| Lines | 1-533 |
Key Functions:
def parse_args(args=None): ...
def fetch_hostfile(hostfile_path): ...
def parse_resource_filter(host_info, include_str="", exclude_str=""): ...
def parse_inclusion_exclusion(resource_pool, inclusion, exclusion): ...
def encode_world_info(world_info): ...
def run_autotuning(args, active_resources): ...
def main(args=None): ...
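As a rough illustration of what encode_world_info does, the sketch below serializes a resource map to JSON and base64-encodes it so it can travel as a single command-line argument. The function names encode_resources and decode_resources are hypothetical, and the choice of the URL-safe base64 alphabet is an assumption of this sketch rather than something this page confirms.

```python
import base64
import json


def encode_resources(active_resources):
    """Serialize a host -> slot-list map to a base64 string (illustrative)."""
    payload = json.dumps(active_resources).encode("utf-8")
    # Assumption: URL-safe alphabet, so the result avoids '+' and '/'.
    return base64.urlsafe_b64encode(payload).decode("utf-8")


def decode_resources(encoded):
    """Inverse of encode_resources, as a per-node launcher might apply it."""
    return json.loads(base64.urlsafe_b64decode(encoded))


world = {"worker-0": [0, 1], "worker-1": [0, 1, 2, 3]}
token = encode_resources(world)
assert decode_resources(token) == world
```

Encoding the map this way keeps it free of spaces and quotes, which avoids shell-escaping problems when the string is passed through SSH or an MPI launcher.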
I/O Contract
Command-Line Arguments
| Argument | Type | Default | Description |
|---|---|---|---|
| -H/--hostfile | str | /job/hostfile | MPI-style hostfile path |
| -i/--include | str | "" | Resource inclusion filter (NODE[:SLOT,SLOT]@NODE[:SLOT,SLOT]) |
| -e/--exclude | str | "" | Resource exclusion filter (mutually exclusive with include) |
| --num_nodes | int | -1 | Number of nodes to use (first N hosts in the hostfile; -1 applies no limit) |
| --num_gpus | int | -1 | Max GPUs per node (-1 applies no limit) |
| --launcher | str | pdsh | Backend: pdsh, openmpi, mvapich, slurm |
| --autotuning | str | "" | "tune" or "run" to enable autotuning |
| --elastic_training | flag | False | Enable elastic training |
| user_script | str | required | Training script path |
| user_args | list | [] | Training script arguments |
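The --include syntax, where node specs are separated by @ and optional slot lists follow a colon, can be sketched as follows. This is a simplified illustration, not the vendored parse_resource_filter: the helper name apply_include is hypothetical, exclusion is not handled, and the validation rules are assumptions.

```python
from collections import OrderedDict


def apply_include(host_info, include_str):
    """Filter a host -> slot-list map by an include string (illustrative).

    A node spec without a slot list selects every slot on that node.
    """
    if not include_str:
        return OrderedDict(host_info)
    selected = OrderedDict()
    for node_spec in include_str.split("@"):
        if ":" in node_spec:
            host, slot_str = node_spec.split(":", 1)
            slots = [int(s) for s in slot_str.split(",")]
        else:
            host, slots = node_spec, None
        if host not in host_info:
            raise ValueError(f"unknown host in filter: {host}")
        if slots is None:
            slots = list(host_info[host])
        for slot in slots:
            if slot not in host_info[host]:
                raise ValueError(f"no slot {slot} on host {host}")
        selected[host] = sorted(set(slots))
    return selected


hosts = OrderedDict([("worker-0", [0, 1, 2, 3]), ("worker-1", [0, 1, 2, 3])])
# All of worker-0, but only GPUs 0 and 2 on worker-1.
picked = apply_include(hosts, "worker-0@worker-1:0,2")
print(dict(picked))
```

An exclusion filter would follow the same parsing but remove the named slots from the pool instead of selecting them.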
Outputs
| Output | Type | Description |
|---|---|---|
| exit code | int | 0 on success, non-zero on failure (propagated from child process) |
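Exit-code propagation from a child process can be sketched as below. The helper name run_and_propagate is hypothetical, and the sketch assumes only that the runner waits on the launched subprocess and reports its return code; it does not reproduce the runner's actual control flow.

```python
import subprocess
import sys


def run_and_propagate(cmd):
    """Launch cmd, wait for it, and return its exit code (illustrative)."""
    child = subprocess.Popen(cmd)
    child.wait()
    return child.returncode


# Two throwaway child processes standing in for a launched job.
ok = run_and_propagate([sys.executable, "-c", "raise SystemExit(0)"])
failed = run_and_propagate([sys.executable, "-c", "raise SystemExit(3)"])
print(ok, failed)
```

A front end built this way would typically end with sys.exit(code) so that schedulers and shell scripts see the job's real status.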