Implementation:FMInference FlexLLMGen DeepSpeed Autotuning Scheduler
| Field | Value |
|---|---|
| Sources | Repo: FlexLLMGen |
| Domains | Autotuning, Distributed_Training, Resource_Management |
| Last Updated | 2026-02-09 00:00 GMT |
Overview
Vendored DeepSpeed experiment scheduler that manages the lifecycle of autotuning experiments, including resource allocation, multi-threaded job execution, and result parsing.
Description
scheduler.py provides the ResourceManager class, which orchestrates the execution of DeepSpeed autotuning experiments across a pool of GPU nodes. It operates as a multi-threaded scheduler where a main loop dispatches experiments from a queue onto available GPU resources, and each experiment runs in its own thread.
The module also defines the supporting classes Node and Reservation for GPU slot management, and standalone functions for experiment execution (run_experiment) and cleanup (clean_up).
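The thread-per-experiment pattern described above can be sketched as follows. This is a minimal illustration, not the vendored implementation: run_all, max_parallel, job, and poll_interval are hypothetical names, and max_parallel stands in for the GPU availability check that the real scheduler performs.

```python
import threading
import time
from collections import deque


def run_all(experiments, max_parallel, job, poll_interval=0.01):
    """Dispatch queued experiments onto threads and collect (exp, error) pairs.

    Hypothetical sketch: `max_parallel` replaces real GPU slot accounting.
    """
    queue = deque(experiments)
    running = {}   # exp_id -> (thread, exp)
    finished = {}  # exp_id -> (exp, error message or None)
    errors = {}    # exp_id -> error message, written by worker threads

    def worker(exp):
        try:
            job(exp)
        except Exception as exc:  # record the failure instead of killing the loop
            errors[exp["exp_id"]] = str(exc)

    while queue or running:
        # Start as many experiments as capacity allows.
        while queue and len(running) < max_parallel:
            exp = queue.popleft()
            thread = threading.Thread(target=worker, args=(exp,))
            thread.start()
            running[exp["exp_id"]] = (thread, exp)

        # Reap threads that have finished since the last poll.
        for exp_id in list(running):
            thread, exp = running[exp_id]
            if not thread.is_alive():
                thread.join()
                finished[exp_id] = (exp, errors.get(exp_id))
                del running[exp_id]

        time.sleep(poll_interval)

    return finished
```

Polling with is_alive keeps the main loop responsive, so new experiments can start as soon as capacity frees up rather than waiting on one specific thread.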
Key behaviors:
- Experiment queueing -- Reads experiment configuration files (hjson format), assigns experiment IDs, and enqueues them. Already-completed experiments are skipped unless they were interrupted.
- Resource allocation -- The resource_request method attempts to reserve GPU slots across nodes. If resources are insufficient, the experiment is placed back at the front of the queue.
- Threaded execution -- Each experiment runs in a separate thread via run_job, which builds a DeepSpeed launch command and invokes it as a subprocess.
- Result parsing -- After all experiments complete, parse_results reads the metric file of each finished experiment and identifies the configuration with the highest value of the requested metric, such as throughput.
- Cleanup -- Uses pdsh to kill experiment processes across distributed nodes after completion or failure.
This is vendored code from DeepSpeed, included in the FlexLLMGen benchmark infrastructure.
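The resource allocation behavior listed above can be illustrated with a small sketch. The class names Node and Reservation match the module, but every method body here is an assumption: reserve_slots, restore_slots, the all-or-nothing rollback, and the dispatch_once helper are written for illustration and may differ from the vendored code.

```python
from collections import deque


class Node:
    """A host with a fixed number of GPU slots (slot ids 0..max_slots-1)."""

    def __init__(self, host, max_slots):
        self.host = host
        self.idle_slots = list(range(max_slots))

    def reserve_slots(self, slot_request):
        """Return `slot_request` slot ids, or None if too few are idle."""
        if len(self.idle_slots) < slot_request:
            return None
        return [self.idle_slots.pop(0) for _ in range(slot_request)]

    def restore_slots(self, slots):
        self.idle_slots += slots


class Reservation:
    """Slots held on one node on behalf of one experiment."""

    def __init__(self, node, slots):
        self.node = node
        self.slots = slots

    def restore_slots(self):
        self.node.restore_slots(self.slots)


def resource_request(nodes, num_nodes, num_gpus):
    """Reserve num_gpus on each of num_nodes nodes, or roll back and return None."""
    reservations = []
    for node in nodes:
        if len(reservations) == num_nodes:
            break
        slots = node.reserve_slots(num_gpus)
        if slots is not None:
            reservations.append(Reservation(node, slots))
    if len(reservations) < num_nodes:
        for reservation in reservations:  # partial success: give slots back
            reservation.restore_slots()
        return None
    return reservations


def dispatch_once(queue, nodes):
    """Try to start the experiment at the head of the queue (hypothetical helper)."""
    exp = queue.popleft()
    reservations = resource_request(nodes, exp["num_nodes"], exp["num_gpus"])
    if reservations is None:
        queue.appendleft(exp)  # insufficient resources: back to the front
    return reservations
```

Returning the experiment to the front of the queue preserves submission order: a large experiment is not overtaken indefinitely by smaller ones queued behind it.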
Code Reference
| Field | Value |
|---|---|
| Repository | FlexLLMGen |
| File | benchmark/third_party/DeepSpeed/deepspeed/autotuning/scheduler.py |
| Lines | 1-444 |
Key Classes and Functions:
class ResourceManager:
    def __init__(self, args, hosts, num_gpus_per_node, results_dir, exps_dir, arg_mappings):
        ...

    def schedule_experiments(self, exp_paths):
        ...

    def run_job(self, exp: dict, reservations):
        ...

    def run(self):
        ...

    def parse_results(self, metric):
        ...


class Node:
    def __init__(self, host, max_slots):
        ...


class Reservation:
    def __init__(self, node, slots):
        ...


def run_experiment(exp: dict, reservations, user_script, user_args):
    ...


def clean_up(exp: dict, reservations):
    ...
I/O Contract
Inputs
| Parameter | Type | Required | Description |
|---|---|---|---|
| args | Namespace | Yes | Command-line arguments including master_port, user_script, user_args |
| hosts | list | Yes | List of hostnames in the resource pool |
| num_gpus_per_node | int | Yes | Number of GPU slots per node |
| results_dir | str | Yes | Directory to store experiment results |
| exps_dir | str | Yes | Directory containing experiment configuration files |
| arg_mappings | dict | Yes | Mapping of experiment config keys to user argument flags. The constructor signature declares no default, so a value must always be passed. |
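To make the role of arg_mappings concrete, the sketch below rewrites a user argument list from values found in an experiment config. It is an assumption about how such a mapping could be applied, not a copy of the vendored logic: find_value and apply_arg_mappings are hypothetical helpers, and the flag name in the usage note is only an example.

```python
def find_value(config, key):
    """Depth-first search for `key` in a nested dict; None if absent."""
    if key in config:
        return config[key]
    for value in config.values():
        if isinstance(value, dict):
            found = find_value(value, key)
            if found is not None:
                return found
    return None


def apply_arg_mappings(exp, user_args, arg_mappings):
    """Return a copy of user_args with mapped flag values taken from exp."""
    args = list(user_args)
    for key, flag in arg_mappings.items():
        value = find_value(exp, key)
        if value is None or flag not in args:
            continue  # nothing to override for this mapping
        idx = args.index(flag)
        if idx + 1 < len(args):
            args[idx + 1] = str(value)  # the value follows its flag
    return args
```

For example, with a mapping from train_micro_batch_size_per_gpu to a script flag such as --per_device_train_batch_size, the value chosen for an experiment replaces whatever batch size the user originally passed on the command line.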
Outputs
| Output | Type | Description |
|---|---|---|
| finished_experiments | dict | Mapping of experiment IDs to (experiment_config, error) tuples |
| parse_results return | tuple | (best_experiment_config, max_throughput) from all finished experiments |
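The selection performed by parse_results can be sketched as a scan over finished_experiments. The layout assumed here is hypothetical: each experiment dict carries a metric_file key pointing at a JSON file keyed by metric name, and the function returns (None, None) when no usable result exists. The vendored code may locate and read metric files differently.

```python
import json
import os


def parse_results(finished_experiments, metric):
    """Pick the experiment with the highest `metric` value.

    finished_experiments maps exp_id -> (experiment_config, error).
    Assumes a hypothetical `metric_file` key holding a JSON file path.
    """
    best_exp, best_value = None, float("-inf")
    for exp_id, (exp, err) in finished_experiments.items():
        if err:
            continue  # failed experiments have no trustworthy metrics
        path = exp["metric_file"]
        if not os.path.exists(path):
            continue  # the run produced no metric file
        with open(path) as f:
            results = json.load(f)
        value = results.get(metric)
        if value is not None and value > best_value:
            best_exp, best_value = exp, value
    if best_exp is None:
        return None, None
    return best_exp, best_value
```

Skipping experiments that recorded an error or produced no metric file keeps a failed run from being selected as the best configuration.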
Internal Workflow
- schedule_experiments loads hjson experiment configs from file paths, skipping duplicates and completed runs.
- run iterates the experiment queue, calling resource_request to allocate GPU slots across nodes.
- When resources are available, run_job spawns a thread calling run_experiment.
- run_experiment serializes the DeepSpeed config (base64-encoded), constructs a deepspeed CLI command, and runs it via subprocess.
- experiment_check periodically joins threads, collects results, and restores GPU slots.
- parse_results reads metric files from finished experiments to select the optimal configuration.
- clean_up uses pdsh to kill processes on remote nodes after experiment completion.
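The serialization and command-building step in the workflow above can be sketched as below. The helper names are hypothetical, the use of URL-safe base64 is an assumption, and the launcher flags --include and --master_port with a host:slot,slot@host:slot,slot include string reflect the DeepSpeed launcher as commonly documented; treat the exact command layout as unverified against the vendored file. How the encoded config is handed to the launcher is not described on this page and is left out. The command is only built, not executed.

```python
import base64
import json


def encode_ds_config(ds_config):
    """Serialize a config dict to a base64 string (URL-safe variant assumed)."""
    raw = json.dumps(ds_config).encode("utf-8")
    return base64.urlsafe_b64encode(raw).decode("utf-8")


def decode_ds_config(encoded):
    """Inverse of encode_ds_config."""
    raw = base64.urlsafe_b64decode(encoded.encode("utf-8"))
    return json.loads(raw.decode("utf-8"))


def build_include_str(reservations):
    """Format (host, [slot ids]) pairs as 'host:0,1@host2:0,1'."""
    return "@".join(
        f"{host}:{','.join(map(str, slots))}" for host, slots in reservations
    )


def build_launch_cmd(reservations, master_port, user_script, user_args):
    """Assemble an argv list for the deepspeed CLI without running it."""
    return [
        "deepspeed",
        "--include", build_include_str(reservations),
        "--master_port", str(master_port),
        user_script,
        *user_args,
    ]
```

Building the command as a list rather than a single string lets it be passed to subprocess without shell quoting, which matters when user arguments contain spaces or JSON.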