
Implementation:FMInference FlexLLMGen DeepSpeed Autotuning Scheduler

From Leeroopedia
Revision as of 14:55, 16 February 2026 by Admin (talk | contribs) (Auto-imported from implementations/FMInference_FlexLLMGen_DeepSpeed_Autotuning_Scheduler.md)


| Field | Value |
|---|---|
| Sources | Repo: FlexLLMGen |
| Domains | Autotuning, Distributed_Training, Resource_Management |
| Last Updated | 2026-02-09 00:00 GMT |

Overview

Vendored DeepSpeed experiment scheduler that manages the lifecycle of autotuning experiments, including resource allocation, multi-threaded job execution, and result parsing.

Description

scheduler.py provides the ResourceManager class, which orchestrates the execution of DeepSpeed autotuning experiments across a pool of GPU nodes. It operates as a multi-threaded scheduler where a main loop dispatches experiments from a queue onto available GPU resources, and each experiment runs in its own thread.

The module also defines the supporting classes Node and Reservation for GPU slot management, and standalone functions for experiment execution (run_experiment) and cleanup (clean_up).
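
To make the slot bookkeeping concrete, here is a minimal sketch of how Node and Reservation could cooperate with a resource request. It is not the vendored implementation: only the two constructor signatures come from the code reference on this page, while `idle_slots`, `reserve_slots`, `restore_slots`, and `request_resources` are hypothetical names chosen for illustration.

```python
class Node:
    """One host with a fixed number of GPU slots (sketch, not the vendored class)."""

    def __init__(self, host, max_slots):
        self.host = host
        self.max_slots = max_slots
        self.idle_slots = list(range(max_slots))  # hypothetical attribute

    def reserve_slots(self, slot_request):
        # Hand out the lowest-numbered idle slots, or None if too few remain.
        if len(self.idle_slots) < slot_request:
            return None
        return [self.idle_slots.pop(0) for _ in range(slot_request)]

    def restore_slots(self, slots):
        self.idle_slots += slots


class Reservation:
    """Slots held on a single node for one experiment."""

    def __init__(self, node, slots):
        self.node = node
        self.slots = slots

    def restore_slots(self):
        self.node.restore_slots(self.slots)


def request_resources(nodes, num_nodes, slots_per_node):
    """Hypothetical helper: reserve slots on num_nodes nodes, or nothing at all."""
    reservations = []
    for node in nodes:
        if len(reservations) == num_nodes:
            break
        slots = node.reserve_slots(slots_per_node)
        if slots is not None:
            reservations.append(Reservation(node, slots))
    if len(reservations) < num_nodes:
        # Roll back partial reservations so no slots leak.
        for reservation in reservations:
            reservation.restore_slots()
        return None
    return reservations
```

The all-or-nothing rollback is a design choice of this sketch: an experiment that cannot get every node it asks for releases whatever it grabbed, so other experiments are not starved by half-filled requests.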

Key behaviors:

  • Experiment queueing -- Reads experiment configuration files (hjson format), assigns experiment IDs, and enqueues them. Already-completed experiments are skipped unless they were interrupted.
  • Resource allocation -- The resource_request method attempts to reserve GPU slots across nodes. If resources are insufficient, the experiment is placed back at the front of the queue.
  • Threaded execution -- Each experiment runs in a separate thread via run_job, which builds a DeepSpeed launch command and invokes it as a subprocess.
  • Result parsing -- After all experiments complete, parse_results reads metric files to identify the configuration with the highest throughput.
  • Cleanup -- Uses pdsh to kill experiment processes across distributed nodes after completion or failure.
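
The queueing and threading behaviors above can be sketched as a small dispatch loop. This is an illustration under simplifying assumptions, not the ResourceManager code: GPU capacity is modeled as a single slot counter rather than per-node reservations, and `dispatch`, `work`, and the `num_gpus` key are hypothetical names.

```python
import threading
import time
from collections import deque


def dispatch(experiments, total_slots, work, poll_interval=0.01):
    """Run each experiment dict in its own thread, limited by total_slots.

    Returns a dict mapping exp_id to (experiment, error), where error is
    None on success or the exception message on failure.
    """
    for exp in experiments:
        if exp["num_gpus"] > total_slots:
            raise ValueError(f"experiment {exp['exp_id']} can never be scheduled")

    queue = deque(experiments)
    free_slots = total_slots
    running = {}   # exp_id -> (thread, experiment)
    errors = {}    # exp_id -> error message, written by worker threads
    finished = {}  # exp_id -> (experiment, error)

    def target(exp):
        try:
            work(exp)
        except Exception as err:  # record the failure instead of losing it
            errors[exp["exp_id"]] = str(err)

    while queue or running:
        # Collect finished threads and give their slots back.
        for exp_id, (thread, exp) in list(running.items()):
            if not thread.is_alive():
                thread.join()
                free_slots += exp["num_gpus"]
                finished[exp_id] = (exp, errors.get(exp_id))
                del running[exp_id]

        if queue:
            exp = queue.popleft()
            if exp["num_gpus"] <= free_slots:
                free_slots -= exp["num_gpus"]
                thread = threading.Thread(target=target, args=(exp,))
                thread.start()
                running[exp["exp_id"]] = (thread, exp)
                continue
            # Not enough capacity: put it back at the front of the queue.
            queue.appendleft(exp)

        time.sleep(poll_interval)

    return finished
```

Putting a blocked experiment back at the front keeps the original submission order, at the cost of letting a large experiment hold up smaller ones behind it.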

This is AUTO_KEEP vendored code from DeepSpeed, included in the FlexLLMGen benchmark infrastructure.

Code Reference

| Field | Value |
|---|---|
| Repository | FlexLLMGen |
| File | benchmark/third_party/DeepSpeed/deepspeed/autotuning/scheduler.py |
| Lines | 1-444 |

Key Classes and Functions:

class ResourceManager:
    def __init__(self, args, hosts, num_gpus_per_node, results_dir, exps_dir, arg_mappings):
        ...

    def schedule_experiments(self, exp_paths):
        ...

    def run_job(self, exp: dict, reservations):
        ...

    def run(self):
        ...

    def parse_results(self, metric):
        ...

class Node:
    def __init__(self, host, max_slots):
        ...

class Reservation:
    def __init__(self, node, slots):
        ...

def run_experiment(exp: dict, reservations, user_script, user_args):
    ...

def clean_up(exp: dict, reservations):
    ...
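
As a rough illustration of the kind of command clean_up might assemble, the sketch below builds a pdsh invocation that runs pkill on every reserved host. It only constructs the argument list and does not execute anything. The helper name `build_cleanup_command`, the use of the experiment name as the pkill pattern, and the fan-out value of 1024 are assumptions, not details confirmed by this page.

```python
def build_cleanup_command(exp_name, hosts, fan_out=1024):
    """Build (but do not run) a pdsh command that kills processes matching exp_name.

    pdsh -f sets the maximum number of simultaneous remote commands, and
    pdsh -w takes a comma-separated list of target hosts. pkill -f matches
    the pattern against the full command line of each process.
    """
    if not hosts:
        raise ValueError("at least one host is required")
    pdsh_cmd = ["pdsh", "-f", str(fan_out), "-w", ",".join(hosts)]
    kill_cmd = ["pkill", "-f", exp_name]
    return pdsh_cmd + kill_cmd
```

The resulting list could be handed to `subprocess.Popen` on a machine where pdsh is installed and the hosts are reachable.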

I/O Contract

Inputs

| Parameter | Type | Required | Description |
|---|---|---|---|
| args | Namespace | Yes | Command-line arguments including master_port, user_script, user_args |
| hosts | list | Yes | List of hostnames in the resource pool |
| num_gpus_per_node | int | Yes | Number of GPU slots per node |
| results_dir | str | Yes | Directory to store experiment results |
| exps_dir | str | Yes | Directory containing experiment configuration files |
| arg_mappings | dict | No | Mapping of experiment config keys to user argument flags |

Outputs

| Output | Type | Description |
|---|---|---|
| finished_experiments | dict | Mapping of experiment IDs to (experiment_config, error) tuples |
| parse_results return | tuple | (best_experiment_config, max_throughput) from all finished experiments |
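
The relationship between the two outputs can be illustrated with a small selection routine over a finished_experiments-shaped dict. This is a sketch under stated assumptions, not the vendored parse_results: it reads plain JSON with the standard library rather than hjson, and the `metric_path` key and the `pick_best` name are hypothetical.

```python
import json
import os


def pick_best(finished_experiments, metric="throughput"):
    """Return (best_experiment_config, best_value) for the given metric.

    finished_experiments maps exp_id -> (experiment_config, error).
    Experiments that failed, or whose metric file is missing, are skipped.
    Returns (None, float("-inf")) when no experiment has a usable metric.
    """
    best_exp, best_value = None, float("-inf")
    for exp_id, (exp, err) in finished_experiments.items():
        if err:
            continue  # failed runs cannot win
        path = exp.get("metric_path")  # hypothetical key for this sketch
        if not path or not os.path.exists(path):
            continue
        with open(path) as f:
            value = json.load(f).get(metric)
        if value is not None and value > best_value:
            best_exp, best_value = exp, value
    return best_exp, best_value
```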

Internal Workflow

  1. schedule_experiments loads hjson experiment configs from file paths, skipping duplicates and completed runs.
  2. run iterates the experiment queue, calling resource_request to allocate GPU slots across nodes.
  3. When resources are available, run_job spawns a thread calling run_experiment.
  4. run_experiment serializes the DeepSpeed config (base64-encoded), constructs a deepspeed CLI command, and runs it via subprocess.
  5. experiment_check periodically joins threads, collects results, and restores GPU slots.
  6. parse_results reads metric files from finished experiments to select the optimal configuration.
  7. clean_up uses pdsh to kill leftover processes on remote nodes after an experiment completes or fails.
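
Step 4 can be sketched as follows: encode the config as base64 and assemble a launcher command from the reservations. This is an approximation for illustration only. The helper names are hypothetical, the `--include` and `--master_port` launcher flags and the `host:slots@host:slots` include format reflect the DeepSpeed launcher as generally documented and should be treated as assumptions here, and how the encoded config is handed to the user script is not specified on this page, so the sketch leaves it out of the command.

```python
import base64
import json


def encode_config(ds_config):
    """Serialize a config dict to a URL-safe base64 string."""
    raw = json.dumps(ds_config).encode("utf-8")
    return base64.urlsafe_b64encode(raw).decode("utf-8")


def decode_config(encoded):
    """Inverse of encode_config, as the launched script would need to do."""
    raw = base64.urlsafe_b64decode(encoded.encode("utf-8"))
    return json.loads(raw.decode("utf-8"))


def build_include(reserved):
    """reserved: list of (host, slots) pairs -> 'host:0,1@other:0' string."""
    parts = []
    for host, slots in reserved:
        slot_str = ",".join(str(s) for s in sorted(slots))
        parts.append(f"{host}:{slot_str}")
    return "@".join(parts)


def build_command(user_script, user_args, include, master_port):
    """Assemble the argument list; nothing is executed here."""
    launcher_args = ["--include", include, "--master_port", str(master_port)]
    return ["deepspeed"] + launcher_args + [user_script] + list(user_args)
```

Base64 encoding keeps the JSON free of quotes and spaces, which avoids shell-quoting problems when a config has to travel through a command line.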
