Principle:FMInference FlexLLMGen Autotuning Configuration Search

Field	Value
Sources	Paper: FlexGen, DeepSpeed Autotuning Documentation
Domains	Autotuning, Configuration_Management
Last Updated	2026-02-09 00:00 GMT

Overview

A systematic approach to exploring DeepSpeed's configuration space by enumerating all valid combinations of tunable parameters, generating named experiments, and selecting the throughput-optimal configuration.

Description

Autotuning configuration search addresses the combinatorial explosion of DeepSpeed training parameters. Key tunable dimensions include micro-batch size, gradient accumulation steps, ZeRO stage, offloading devices, and optimizer settings. The search strategy combines several techniques:

Cartesian product enumeration -- A tuning space is defined as a dictionary where tunable parameters have lists of candidate values. The search generates all combinations (Cartesian product) of these values, producing a set of complete DeepSpeed configurations to evaluate.
Template-based configuration generation -- Base configurations use variable placeholders ($VAR syntax) that are resolved against the generated parameter combinations. This separates the model-specific configuration structure from the tunable values.
Deduplication and pruning -- After generation, each configuration is serialized to a JSON string with sorted keys, so that equivalent configurations produce identical strings. Duplicates are dropped by collecting these strings in a set, and the surviving configurations are pruned to remove sections irrelevant to the current search.
Canonical naming -- Each configuration receives a human-readable name built from the initials of its tuning keys and their values (e.g., tmbspg8_gas4 for train_micro_batch_size_per_gpu = 8 and gradient_accumulation_steps = 4). This makes experiment results easy to identify.
Validation -- Generated configurations are checked for internal consistency. For example, ZeRO stage 2/3 offloading requires a DeepSpeed-native optimizer (not a PyTorch optimizer) due to HuggingFace integration constraints.
Hostfile parsing -- The search reads MPI-style hostfiles to determine available resources, enabling the scheduler to distribute experiments across the hardware pool.
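
The enumeration, deduplication, and naming steps above can be sketched in a few lines. This is a minimal illustration, not the DeepSpeed implementation: the helper names (enumerate_configs, dedupe, short_name) are hypothetical, and the tuning space is assumed to be flat, whereas real DeepSpeed configurations are nested.

```python
import itertools
import json

def enumerate_configs(tuning_space):
    """Expand a flat tuning space into every combination of candidate values."""
    keys = list(tuning_space)
    # Wrap scalars in a one-element list so fixed values pass through unchanged.
    value_lists = [v if isinstance(v, list) else [v] for v in tuning_space.values()]
    return [dict(zip(keys, combo)) for combo in itertools.product(*value_lists)]

def dedupe(configs):
    """Drop duplicates by comparing sorted-key JSON strings; keep first-seen order."""
    seen = set()
    unique = []
    for cfg in configs:
        key = json.dumps(cfg, sort_keys=True)
        if key not in seen:
            seen.add(key)
            unique.append(cfg)
    return unique

def short_name(config, tuning_keys):
    """Build a name from the initials of each tuning key followed by its value."""
    parts = []
    for key in tuning_keys:
        acronym = "".join(word[0] for word in key.split("_"))
        parts.append(f"{acronym}{config[key]}")
    return "_".join(parts)

# Hypothetical tuning space: lists are tunable, scalars are fixed.
tuning_space = {
    "train_micro_batch_size_per_gpu": [4, 8, 8],  # the repeated 8 produces duplicates
    "gradient_accumulation_steps": [2, 4],
    "zero_stage": 2,                              # fixed, not tuned
}
tuning_keys = ["train_micro_batch_size_per_gpu", "gradient_accumulation_steps"]

all_configs = enumerate_configs(tuning_space)     # 3 * 2 * 1 = 6 combinations
configs = dedupe(all_configs)                     # 4 remain after deduplication
names = [short_name(c, tuning_keys) for c in configs]
print(names)
```

Serializing with sort_keys=True matters because two dictionaries with the same content but different key order would otherwise produce different strings and survive deduplication.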

Usage

Use configuration search as the first phase of DeepSpeed autotuning. Define a tuning space with candidate values for each parameter, then generate all configurations. The resulting experiment set is passed to the experiment scheduler for evaluation.
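
As a rough sketch of how a template and one parameter combination might be merged before scheduling, the snippet below resolves placeholders and then applies the offloading rule from the Description. It assumes that a placeholder is a whole string value of the form "$NAME" and that a DeepSpeed-native optimizer is signalled by an "optimizer" section in the configuration; resolve and is_valid are hypothetical helpers, not DeepSpeed functions.

```python
def resolve(template, values):
    """Return a copy of template with every "$NAME" string replaced by values["NAME"]."""
    if isinstance(template, dict):
        return {k: resolve(v, values) for k, v in template.items()}
    if isinstance(template, list):
        return [resolve(v, values) for v in template]
    if isinstance(template, str) and template.startswith("$"):
        return values[template[1:]]  # a KeyError signals an unresolved placeholder
    return template

def is_valid(config):
    """Reject ZeRO stage 2/3 configs that offload without an "optimizer" section."""
    zero = config.get("zero_optimization", {})
    if zero.get("stage") not in (2, 3):
        return True
    offloads = any(
        zero.get(section, {}).get("device") in ("cpu", "nvme")
        for section in ("offload_optimizer", "offload_param")
    )
    return not offloads or "optimizer" in config

# Hypothetical base template: structure is fixed, tunable values are placeholders.
base = {
    "train_micro_batch_size_per_gpu": "$MBS",
    "gradient_accumulation_steps": "$GAS",
    "zero_optimization": {
        "stage": "$STAGE",
        "offload_optimizer": {"device": "$OFFLOAD"},
    },
    "fp16": {"enabled": True},
}

no_offload = resolve(base, {"MBS": 8, "GAS": 4, "STAGE": 2, "OFFLOAD": "none"})
cpu_offload = resolve(base, {"MBS": 8, "GAS": 4, "STAGE": 3, "OFFLOAD": "cpu"})

print(is_valid(no_offload))   # no offloading, so the optimizer rule does not apply
print(is_valid(cpu_offload))  # offloads to CPU but has no "optimizer" section
```

Replacing whole values rather than substituting into JSON text keeps integers as integers, which a plain string substitution would turn into quoted strings.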

The search is most effective when:

The tuning space is bounded to a manageable number of dimensions (typically 3-5 tunable parameters).
Each dimension has a small set of candidate values informed by hardware characteristics (e.g., batch sizes that are powers of 2, ZeRO stages relevant to the model size).

Theoretical Basis

The approach is exhaustive grid search over a discretized parameter space. The number of configurations is the product of the candidate counts across all tunable dimensions, so it grows exponentially with the number of dimensions. DeepSpeed autotuning mitigates this by:

Limiting the number of tunable parameters per search phase.
Using coarse-grained value sets (e.g., batch sizes [1, 2, 4, 8, 16] rather than all integers).
Running searches in stages (e.g., first find optimal ZeRO stage, then tune batch size within that stage).
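
The cost difference between a full grid and a staged search can be illustrated with a toy benchmark. Everything here is an assumption made for illustration: toy_throughput is a synthetic stand-in for a real training run, and the two-phase order (ZeRO stage first, then batch size) simply mirrors the example in the list above.

```python
import itertools

ZERO_STAGES = [1, 2, 3]
BATCH_SIZES = [1, 2, 4, 8, 16]

def toy_throughput(stage, batch_size):
    """Synthetic score; the numbers carry no real-world meaning."""
    stage_factor = {1: 1.0, 2: 1.2, 3: 0.9}[stage]
    return batch_size * stage_factor

class CountingBenchmark:
    """Wrap a scoring function and count how many experiments were run."""
    def __init__(self, fn):
        self.fn = fn
        self.calls = 0

    def __call__(self, stage, batch_size):
        self.calls += 1
        return self.fn(stage, batch_size)

def full_grid(measure):
    """Evaluate every (stage, batch size) pair and return the best one."""
    pairs = itertools.product(ZERO_STAGES, BATCH_SIZES)
    return max(pairs, key=lambda pair: measure(*pair))

def staged(measure):
    """Phase 1 picks a stage at the smallest batch size; phase 2 tunes batch size."""
    probe = BATCH_SIZES[0]
    best_stage = max(ZERO_STAGES, key=lambda s: measure(s, probe))
    best_batch = max(BATCH_SIZES, key=lambda b: measure(best_stage, b))
    return best_stage, best_batch

grid_bench = CountingBenchmark(toy_throughput)
staged_bench = CountingBenchmark(toy_throughput)

grid_best = full_grid(grid_bench)      # 3 * 5 = 15 evaluations
staged_best = staged(staged_bench)     # 3 + 5 = 8 evaluations
print(grid_best, grid_bench.calls)
print(staged_best, staged_bench.calls)
```

Both searches agree here only because the toy score is a simple product of a stage factor and the batch size. When parameters interact, for example when a higher ZeRO stage allows a larger batch to fit in memory, a staged search may settle on a different configuration than the full grid would.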

The canonical naming scheme ensures that results are deterministically associated with their parameter settings, enabling incremental search: previously evaluated configurations are skipped on re-runs.
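
One way to picture the incremental behaviour is a results directory keyed by canonical name. The sketch below assumes one JSON file per experiment named after its canonical name; that layout, the run_pending helper, and toy_measure are hypothetical and are not taken from DeepSpeed.

```python
import json
import tempfile
from pathlib import Path

def run_pending(experiments, results_dir, measure):
    """Evaluate only experiments without a stored result; return the names that ran."""
    results_dir = Path(results_dir)
    results_dir.mkdir(parents=True, exist_ok=True)
    ran = []
    for name, config in experiments.items():
        result_file = results_dir / f"{name}.json"
        if result_file.exists():
            continue  # evaluated in an earlier run, skip it
        result = {"config": config, "throughput": measure(config)}
        result_file.write_text(json.dumps(result, sort_keys=True))
        ran.append(name)
    return ran

def toy_measure(config):
    """Synthetic stand-in for launching a real experiment."""
    return float(config["train_micro_batch_size_per_gpu"])

experiments = {
    "tmbspg4_gas4": {"train_micro_batch_size_per_gpu": 4, "gradient_accumulation_steps": 4},
    "tmbspg8_gas4": {"train_micro_batch_size_per_gpu": 8, "gradient_accumulation_steps": 4},
}

with tempfile.TemporaryDirectory() as tmp:
    first_run = run_pending(experiments, tmp, toy_measure)
    # Widen the tuning space, then re-run: only the new configuration is evaluated.
    experiments["tmbspg16_gas4"] = {
        "train_micro_batch_size_per_gpu": 16,
        "gradient_accumulation_steps": 4,
    }
    second_run = run_pending(experiments, tmp, toy_measure)

print(first_run)
print(second_run)
```

The skip is only safe if the name captures every parameter that varies between experiments; two configurations that differ in a key outside the tuning keys would otherwise collide on the same file.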

Related Pages

Implementation:FMInference_FlexLLMGen_DeepSpeed_Autotuning_Utils
