Implementation:FMInference FlexLLMGen DeepSpeed Autotuning Utils
| Field | Value |
|---|---|
| Sources | Repo: FlexLLMGen |
| Domains | Autotuning, Configuration_Management |
| Last Updated | 2026-02-09 00:00 GMT |
Overview
Vendored DeepSpeed utility module providing helper functions for configuration manipulation, experiment generation, hostfile parsing, and result formatting used by the autotuning system.
Description
utils.py contains a collection of utility functions that support the DeepSpeed autotuning pipeline. These functions handle configuration dictionary manipulation (merging, replacing, pruning, deduplication), combinatorial experiment generation from tuning spaces, canonical naming of experiments, hostfile parsing, and human-readable formatting of memory and number values.
Key function groups:
- Error detection -- search_error scans stderr logs for error strings; was_interruptted (spelled this way in the source) checks for KeyboardInterrupt.
- Template substitution -- find_replace and find_replace_str perform variable substitution ($VAR syntax) in configuration dictionaries.
- Dictionary operations -- combine_dict merges dictionaries with list aggregation; replace_dict overwrites values; del_if_exists removes keys recursively; get_val_by_key and set_val_by_key perform recursive key lookup/update.
- Experiment generation -- get_all_configs produces the Cartesian product of all tunable parameter values; get_tuning_keys identifies which parameters have multiple values; canonical_name generates human-readable experiment names from acronyms.
- Configuration validation -- validate_ds_config checks that ZeRO configurations are valid (e.g., offloading requires a DeepSpeed optimizer).
- Formatting -- memory_to_string and number_to_string convert numeric values to human-readable strings with appropriate unit suffixes.
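The experiment-generation group can be illustrated with a minimal sketch of a Cartesian-product expansion over a nested tuning space. This is not the vendored implementation: the helper names flatten_tuning_space, expand_configs, and tuning_paths are hypothetical, and the sketch assumes that every list-valued leaf is a set of candidates and every scalar leaf is a fixed value.

```python
import itertools


def flatten_tuning_space(space, prefix=()):
    """Yield (key_path, candidate_values) for every leaf of a nested dict."""
    for key, value in space.items():
        path = prefix + (key,)
        if isinstance(value, dict):
            yield from flatten_tuning_space(value, path)
        else:
            # Assumption: a scalar leaf is a fixed value, i.e. one candidate.
            candidates = value if isinstance(value, list) else [value]
            yield (path, candidates)


def expand_configs(space):
    """Return one nested config dict per combination of candidate values."""
    leaves = list(flatten_tuning_space(space))
    paths = [path for path, _ in leaves]
    configs = []
    for combo in itertools.product(*(values for _, values in leaves)):
        config = {}
        for path, value in zip(paths, combo):
            node = config
            for key in path[:-1]:
                node = node.setdefault(key, {})
            node[path[-1]] = value
        configs.append(config)
    return configs


def tuning_paths(space):
    """Return the key paths that have more than one candidate value."""
    return [p for p, values in flatten_tuning_space(space) if len(values) > 1]


space = {
    "train_micro_batch_size_per_gpu": [1, 2],
    "zero_optimization": {"stage": [1, 2, 3], "overlap_comm": True},
}
configs = expand_configs(space)
print(len(configs))         # 2 * 3 * 1 combinations
print(tuning_paths(space))  # only the keys that actually vary
```

Flattening to key paths first keeps the product step a single itertools.product call, and the same flattened view answers the question of which parameters are being tuned, which is the role the list above assigns to get_tuning_keys.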
This module is vendored from DeepSpeed and carries the AUTO_KEEP classification.
Code Reference
| Field | Value |
|---|---|
| Repository | FlexLLMGen |
| File | benchmark/third_party/DeepSpeed/deepspeed/autotuning/utils.py |
| Lines | 1-456 |
Key Functions:
```python
def search_error(filename): ...
def was_interruptted(filename): ...
def find_replace(target, replace_dict): ...
def combine_dict(d, u): ...
def replace_dict(d, u, ignored_keys=[]): ...
def get_val_by_key(d: dict, k): ...
def fetch_hostfile(hostfile_path): ...
def validate_ds_config(config: dict): ...
def get_all_configs(tuning_space: dict, ignore_keys=None): ...
def canonical_name(config: dict, tuning_keys=None, prefix="", omit_val=False): ...
def write_experiments(exps: list, exps_dir: str): ...
def memory_to_string(n, postfix="", units=None, precision=2): ...
def number_to_string(n, postfix="", units=None, precision=2): ...
```
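To make the $VAR template substitution concrete, the following is a minimal sketch of recursive placeholder replacement in a configuration dictionary. It does not reproduce find_replace: the names substitute and substitute_str are hypothetical, and the sketch assumes exact-case variable lookup, a KeyError on unknown variables, and that a string consisting of a single placeholder takes on the replacement's type.

```python
import re

# Matches $NAME where NAME is made of letters, digits, and underscores.
VAR_PATTERN = re.compile(r"\$([A-Za-z0-9_]+)")


def substitute_str(value, variables):
    """Replace $NAME placeholders in a single string."""
    names = VAR_PATTERN.findall(value)
    # Assumption: a string that is exactly one placeholder keeps the
    # replacement's original type (e.g. int) instead of becoming a string.
    if len(names) == 1 and value == "$" + names[0]:
        return variables[names[0]]
    return VAR_PATTERN.sub(lambda m: str(variables[m.group(1)]), value)


def substitute(target, variables):
    """Recursively substitute placeholders in dicts, lists, and strings."""
    if isinstance(target, dict):
        return {k: substitute(v, variables) for k, v in target.items()}
    if isinstance(target, list):
        return [substitute(v, variables) for v in target]
    if isinstance(target, str):
        return substitute_str(target, variables)
    return target  # numbers, booleans, and None pass through unchanged


template = {
    "train_batch_size": "$BATCH",
    "output": "run_$NAME/z$STAGE",
    "zero_optimization": {"stage": "$STAGE"},
}
variables = {"BATCH": 32, "NAME": "exp", "STAGE": 3}
resolved = substitute(template, variables)
print(resolved)
```

Returning a new structure instead of mutating the template is a choice made for this sketch; it lets one template be reused for many experiments.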
I/O Contract
Key Functions
| Function | Inputs | Output | Description |
|---|---|---|---|
| search_error | filename (str) | str or None | Returns the error message found in the stderr log, or None if no error string is present |
| get_all_configs | tuning_space (dict), ignore_keys (list) | list of dicts | All combinations of tunable parameter values |
| canonical_name | config (dict), tuning_keys (list), prefix (str), omit_val (bool) | str | Human-readable experiment name from config acronyms |
| validate_ds_config | config (dict) | bool | True if the DeepSpeed config is valid |
| fetch_hostfile | hostfile_path (str) | OrderedDict or None | Hostname-to-slot-count mapping from MPI-style hostfile |
| write_experiments | exps (list), exps_dir (str) | list of str | File paths of written experiment JSON files |
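The fetch_hostfile contract in the table can be sketched as follows. This is an illustration, not the vendored code: the function name parse_hostfile is hypothetical, and the sketch assumes each non-blank line has the form `hostname slots=N` and that malformed or duplicate entries should raise ValueError.

```python
import os
import tempfile
from collections import OrderedDict


def parse_hostfile(path):
    """Parse lines like 'worker-0 slots=16' into an ordered host -> slots map."""
    if not os.path.isfile(path):
        return None  # matches the "OrderedDict or None" output in the table

    pool = OrderedDict()
    with open(path) as fd:
        for raw in fd:
            line = raw.strip()
            if not line:
                continue  # skip blank lines
            parts = line.split()
            if len(parts) != 2 or not parts[1].startswith("slots="):
                raise ValueError(f"malformed hostfile line: {line!r}")
            hostname, slots = parts
            if hostname in pool:
                raise ValueError(f"duplicate host: {hostname!r}")
            # int() raises ValueError if the slot count is not a number.
            pool[hostname] = int(slots.split("=", 1)[1])
    return pool


# Usage: write a small hostfile to a temporary location and parse it.
with tempfile.NamedTemporaryFile("w", suffix=".hostfile", delete=False) as tmp:
    tmp.write("worker-0 slots=16\n\nworker-1 slots=8\n")
pool = parse_hostfile(tmp.name)
os.remove(tmp.name)
print(dict(pool))
```

An ordered mapping preserves the order in which hosts appear in the file, which matters when the first host is treated specially by a launcher.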