Implementation:FMInference FlexLLMGen DeepSpeed Autotuning Utils

Revision as of 14:55, 16 February 2026 by Admin (talk | contribs) (Auto-imported from implementations/FMInference_FlexLLMGen_DeepSpeed_Autotuning_Utils.md)

| Field | Value |
|---|---|
| Sources | Repo: FlexLLMGen |
| Domains | Autotuning, Configuration_Management |
| Last Updated | 2026-02-09 00:00 GMT |

Overview

Vendored DeepSpeed utility module providing helper functions for configuration manipulation, experiment generation, hostfile parsing, and result formatting used by the autotuning system.

Description

utils.py contains a collection of utility functions that support the DeepSpeed autotuning pipeline. These functions handle configuration dictionary manipulation (merging, replacing, pruning, deduplication), combinatorial experiment generation from tuning spaces, canonical naming of experiments, hostfile parsing, and human-readable formatting of memory and number values.
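The merging behaviour mentioned above can be illustrated with a minimal sketch. The function name `merge_configs` is hypothetical, and the semantics shown (nested dictionaries merged recursively, conflicting scalar values aggregated into a de-duplicated list) are an assumption about what "merging with list aggregation" means here, not a copy of the vendored code.

```python
import copy
from collections.abc import Mapping


def merge_configs(base, update):
    """Hypothetical helper: merge `update` into a copy of `base`.

    Assumed semantics: nested mappings are merged recursively; when both
    sides define the same non-mapping key, the values are collected into
    one list without duplicates.
    """
    merged = copy.deepcopy(base)
    for key, value in update.items():
        if isinstance(value, Mapping):
            existing = merged.get(key, {})
            if not isinstance(existing, Mapping):
                existing = {}
            merged[key] = merge_configs(existing, value)
        elif key not in merged:
            merged[key] = copy.deepcopy(value)
        else:
            current = merged[key] if isinstance(merged[key], list) else [merged[key]]
            incoming = value if isinstance(value, list) else [value]
            merged[key] = current + [v for v in incoming if v not in current]
    return merged


base = {"zero_optimization": {"stage": 1}}
update = {"zero_optimization": {"stage": [2, 3]}, "gradient_accumulation_steps": 4}
merged = merge_configs(base, update)
```

In this sketch `merged["zero_optimization"]["stage"]` becomes a list of candidate values, which is the shape a tuning space needs, and `base` is left unmodified.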

Key function groups:

  • Error detection -- search_error scans stderr logs for error strings; was_interruptted checks for KeyboardInterrupt.
  • Template substitution -- find_replace and find_replace_str perform variable substitution ($VAR syntax) in configuration dictionaries.
  • Dictionary operations -- combine_dict merges dictionaries with list aggregation; replace_dict overwrites values; del_if_exists removes keys recursively; get_val_by_key and set_val_by_key perform recursive key lookup/update.
  • Experiment generation -- get_all_configs produces the Cartesian product of all tunable parameter values; get_tuning_keys identifies which parameters have multiple values; canonical_name generates human-readable experiment names from acronyms.
  • Configuration validation -- validate_ds_config checks that ZeRO configurations are valid (e.g., offloading requires a DeepSpeed optimizer).
  • Formatting -- memory_to_string and number_to_string convert numeric values to human-readable strings with appropriate unit suffixes.
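The experiment-generation step described in the list above can be sketched with `itertools.product`. This is a simplified illustration for a flat tuning space only; the helper names `expand_tuning_space` and `multi_valued_keys` are hypothetical, and the vendored `get_all_configs` additionally accepts `ignore_keys`, which this sketch does not model.

```python
import itertools


def expand_tuning_space(tuning_space):
    """Hypothetical helper: expand a flat tuning space into concrete configs."""
    keys = list(tuning_space)
    # Wrap scalars in a list so every key contributes exactly one value.
    value_lists = [v if isinstance(v, list) else [v] for v in tuning_space.values()]
    return [dict(zip(keys, combo)) for combo in itertools.product(*value_lists)]


def multi_valued_keys(tuning_space):
    """Hypothetical helper: keys that actually vary across experiments."""
    return [k for k, v in tuning_space.items() if isinstance(v, list) and len(v) > 1]


space = {
    "train_micro_batch_size_per_gpu": [1, 2],
    "stage": [0, 1, 2],
    "fp16": True,  # fixed value, not tuned
}
configs = expand_tuning_space(space)
varying = multi_valued_keys(space)
```

Two batch sizes and three stages give six configurations, and only the two multi-valued keys would need to appear in an experiment name.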

This is AUTO_KEEP vendored code from DeepSpeed.

Code Reference

| Field | Value |
|---|---|
| Repository | FlexLLMGen |
| File | benchmark/third_party/DeepSpeed/deepspeed/autotuning/utils.py |
| Lines | 1-456 |

Key Functions:

def search_error(filename): ...
def was_interruptted(filename): ...
def find_replace(target, replace_dict): ...
def combine_dict(d, u): ...
def replace_dict(d, u, ignored_keys=[]): ...
def get_val_by_key(d: dict, k): ...
def fetch_hostfile(hostfile_path): ...
def validate_ds_config(config: dict): ...
def get_all_configs(tuning_space: dict, ignore_keys=None): ...
def canonical_name(config: dict, tuning_keys=None, prefix="", omit_val=False): ...
def write_experiments(exps: list, exps_dir: str): ...
def memory_to_string(n, postfix="", units=None, precision=2): ...
def number_to_string(n, postfix="", units=None, precision=2): ...
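To illustrate the kind of conversion the two formatting functions perform, here is a minimal sketch of unit-suffix formatting. The helper `humanize`, its `base` and `suffixes` parameters, and the exact suffix strings are assumptions made for illustration; they do not mirror the signatures listed above.

```python
def humanize(n, base=1000, suffixes=("K", "M", "B", "T"), precision=2):
    """Hypothetical helper: render n with the largest unit suffix that fits."""
    # Try the largest unit first so 1_500_000 becomes "1.5 M", not "1500.0 K".
    for power in range(len(suffixes), 0, -1):
        threshold = base ** power
        if n >= threshold:
            return f"{round(n / threshold, precision)} {suffixes[power - 1]}"
    # Values below the smallest unit are returned unchanged.
    return str(n)


count_text = humanize(1_500_000)
small_text = humanize(999)
memory_text = humanize(3 * 1024**3, base=1024, suffixes=("KB", "MB", "GB", "TB"))
```

Passing `base=1024` with byte suffixes covers the memory case, while the default decimal base covers plain counts such as parameter numbers.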

I/O Contract

Key Functions

| Function | Inputs | Output | Description |
|---|---|---|---|
| search_error | filename (str) | str or None | Returns error message from stderr log, or None if clean |
| get_all_configs | tuning_space (dict), ignore_keys (list) | list of dicts | All combinations of tunable parameter values |
| canonical_name | config (dict), tuning_keys (list), prefix (str) | str | Human-readable experiment name from config acronyms |
| validate_ds_config | config (dict) | bool | True if the DeepSpeed config is valid |
| fetch_hostfile | hostfile_path (str) | OrderedDict or None | Hostname-to-slot-count mapping from MPI-style hostfile |
| write_experiments | exps (list), exps_dir (str) | list of str | File paths of written experiment JSON files |
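The hostname-to-slot-count mapping in the `fetch_hostfile` contract can be sketched as follows. This assumes each hostfile line has the form `hostname slots=N`; the function `parse_hostfile_lines` is hypothetical and works on an iterable of lines rather than a path, so it does not reproduce the file handling or the `None` return of the vendored function.

```python
from collections import OrderedDict


def parse_hostfile_lines(lines):
    """Hypothetical helper: map each hostname to its slot count, in file order."""
    resources = OrderedDict()
    for raw in lines:
        line = raw.strip()
        # Skip blank lines and comments (comment handling is an assumption).
        if not line or line.startswith("#"):
            continue
        try:
            hostname, slots = line.split()
            key, count = slots.split("=")
            if key != "slots":
                raise ValueError(key)
            count = int(count)
        except ValueError:
            raise ValueError(f"malformed hostfile line: {raw!r}") from None
        if hostname in resources:
            raise ValueError(f"duplicate host in hostfile: {hostname}")
        resources[hostname] = count
    return resources


hosts = parse_hostfile_lines(["worker-0 slots=4", "", "# spare node", "worker-1 slots=8"])
```

An ordered mapping preserves the order in which hosts are listed, which matters if the first entry is treated as the launch node.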
