Implementation:FMInference FlexLLMGen DeepSpeed Runtime Utils

From Leeroopedia
Revision as of 14:56, 16 February 2026 by Admin (talk | contribs) (Auto-imported from implementations/FMInference_FlexLLMGen_DeepSpeed_Runtime_Utils.md)


Field | Value
Sources | Repo: FlexLLMGen; Upstream: DeepSpeed
Domains | Runtime_Infrastructure, Distributed_Training
Last Updated | 2026-02-09 00:00 GMT

Overview

Vendored DeepSpeed utility module providing helper functions for gradient operations, memory monitoring, tensor parallelism, random seed management, and distributed communication used throughout the runtime.

Description

The utils.py file (1018 lines) is a vendored copy of DeepSpeed's runtime utility collection, containing helper functions sourced from NVIDIA's Megatron-LM and extended for DeepSpeed's needs.

Key components include:

  • Gradient utilities:
    • clip_grad_norm_ -- Clips the global gradient norm computed over all parameters, handling model-parallel and expert-parallel parameter groups separately.
    • get_global_norm_of_tensors -- Computes the global L2 norm across a list of tensors, with all-reduce for distributed computation.
    • clip_tensors_by_global_norm -- Scales tensors to enforce a maximum global norm.
    • get_grad_norm -- Computes gradient norm with special handling for model-parallel parameters (avoiding double-counting across ranks).
  • Memory monitoring:
    • see_memory_usage -- Logs GPU memory allocated, cached, and max allocated, plus CPU RAM and virtual memory usage via psutil. Optionally forces garbage collection and CUDA cache clearing.
  • Parallelism helpers:
    • is_model_parallel_parameter -- Checks if a parameter is marked for tensor model parallelism.
    • bwc_tensor_model_parallel_rank -- Backwards-compatible query for tensor model parallel rank, supporting both old and new Megatron API conventions.
    • align_dense_tensors -- Pads a list of tensors so that the total element count is a multiple of the requested alignment, keeping flattened buffers aligned for efficient NCCL communication.
    • all_gather_dp_groups -- All-gathers partitioned tensors across data-parallel groups.
  • General utilities:
    • DummyOptim -- A dummy optimizer that presents model parameters as a param group, used for ZeRO-3 without an optimizer.
    • set_random_seed -- Sets seeds for Python's random, numpy, and torch PRNGs.
    • ensure_directory_exists -- Creates directory paths for checkpoint saving.
    • get_ma_status -- Returns the amount of GPU memory currently allocated ("ma" stands for memory allocated, not mixed-precision autocast).
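
The clipping behaviour described for clip_grad_norm_ and clip_tensors_by_global_norm can be illustrated without PyTorch. The sketch below is not the vendored code: it uses plain lists of floats in place of tensors, and the helper names and the eps default are assumptions chosen for illustration. It shows the usual pattern of computing one L2 norm over all tensors and scaling every tensor by the same coefficient only when that norm exceeds max_norm.

```python
import math


def global_l2_norm(tensors):
    """L2 norm over every element of every 'tensor' (here: lists of floats)."""
    return math.sqrt(sum(x * x for t in tensors for x in t))


def clip_by_global_norm(tensors, max_norm, eps=1e-6):
    """Return (tensors, total_norm); tensors are scaled only if the norm exceeds max_norm."""
    total_norm = global_l2_norm(tensors)
    # eps (an assumed default) guards against division by zero for all-zero gradients.
    clip_coef = max_norm / (total_norm + eps)
    if clip_coef < 1.0:
        tensors = [[x * clip_coef for x in t] for t in tensors]
    return tensors, total_norm


grads = [[3.0, 0.0], [0.0, 4.0]]  # global norm is 5.0
clipped, norm = clip_by_global_norm(grads, max_norm=1.0)
untouched, small_norm = clip_by_global_norm([[0.3, 0.4]], max_norm=1.0)
```

Because a single coefficient is applied to all tensors, the direction of the overall gradient is preserved and only its length changes.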

Usage

These utilities are called throughout the DeepSpeed runtime by the engine, optimizers, and ZeRO implementations. They are not typically called directly by users. This module is part of the vendored benchmark dependencies in FlexLLMGen.

Code Reference

Field | Value
Repository | FlexLLMGen
File | benchmark/third_party/DeepSpeed/deepspeed/runtime/utils.py
Lines | 1-1018
Type | AUTO_KEEP (vendored dependency)

Key function signatures:

class DummyOptim():
    def __init__(self, params):
        self.param_groups = [{'params': params}]

def see_memory_usage(message, force=False):
    ...

def is_model_parallel_parameter(p) -> bool:
    ...

def bwc_tensor_model_parallel_rank(mpu=None):
    ...

def clip_grad_norm_(parameters, max_norm, norm_type=2, mpu=None):
    ...

def get_global_norm_of_tensors(input_tensors, norm_type=2, mpu=None):
    ...
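
As a rough illustration of two of these signatures, the sketch below shows how a DummyOptim-style object exposes parameters through param_groups, and how a backwards-compatible rank query might fall back between newer and older Megatron-style method names. The fallback order, the function name tensor_model_parallel_rank, and the NewStyleMpu and OldStyleMpu classes are assumptions made for this example, not the vendored implementation.

```python
class DummyOptim:
    """Same shape as the signature above: one param group wrapping all params."""

    def __init__(self, params):
        self.param_groups = [{'params': params}]


def tensor_model_parallel_rank(mpu=None):
    """Illustrative fallback chain; the exact checks in the vendored code may differ."""
    if mpu is None:
        # No model-parallel unit: behave as a single tensor-parallel rank.
        return 0
    if hasattr(mpu, 'get_tensor_model_parallel_rank'):
        # Newer naming convention.
        return mpu.get_tensor_model_parallel_rank()
    # Older naming convention.
    return mpu.get_model_parallel_rank()


class NewStyleMpu:  # hypothetical stand-in for a newer-style mpu
    def get_tensor_model_parallel_rank(self):
        return 3


class OldStyleMpu:  # hypothetical stand-in for an older-style mpu
    def get_model_parallel_rank(self):
        return 1


optim = DummyOptim(params=['w', 'b'])  # strings stand in for parameters
all_params = [p for group in optim.param_groups for p in group['params']]
```

Code that only iterates over param_groups, as many optimizer-aware utilities do, can then treat DummyOptim like a regular optimizer.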

I/O Contract

Inputs

Parameter | Type | Required | Description
parameters | Iterable[Tensor] | Yes | Model parameters for gradient operations
max_norm | float | Yes | Maximum gradient norm for clipping
norm_type | int | No | Norm type for gradient computation (default: 2)
mpu | object | No | Model parallel unit for distributed norm computation
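
The role of mpu in norm computation can be sketched with a small simulation. Under model parallelism, sharded gradients differ on each rank while replicated gradients are identical everywhere, so summing every rank's squared norm would count the replicated ones several times. The example below is an assumption-laden illustration, not the vendored code: ranks are simulated as list entries, a plain Python sum stands in for an all-reduce, and each parameter is a (gradient, is_model_parallel) pair.

```python
import math


def local_sq_sum(params, mp_rank):
    """Sum of squared gradient entries that this rank contributes."""
    total = 0.0
    for grad, is_model_parallel in params:
        # Sharded grads are unique per rank; replicated grads count only on rank 0.
        if is_model_parallel or mp_rank == 0:
            total += sum(g * g for g in grad)
    return total


def global_grad_norm(per_rank_params):
    # Summing per-rank values stands in for an all-reduce (SUM) over the
    # model-parallel group.
    sq_sum = sum(local_sq_sum(p, rank) for rank, p in enumerate(per_rank_params))
    return math.sqrt(sq_sum)


replicated = ([3.0], False)                 # same gradient on every rank
rank0 = [replicated, ([0.0, 4.0], True)]    # shard held by rank 0
rank1 = [replicated, ([12.0, 0.0], True)]   # shard held by rank 1
norm = global_grad_norm([rank0, rank1])     # sqrt(9 + 16 + 144)
```

Counting the replicated gradient on both ranks would instead give sqrt(178), which overstates the true norm.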

Outputs

Output | Type | Description
total_norm | float | Computed global gradient norm
clipped_grads | Tensor | Gradients scaled to enforce the maximum norm constraint
memory_info | log output | GPU/CPU memory usage statistics
