Implementation:FMInference FlexLLMGen DeepSpeed Runtime Utils
| Field | Value |
|---|---|
| Sources | Repo: FlexLLMGen, Upstream: DeepSpeed |
| Domains | Runtime_Infrastructure, Distributed_Training |
| Last Updated | 2026-02-09 00:00 GMT |
Overview
Vendored DeepSpeed utility module providing helper functions for gradient operations, memory monitoring, tensor parallelism, random seed management, and distributed communication used throughout the runtime.
Description
The utils.py file (1018 lines) is a vendored copy of DeepSpeed's runtime utility collection, containing helper functions sourced from NVIDIA's Megatron-LM and extended for DeepSpeed's needs.
Key components include:
- Gradient utilities:
  - clip_grad_norm_ -- Clips the global gradient norm across all parameters, handling model-parallel and expert-parallel parameter groups separately.
  - get_global_norm_of_tensors -- Computes the global norm (L2 by default) across a list of tensors, with an all-reduce for distributed computation.
  - clip_tensors_by_global_norm -- Scales tensors to enforce a maximum global norm.
  - get_grad_norm -- Computes the gradient norm with special handling for model-parallel parameters, so that they are not double-counted across ranks.
- Memory monitoring:
  - see_memory_usage -- Logs GPU memory allocated, cached, and max allocated, plus CPU virtual memory usage via psutil. It returns immediately unless force=True; when it does run, it triggers garbage collection before reading the counters and resets the peak-memory statistics afterwards.
- Parallelism helpers:
  - is_model_parallel_parameter -- Checks whether a parameter is marked for tensor model parallelism.
  - bwc_tensor_model_parallel_rank -- Backwards-compatible query for the tensor model parallel rank, supporting both old and new Megatron API conventions.
  - align_dense_tensors -- Pads a list of tensors with a zero tensor so that the total element count is a multiple of a given alignment, which keeps partitions aligned for efficient NCCL communication.
  - all_gather_dp_groups -- All-gathers partitioned tensors across data-parallel groups.
- General utilities:
  - DummyOptim -- A dummy optimizer that presents model parameters as a single param group, used to run ZeRO-3 without an optimizer.
  - set_random_seed -- Sets seeds for Python's random, numpy, and torch PRNGs.
  - ensure_directory_exists -- Creates the parent directory of a given file path if it does not exist, for example before saving a checkpoint.
  - get_ma_status -- Returns the amount of GPU memory currently allocated, as reported by torch.cuda.memory_allocated().
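Of the helpers listed above, the alignment step is probably the least self-explanatory. The sketch below illustrates the padding arithmetic only, under the assumption that alignment means "make the total element count a multiple of an alignment factor". Plain Python lists stand in for flat tensors, and `pad_to_alignment` is a hypothetical name, not the vendored function.

```python
def pad_to_alignment(chunks, alignment):
    """Append a zero-filled chunk so the total length is a multiple of alignment.

    Illustrative stand-in: lists of floats play the role of flat tensors.
    """
    if alignment <= 0:
        raise ValueError("alignment must be positive")
    total = sum(len(c) for c in chunks)
    remainder = total % alignment
    if remainder == 0:
        return list(chunks)  # already aligned, nothing to add
    padding = [0.0] * (alignment - remainder)
    return list(chunks) + [padding]


chunks = [[1.0, 2.0, 3.0], [4.0, 5.0]]  # 5 elements in total
padded = pad_to_alignment(chunks, alignment=4)
print(sum(len(c) for c in padded))  # total length is now a multiple of 4
```

Padding with a separate trailing chunk leaves the original data untouched, so the extra elements can simply be ignored after communication.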
Usage
These utilities are called throughout the DeepSpeed runtime by the engine, optimizers, and ZeRO implementations. They are not typically called directly by users. This module is part of the vendored benchmark dependencies in FlexLLMGen.
Code Reference
| Field | Value |
|---|---|
| Repository | FlexLLMGen |
| File | benchmark/third_party/DeepSpeed/deepspeed/runtime/utils.py |
| Lines | 1-1018 |
| Type | AUTO_KEEP (vendored dependency) |
Key function signatures:

    class DummyOptim():
        def __init__(self, params):
            self.param_groups = [{'params': params}]

    def see_memory_usage(message, force=False):
        ...

    def is_model_parallel_parameter(p) -> bool:
        ...

    def bwc_tensor_model_parallel_rank(mpu=None):
        ...

    def clip_grad_norm_(parameters, max_norm, norm_type=2, mpu=None):
        ...

    def get_global_norm_of_tensors(input_tensors, norm_type=2, mpu=None):
        ...
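
To make the role of DummyOptim concrete, here is a minimal sketch of how code that expects an optimizer can walk its param_groups. The class body mirrors the signature shown above; `count_params` and the string placeholders are hypothetical and exist only for illustration.

```python
class DummyOptim:
    """Mirrors the signature above: wraps parameters in a single param group."""

    def __init__(self, params):
        self.param_groups = [{'params': params}]


def count_params(optimizer):
    """Hypothetical helper: iterate param_groups as optimizer-aware code would."""
    return sum(len(group['params']) for group in optimizer.param_groups)


# Strings stand in for torch.nn.Parameter objects in this illustration.
fake_params = ['layer0.weight', 'layer0.bias', 'layer1.weight']
optim = DummyOptim(fake_params)
print(count_params(optim))
```

The object exposes only param_groups, which is enough for code that partitions or inspects parameters without ever taking an optimizer step.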
I/O Contract
Inputs
| Parameter | Type | Required | Description |
|---|---|---|---|
| parameters | Iterable[Tensor] | Yes | Model parameters for gradient operations |
| max_norm | float | Yes | Maximum gradient norm for clipping |
| norm_type | int or float | No | Order of the norm used for gradient computation (default: 2, the L2 norm; the infinity norm is also accepted) |
| mpu | object | No | Model parallel unit for distributed norm computation |
Outputs
| Output | Type | Description |
|---|---|---|
| total_norm | float | Computed global gradient norm |
| clipped_grads | Tensor | Gradients scaled to enforce the maximum norm constraint |
| memory_info | log output | GPU/CPU memory usage statistics |
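
The relationship between max_norm, norm_type, total_norm, and the clipped gradients can be sketched in a single process as follows. This is not the vendored implementation: lists of floats stand in for tensors, the all-reduce across ranks and the mpu handling are omitted, and the `eps` guard and function names are assumptions made for the example.

```python
import math


def global_norm(tensors, norm_type=2):
    """Norm over every element of every tensor (lists of floats stand in for tensors)."""
    values = [abs(x) for t in tensors for x in t]
    if not values:
        return 0.0
    if norm_type == math.inf:
        return max(values)
    return sum(v ** norm_type for v in values) ** (1.0 / norm_type)


def clip_by_global_norm(tensors, max_norm, norm_type=2, eps=1e-6):
    """Return (total_norm, tensors), scaling only if the norm exceeds max_norm."""
    total_norm = global_norm(tensors, norm_type)
    clip_coef = max_norm / (total_norm + eps)  # eps avoids division by zero
    if clip_coef < 1.0:
        tensors = [[x * clip_coef for x in t] for t in tensors]
    return total_norm, tensors


grads = [[3.0, 0.0], [0.0, 4.0]]  # combined L2 norm is 5
total_norm, clipped = clip_by_global_norm(grads, max_norm=1.0)
print(total_norm, global_norm(clipped))
```

Every tensor is scaled by the same coefficient, so the direction of the combined gradient is preserved and only its magnitude is reduced.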