Implementation:FMInference FlexLLMGen DeepSpeed Runtime Config

From Leeroopedia


| Field | Value |
|---|---|
| Sources | Repo: FlexLLMGen, Upstream: DeepSpeed |
| Domains | Configuration_Management, Runtime_Infrastructure |
| Last Updated | 2026-02-09 00:00 GMT |

Overview

Vendored DeepSpeed module that parses JSON configuration files into a structured DeepSpeedConfig object, validating and resolving all training settings including optimizer, scheduler, precision, ZeRO, and communication parameters.

Description

The config.py file (1071 lines) is a vendored copy of DeepSpeed's runtime configuration parser. It converts a JSON configuration, supplied either as a file path (for example ds_config.json) or as an already-loaded dictionary, into a validated DeepSpeedConfig object used throughout the DeepSpeed engine.

Key components include:

  • DeepSpeedConfig -- The main configuration class that exposes validated attributes for all DeepSpeed features: batch sizes, gradient accumulation, FP16/BF16 settings, ZeRO optimization, sparse attention, optimizer/scheduler selection, curriculum learning, progressive layer drop, AMP integration, and more.
  • Helper extraction functions -- Numerous get_* functions that safely extract configuration values from the JSON dictionary with defaults:
    • get_fp16_enabled, get_bfloat16_enabled -- Precision settings
    • get_loss_scale, get_initial_dynamic_scale, get_dynamic_loss_scale_args -- Loss scaling for FP16
    • get_amp_enabled, get_amp_params -- NVIDIA Apex AMP integration
    • get_pld_enabled, get_pld_params -- Progressive layer drop
    • get_curriculum_enabled, get_curriculum_params -- Curriculum learning
    • Sparse attention configuration parsers for fixed, variable, BigBird, and BSLongformer modes
  • DeepSpeedConfigError -- Custom exception for configuration validation failures.
  • Batch size resolution -- Given any two of train_batch_size, train_micro_batch_size_per_gpu, and gradient_accumulation_steps, computes the third from the number of GPUs, so that train_batch_size = train_micro_batch_size_per_gpu × gradient_accumulation_steps × number of GPUs.
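
The batch size resolution step can be illustrated with a minimal sketch. It assumes the relation train_batch_size = train_micro_batch_size_per_gpu × gradient_accumulation_steps × world size and that exactly the "two given, one derived" case applies. The function name resolve_batch_params and its argument names are hypothetical and are not taken from config.py, which may handle further cases.

```python
def resolve_batch_params(world_size, train_batch_size=None,
                         micro_batch_size=None, grad_accum_steps=None):
    """Derive the missing batch parameter from the other two.

    Hypothetical helper, not the vendored implementation. Assumed invariant:
        train_batch_size == micro_batch_size * grad_accum_steps * world_size
    """
    given = [train_batch_size, micro_batch_size, grad_accum_steps]
    if sum(value is not None for value in given) < 2:
        raise ValueError("at least two of the three batch parameters are required")

    if train_batch_size is None:
        train_batch_size = micro_batch_size * grad_accum_steps * world_size
    elif grad_accum_steps is None:
        grad_accum_steps = train_batch_size // (micro_batch_size * world_size)
    elif micro_batch_size is None:
        micro_batch_size = train_batch_size // (grad_accum_steps * world_size)

    # Reject combinations that do not divide evenly or contradict each other.
    if train_batch_size != micro_batch_size * grad_accum_steps * world_size:
        raise ValueError("batch parameters are inconsistent with the world size")
    return train_batch_size, micro_batch_size, grad_accum_steps


# 4 GPUs, global batch 64, micro batch 4 per GPU: accumulation steps are derived.
resolved = resolve_batch_params(4, train_batch_size=64, micro_batch_size=4)
```

The final consistency check matters because integer division silently truncates: a global batch size that is not a multiple of micro batch size × world size would otherwise resolve to a configuration that trains on fewer samples per step than requested.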

The configuration recognizes multiple optimizer types: Adam, AdamW, Adagrad, LAMB, OneBitAdam, OneBitLamb, and ZeroOneAdam, with dedicated handling of the Adam-specific torch_adam and adam_w_mode parameters.
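
As a rough illustration of how an optimizer section might be read, the sketch below extracts the optimizer type and parameters from a configuration dictionary and separates out the two Adam-specific switches. The key names ("optimizer", "type", "params", "torch_adam", "adam_w_mode") and the default values are assumptions based on the upstream DeepSpeed JSON schema and should be checked against the vendored file. The helper read_optimizer_section is hypothetical.

```python
# Lower-cased names of the optimizer types listed above.
SUPPORTED_OPTIMIZERS = {
    "adam", "adamw", "adagrad", "lamb",
    "onebitadam", "onebitlamb", "zerooneadam",
}


def read_optimizer_section(config):
    """Return (optimizer_name, params) from a config dict.

    Hypothetical helper. A missing section yields (None, {}); recognized
    names are normalized to lower case, unknown names are passed through.
    """
    section = config.get("optimizer") or {}
    name = section.get("type")
    params = dict(section.get("params", {}))  # copy so the input is not mutated
    if name is not None and name.lower() in SUPPORTED_OPTIMIZERS:
        name = name.lower()
    return name, params


config = {
    "optimizer": {
        "type": "Adam",
        "params": {"lr": 1e-4, "torch_adam": True, "adam_w_mode": False},
    }
}

name, params = read_optimizer_section(config)
# Pull out the Adam-specific switches; the defaults here are assumptions.
torch_adam = params.pop("torch_adam", False)
adam_w_mode = params.pop("adam_w_mode", True)
```

Popping the two switches leaves params holding only the values that would be forwarded to the optimizer constructor, which is one plausible reason to treat them separately from ordinary hyperparameters.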

Usage

A DeepSpeedConfig is created internally by the DeepSpeedEngine during initialization. Users provide a JSON configuration file or dictionary. This module is part of the vendored benchmark dependencies in FlexLLMGen.
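
The two accepted input forms, a file path or a dictionary, can be sketched without importing DeepSpeed. The helper load_config_dict below is hypothetical and only mirrors the idea of normalizing both forms to a dictionary. The "fp16" / "enabled" keys are assumed from the upstream JSON schema, and ds_config.json is simply the conventional file name.

```python
import json
import os
import tempfile


def load_config_dict(config):
    """Normalize a config given as a dict or as a path to a JSON file.

    Hypothetical helper, not the vendored parser.
    """
    if isinstance(config, dict):
        return config
    if isinstance(config, str):
        if not os.path.exists(config):
            raise FileNotFoundError(f"config file not found: {config}")
        with open(config, "r") as handle:
            return json.load(handle)
    raise TypeError("config must be a dict or a path to a JSON file")


example = {
    "train_micro_batch_size_per_gpu": 4,
    "gradient_accumulation_steps": 2,
    "fp16": {"enabled": True},  # assumed key layout
}

# Write the dictionary to a temporary ds_config.json and read it back.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "ds_config.json")
    with open(path, "w") as handle:
        json.dump(example, handle)
    from_path = load_config_dict(path)

from_dict = load_config_dict(example)
```

In practice the path or dictionary would be handed to the engine, which builds the DeepSpeedConfig itself, rather than being parsed by hand.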

Code Reference

| Field | Value |
|---|---|
| Repository | FlexLLMGen |
| File | benchmark/third_party/DeepSpeed/deepspeed/runtime/config.py |
| Lines | 1-1071 |
| Type | AUTO_KEEP (vendored dependency) |

Key class signature:

class DeepSpeedConfig():
    def __init__(self, config: Union[str, dict], mpu=None):
        ...

I/O Contract

Inputs

| Parameter | Type | Required | Description |
|---|---|---|---|
| config | Union[str, dict] | Yes | Path to JSON config file or dictionary of config values |
| mpu | object | No | Model parallel unit for resolving world size |

Outputs

| Output | Type | Description |
|---|---|---|
| DeepSpeedConfig | object | Validated configuration with attributes for all DeepSpeed features |
| train_batch_size | int | Resolved global training batch size |
| train_micro_batch_size_per_gpu | int | Per-GPU micro batch size |
| gradient_accumulation_steps | int | Number of gradient accumulation steps |
| zero_optimization_config | ZeroConfig | ZeRO optimization stage and settings |
