Implementation:FMInference FlexLLMGen DeepSpeed Init

Knowledge Sources	FMInference_FlexLLMGen
Domains	Deep Learning, Distributed Training, Inference, Python
Last Updated	2026-02-09 12:00 GMT

Overview

This module is the DeepSpeed package entry point. It exports initialize() for distributed training setup and init_inference() for creating an optimized inference engine.

Description

This file serves as the top-level __init__.py for the vendored DeepSpeed package within FlexLLMGen. It provides two primary public APIs:

initialize() sets up the DeepSpeed distributed training engine. It accepts a model, optimizer, learning rate scheduler, training data, and configuration (as either a JSON file path or dictionary). The function:

Shuts down any active zero.Init context to prevent parameter partitioning conflicts.
Creates either a DeepSpeedEngine (standard models) or a PipelineEngine (for PipelineModule instances) based on the model type.
Returns a 4-tuple of (engine, optimizer, training_dataloader, lr_scheduler).
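
The engine selection described above amounts to a type dispatch on the model. The following is a minimal sketch of that idea only: the classes are empty stand-ins rather than the real DeepSpeed classes, and select_engine_class is a hypothetical helper name that does not appear in the source.

```python
class PipelineModule:
    """Stand-in for the real PipelineModule (assumption: empty placeholder)."""


class DeepSpeedEngine:
    """Stand-in for the standard training engine."""


class PipelineEngine(DeepSpeedEngine):
    """Stand-in for the pipeline-parallel training engine."""


class PlainModel:
    """Stand-in for an ordinary torch.nn.Module."""


def select_engine_class(model):
    """Hypothetical helper: choose the engine class from the model type."""
    if isinstance(model, PipelineModule):
        return PipelineEngine
    return DeepSpeedEngine


# A pipeline model gets the pipeline engine; anything else gets the standard one.
print(select_engine_class(PlainModel()).__name__)
print(select_engine_class(PipelineModule()).__name__)
```

In this sketch the check is a plain isinstance test, so any subclass of PipelineModule would also be routed to the pipeline engine.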

init_inference() creates a DeepSpeed InferenceEngine for optimized model serving. It supports four usage patterns:

No config, no kwargs: uses default_inference_config().
Config dict or JSON path only.
Keyword arguments only (e.g., mp_size, dtype, replace_with_kernel_inject).
Both config and kwargs (merged, with conflict detection).
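
The four usage patterns can be illustrated with a small merge routine. This is a sketch under stated assumptions, not the module's actual code: merge_inference_config is a hypothetical name, an empty dictionary stands in for the values that default_inference_config() would supply, and a conflict is assumed to mean the same key given with different values in config and kwargs.

```python
import json


def merge_inference_config(config=None, **kwargs):
    """Hypothetical helper: combine a config (dict or JSON path) with kwargs."""
    if config is None:
        config_dict = {}  # assumption: defaults would be filled in later
    elif isinstance(config, str):
        with open(config, "r") as f:  # treat a string as a JSON file path
            config_dict = json.load(f)
    elif isinstance(config, dict):
        config_dict = dict(config)  # copy so the caller's dict is untouched
    else:
        raise ValueError(f"config must be None, str, or dict, got {type(config)}")

    # Conflict detection: the same key with different values is an error.
    for key in set(config_dict) & set(kwargs):
        if config_dict[key] != kwargs[key]:
            raise ValueError(
                f"Conflicting argument '{key}': "
                f"config={config_dict[key]!r}, kwargs={kwargs[key]!r}"
            )

    config_dict.update(kwargs)
    return config_dict


merged = merge_inference_config({"mp_size": 1}, replace_with_kernel_inject=True)
print(merged)
```

Passing the same key with the same value in both places is accepted in this sketch; only a genuine disagreement raises.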

The module also exports:

add_config_arguments(): Adds DeepSpeed CLI arguments (--deepspeed, --deepspeed_config, --deepspeed_mpi) to an argument parser.
default_inference_config(): Returns the default DeepSpeedInferenceConfig as a dictionary.
Version information: __version__, __git_hash__, __git_branch__.
Re-exports from sub-packages: checkpointing, DeepSpeedTransformerLayer, PipelineModule, zero, OnDevice, replace_transformer_layer, etc.
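
To show how the three CLI flags listed above behave once attached to a parser, here is a minimal argparse sketch. The function add_config_arguments_sketch is a hypothetical stand-in, and the defaults, types, and help strings are assumptions; only the flag names come from this page.

```python
import argparse


def add_config_arguments_sketch(parser):
    """Hypothetical stand-in that registers the three DeepSpeed flags."""
    group = parser.add_argument_group("DeepSpeed", "DeepSpeed configurations")
    # Boolean switch that turns DeepSpeed on (assumed default: off).
    group.add_argument("--deepspeed", default=False, action="store_true",
                       help="Enable DeepSpeed")
    # Path to the JSON configuration file (assumed default: None).
    group.add_argument("--deepspeed_config", default=None, type=str,
                       help="Path to a DeepSpeed JSON configuration file")
    # Boolean switch for MPI-based launch (assumed default: off).
    group.add_argument("--deepspeed_mpi", default=False, action="store_true",
                       help="Discover the distributed setup through MPI")
    return parser


parser = add_config_arguments_sketch(argparse.ArgumentParser())
args = parser.parse_args(["--deepspeed", "--deepspeed_config", "ds_config.json"])
print(args.deepspeed, args.deepspeed_config, args.deepspeed_mpi)
```

In a real script the vendored deepspeed.add_config_arguments(parser) would be called instead of this stand-in.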

Usage

This module is the primary entry point for all DeepSpeed functionality within the FlexLLMGen benchmark suite. Training scripts call deepspeed.initialize() and inference scripts call deepspeed.init_inference().

Code Reference

Source Location

Repository: FMInference_FlexLLMGen
File: benchmark/third_party/DeepSpeed/deepspeed/__init__.py
Lines: 1-313

Signature

def initialize(args=None,
               model: torch.nn.Module = None,
               optimizer: Optional[Union[Optimizer, DeepSpeedOptimizerCallable]] = None,
               model_parameters: Optional[torch.nn.Module] = None,
               training_data: Optional[torch.utils.data.Dataset] = None,
               lr_scheduler: Optional[Union[_LRScheduler, DeepSpeedSchedulerCallable]] = None,
               mpu=None,
               dist_init_required: Optional[bool] = None,
               collate_fn=None,
               config=None,
               config_params=None):
    """Initialize the DeepSpeed Engine."""

def init_inference(model, config=None, **kwargs):
    """Initialize the DeepSpeed InferenceEngine."""

def add_config_arguments(parser):
    """Update the argument parser to enable parsing of DeepSpeed command line arguments."""

Import

import deepspeed

# Or from the vendored location:
from benchmark.third_party.DeepSpeed import deepspeed

I/O Contract

Inputs

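Name	Type	Required	Description
args	object	No	For initialize(): object containing local_rank and deepspeed_config fields. Optional if config is passed.
model	torch.nn.Module	Yes	The neural network model to wrap with DeepSpeed.
optimizer	Optimizer or Callable	No	User-defined optimizer or callable that returns an optimizer. Overrides any optimizer defined in the JSON config.
model_parameters	Iterable[torch.Tensor]	No	Specifies which tensors to optimize.
training_data	torch.utils.data.Dataset	No	Training dataset for dataloader creation.
lr_scheduler	_LRScheduler or Callable	No	Learning rate scheduler object or factory callable.
mpu	object	No	Model parallelism unit implementing get_{model,data}_parallel_{rank,group,world_size}().
dist_init_required	bool	No	For initialize(): None auto-initializes torch distributed if needed; True or False forces the choice.
collate_fn	Callable	No	For initialize(): merges a list of samples into a mini-batch when building the dataloader.
config	str or dict	No	DeepSpeed configuration as a JSON file path or dictionary.
config_params	str or dict	No	For initialize(): same as config, kept for backwards compatibility.
**kwargs	dict	No	For init_inference(): additional config parameters (e.g., mp_size, dtype, replace_with_kernel_inject).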
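
The mpu row expands to six method names. The sketch below spells them out on a hypothetical single-process class, SingleProcessMPU, purely to illustrate the interface shape. It is an assumption that returning None for the groups is acceptable as a placeholder; a real mpu would return torch.distributed process groups.

```python
class SingleProcessMPU:
    """Hypothetical mpu for one process with no model or data parallelism."""

    def get_model_parallel_rank(self):
        return 0

    def get_model_parallel_group(self):
        return None  # placeholder; a real mpu returns a process group

    def get_model_parallel_world_size(self):
        return 1

    def get_data_parallel_rank(self):
        return 0

    def get_data_parallel_group(self):
        return None  # placeholder; a real mpu returns a process group

    def get_data_parallel_world_size(self):
        return 1


# Expand get_{model,data}_parallel_{rank,group,world_size} into explicit names.
REQUIRED_METHODS = [
    f"get_{kind}_parallel_{what}"
    for kind in ("model", "data")
    for what in ("rank", "group", "world_size")
]

mpu = SingleProcessMPU()
missing = [name for name in REQUIRED_METHODS
           if not callable(getattr(mpu, name, None))]
print(missing)
```

A check like the one at the end can be used to confirm that a custom mpu object exposes every expected method before it is handed to initialize().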

Outputs

Name	Type	Description
engine	DeepSpeedEngine or PipelineEngine	For initialize(): the wrapped model engine for distributed training (PipelineEngine when the model is a PipelineModule).
optimizer	Optimizer	For initialize(): the wrapped optimizer (or None).
training_dataloader	DataLoader	For initialize(): the DeepSpeed dataloader (or None).
lr_scheduler	_LRScheduler	For initialize(): the wrapped LR scheduler (or None).
engine	InferenceEngine	For init_inference(): the wrapped model for optimized inference.

Usage Examples

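import deepspeed
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Training initialization
model = MyModel()  # placeholder: any torch.nn.Module subclass
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
engine, optimizer, dataloader, scheduler = deepspeed.initialize(
    model=model,
    optimizer=optimizer,
    config="ds_config.json"
)
# dataloader is None here because no training_data was passed.

# Inference initialization
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
engine = deepspeed.init_inference(
    model,
    mp_size=1,
    dtype=torch.half,
    replace_with_kernel_inject=True
)
# The engine forwards tensors to the wrapped model, so tokenize the prompt first.
inputs = tokenizer("DeepSpeed is", return_tensors="pt").to("cuda")
output = engine(**inputs)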
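
The training call above passes config="ds_config.json" without showing the file. As a minimal sketch, the snippet below writes such a file and reads it back. The keys train_batch_size and fp16.enabled are standard DeepSpeed configuration keys, but the values are illustrative assumptions and the file is written to a temporary directory rather than the working directory.

```python
import json
import os
import tempfile

# Illustrative values only; tune them for the actual model and hardware.
ds_config = {
    "train_batch_size": 8,
    "fp16": {"enabled": False},
}

# Write the config to a temporary location so the sketch has no side effects.
config_path = os.path.join(tempfile.mkdtemp(), "ds_config.json")
with open(config_path, "w") as f:
    json.dump(ds_config, f, indent=2)

# Read it back to confirm the file is valid JSON with the expected content.
with open(config_path, "r") as f:
    loaded = json.load(f)

print(loaded)
```

Because config accepts either a path or a dictionary, the ds_config dictionary itself could be passed in place of the file path.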
