Implementation:FMInference FlexLLMGen DeepSpeed Init

From Leeroopedia


Knowledge Sources
Domains: Deep Learning, Distributed Training, Inference, Python
Last Updated: 2026-02-09 12:00 GMT

Overview

The entry-point module of the DeepSpeed package. It exports initialize() for distributed training setup and init_inference() for creating an optimized inference engine.

Description

This file serves as the top-level __init__.py for the vendored DeepSpeed package within FlexLLMGen. It provides two primary public APIs:

initialize() sets up the DeepSpeed distributed training engine. It accepts a model, optimizer, learning rate scheduler, training data, and configuration (as either a JSON file path or dictionary). The function:

  • Shuts down any active zero.Init context to prevent parameter partitioning conflicts.
  • Creates either a DeepSpeedEngine (standard models) or a PipelineEngine (for PipelineModule instances) based on the model type.
  • Returns a 4-tuple of (engine, optimizer, training_dataloader, lr_scheduler).
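
The engine selection described above can be sketched roughly as follows. This is a minimal illustration, not the vendored code: the three classes are empty stand-ins for the real DeepSpeed types, build_engine is a hypothetical helper name, and the optimizer, dataloader, and scheduler slots are filled with None placeholders.

```python
class PipelineModule:
    """Stand-in for the real PipelineModule (illustration only)."""


class DeepSpeedEngine:
    """Stand-in for the standard training engine (illustration only)."""

    def __init__(self, model):
        self.module = model


class PipelineEngine(DeepSpeedEngine):
    """Stand-in for the pipeline-parallel engine (illustration only)."""


def build_engine(model):
    # Hypothetical helper: choose the engine class from the model type,
    # mirroring the dispatch that initialize() is described as performing.
    if isinstance(model, PipelineModule):
        engine = PipelineEngine(model)
    else:
        engine = DeepSpeedEngine(model)
    # Same 4-tuple shape as initialize(); the last three slots are
    # placeholders here because this sketch builds no optimizer,
    # dataloader, or scheduler.
    return engine, None, None, None


plain_result = build_engine(object())
pipe_result = build_engine(PipelineModule())
print(type(plain_result[0]).__name__)
print(type(pipe_result[0]).__name__)
```

Callers typically unpack the tuple in one statement, so the fixed 4-element shape matters even when some slots are None.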

init_inference() creates a DeepSpeed InferenceEngine for optimized model serving. It supports four usage patterns:

  1. No config, no kwargs: uses default_inference_config().
  2. Config dict or JSON path only.
  3. Keyword arguments only (e.g., mp_size, dtype, replace_with_kernel_inject).
  4. Both config and kwargs (merged, with conflict detection).
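
One way to picture the four patterns is a single function that normalizes config and then merges kwargs into it. The sketch below is an assumption-laden illustration, not the vendored implementation: resolve_inference_config is a hypothetical name, an empty dictionary stands in for "fall back to the defaults", and a key is treated as conflicting only when config and kwargs give it different values.

```python
import json


def resolve_inference_config(config=None, **kwargs):
    """Hypothetical helper that merges a config source with keyword arguments."""
    if config is None:
        # Patterns 1 and 3: no explicit config, start from an empty dict
        # (standing in for the library defaults).
        config_dict = {}
    elif isinstance(config, str):
        # Pattern 2: config given as a path to a JSON file.
        with open(config, "r") as f:
            config_dict = json.load(f)
    elif isinstance(config, dict):
        # Pattern 2: config given as a dictionary (copied to avoid mutation).
        config_dict = dict(config)
    else:
        raise ValueError(f"config must be None, str, or dict, got {type(config)}")

    # Pattern 4: detect keys set in both places with different values.
    for key in set(config_dict) & set(kwargs):
        if config_dict[key] != kwargs[key]:
            raise ValueError(
                f"Conflicting argument '{key}': "
                f"config={config_dict[key]!r}, kwargs={kwargs[key]!r}"
            )
    config_dict.update(kwargs)
    return config_dict


merged = resolve_inference_config({"mp_size": 1}, replace_with_kernel_inject=True)

try:
    resolve_inference_config({"mp_size": 1}, mp_size=2)
    conflict_detected = False
except ValueError:
    conflict_detected = True
```

Failing loudly on a conflict, rather than letting one source silently win, makes a mismatch between a checked-in config file and a call site visible at startup.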

The module also exports:

  • add_config_arguments(): Adds DeepSpeed CLI arguments (--deepspeed, --deepspeed_config, --deepspeed_mpi) to an argument parser.
  • default_inference_config(): Returns the default DeepSpeedInferenceConfig as a dictionary.
  • Version information: __version__, __git_hash__, __git_branch__.
  • Re-exports from sub-packages: checkpointing, DeepSpeedTransformerLayer, PipelineModule, zero, OnDevice, replace_transformer_layer, etc.
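
To illustrate what add_config_arguments() does for a training script, the sketch below registers the three flags named above on a standard argparse parser. It is a stand-in written for this page: the function name add_config_arguments_sketch, the argument types, the defaults, and the help strings are assumptions, and the real function may register additional options.

```python
import argparse


def add_config_arguments_sketch(parser):
    """Illustrative stand-in for add_config_arguments(); not the vendored code."""
    group = parser.add_argument_group("DeepSpeed", "DeepSpeed configurations")
    # Boolean switch that turns DeepSpeed on (assumed to default to off).
    group.add_argument("--deepspeed", default=False, action="store_true",
                       help="Enable DeepSpeed")
    # Path to the JSON configuration file.
    group.add_argument("--deepspeed_config", default=None, type=str,
                       help="Path to the DeepSpeed JSON configuration file")
    # Boolean switch for MPI-based launches (assumed to default to off).
    group.add_argument("--deepspeed_mpi", default=False, action="store_true",
                       help="Launch with MPI")
    return parser


parser = add_config_arguments_sketch(argparse.ArgumentParser())
args = parser.parse_args(["--deepspeed", "--deepspeed_config", "ds_config.json"])
print(args.deepspeed, args.deepspeed_config, args.deepspeed_mpi)
```

With the real function, a script would call deepspeed.add_config_arguments(parser) on its own parser before parse_args().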

Usage

This module is the primary entry point for all DeepSpeed functionality within the FlexLLMGen benchmark suite. Training scripts call deepspeed.initialize() and inference scripts call deepspeed.init_inference().

Code Reference

Source Location

Signature

def initialize(args=None,
               model: torch.nn.Module = None,
               optimizer: Optional[Union[Optimizer, DeepSpeedOptimizerCallable]] = None,
               model_parameters: Optional[torch.nn.Module] = None,
               training_data: Optional[torch.utils.data.Dataset] = None,
               lr_scheduler: Optional[Union[_LRScheduler, DeepSpeedSchedulerCallable]] = None,
               mpu=None,
               dist_init_required: Optional[bool] = None,
               collate_fn=None,
               config=None,
               config_params=None):
    """Initialize the DeepSpeed Engine."""

def init_inference(model, config=None, **kwargs):
    """Initialize the DeepSpeed InferenceEngine."""

def add_config_arguments(parser):
    """Update the argument parser to enable parsing of DeepSpeed command line arguments."""

Import

import deepspeed

# Or from the vendored location:
from benchmark.third_party.DeepSpeed import deepspeed

I/O Contract

Inputs

| Name | Type | Required | Description |
|---|---|---|---|
| model | torch.nn.Module | Yes | The neural network model to wrap with DeepSpeed. |
| optimizer | Optimizer or Callable | No | User-defined optimizer or callable that returns an optimizer. Overrides JSON config. |
| model_parameters | Iterable[torch.Tensor] | No | Specifies which tensors to optimize. |
| training_data | torch.utils.data.Dataset | No | Training dataset for dataloader creation. |
| lr_scheduler | _LRScheduler or Callable | No | Learning rate scheduler object or factory callable. |
| mpu | object | No | Model parallelism unit implementing get_{model,data}_parallel_{rank,group,world_size}(). |
| config | str or dict | No | DeepSpeed configuration as a JSON file path or dictionary. |
| kwargs | dict | No | For init_inference: additional config parameters (e.g., mp_size, dtype, replace_with_kernel_inject). |

Outputs

| Name | Type | Description |
|---|---|---|
| engine | DeepSpeedEngine | For initialize(): the wrapped model engine for distributed training. |
| optimizer | Optimizer | For initialize(): the wrapped optimizer (or None). |
| training_dataloader | DataLoader | For initialize(): the DeepSpeed dataloader (or None). |
| lr_scheduler | _LRScheduler | For initialize(): the wrapped LR scheduler (or None). |
| engine | InferenceEngine | For init_inference(): the wrapped model for optimized inference. |

Usage Examples

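import deepspeed
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Training initialization.
# MyModel is a placeholder for any user-defined torch.nn.Module.
model = MyModel()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
engine, optimizer, dataloader, scheduler = deepspeed.initialize(
    model=model,
    optimizer=optimizer,
    config="ds_config.json"
)
# dataloader and scheduler may be None (see the Outputs table).

# Inference initialization
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
engine = deepspeed.init_inference(
    model,
    mp_size=1,
    dtype=torch.half,
    replace_with_kernel_inject=True
)
# The engine forwards calls to the wrapped model, so it takes tensors,
# not raw text. This assumes a CUDA device is available.
inputs = tokenizer("DeepSpeed is", return_tensors="pt").to("cuda")
output = engine(**inputs)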
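
The training example reads its settings from ds_config.json. As a hedged sketch of what such a file might contain, the snippet below writes a small configuration to a temporary directory. The keys are common DeepSpeed configuration options, but the values are illustrative assumptions and are not taken from FlexLLMGen.

```python
import json
import os
import tempfile

# Illustrative values only; tune them for the actual model and hardware.
ds_config = {
    "train_batch_size": 8,              # global batch size across all ranks
    "gradient_accumulation_steps": 1,   # micro-batches per optimizer step
    "fp16": {"enabled": True},          # mixed-precision training
    "zero_optimization": {"stage": 1},  # ZeRO stage 1: partition optimizer state
}

# Write to a temporary directory so the sketch leaves no files behind
# in the working tree.
config_path = os.path.join(tempfile.mkdtemp(), "ds_config.json")
with open(config_path, "w") as f:
    json.dump(ds_config, f, indent=2)

# Read it back to confirm the file is valid JSON.
with open(config_path) as f:
    reloaded = json.load(f)
print(reloaded["zero_optimization"]["stage"])
```

Because config accepts either a path or a dictionary, the same ds_config dictionary could be passed to deepspeed.initialize() directly instead of being written to disk.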
