Implementation:FMInference FlexLLMGen DeepSpeed Init
| Knowledge Sources | |
|---|---|
| Domains | Deep Learning, Distributed Training, Inference, Python |
| Last Updated | 2026-02-09 12:00 GMT |
Overview
The DeepSpeed package entry point module. It exports the initialize() function for distributed training setup and the init_inference() function for creating an optimized inference engine.
Description
This file serves as the top-level __init__.py for the vendored DeepSpeed package within FlexLLMGen. It provides two primary public APIs:
initialize() sets up the DeepSpeed distributed training engine. It accepts a model, optimizer, learning rate scheduler, training data, and configuration (as either a JSON file path or dictionary). The function:
- Shuts down any active zero.Init context to prevent parameter partitioning conflicts.
- Creates either a DeepSpeedEngine (standard models) or a PipelineEngine (for PipelineModule instances) based on the model type.
- Returns a 4-tuple of (engine, optimizer, training_dataloader, lr_scheduler).
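The engine selection and the 4-tuple return can be sketched in plain Python. This is only an illustration of the dispatch described above, not the vendored implementation: the classes below are empty stand-ins that reuse the DeepSpeed names so the snippet runs without DeepSpeed or PyTorch, and initialize_sketch is a hypothetical helper.

```python
# Stand-in classes (assumptions) so the sketch runs without DeepSpeed or PyTorch.
class Module:
    """Placeholder for torch.nn.Module."""


class PipelineModule(Module):
    """Placeholder for deepspeed.pipe.PipelineModule."""


class DeepSpeedEngine:
    def __init__(self, model, optimizer=None, training_dataloader=None, lr_scheduler=None):
        self.module = model
        self.optimizer = optimizer
        self.training_dataloader = training_dataloader
        self.lr_scheduler = lr_scheduler


class PipelineEngine(DeepSpeedEngine):
    """Placeholder for the pipeline-parallel engine."""


def initialize_sketch(model, optimizer=None, lr_scheduler=None):
    """Hypothetical helper: pick the engine class from the model type."""
    assert model is not None, "a model is required"
    # PipelineModule instances get a PipelineEngine; everything else gets a DeepSpeedEngine.
    engine_cls = PipelineEngine if isinstance(model, PipelineModule) else DeepSpeedEngine
    engine = engine_cls(model, optimizer=optimizer, lr_scheduler=lr_scheduler)
    # Same ordering as the 4-tuple described above.
    return engine, engine.optimizer, engine.training_dataloader, engine.lr_scheduler


engine, opt, loader, sched = initialize_sketch(Module())
pipe_engine, *_ = initialize_sketch(PipelineModule())
print(type(engine).__name__, type(pipe_engine).__name__)  # DeepSpeedEngine PipelineEngine
```

Entries for components that were not supplied come back as None, which matches the "(or None)" notes in the Outputs table.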
init_inference() creates a DeepSpeed InferenceEngine for optimized model serving. It supports four usage patterns:
- No config, no kwargs: uses default_inference_config().
- Config dict or JSON path only.
- Keyword arguments only (e.g., mp_size, dtype, replace_with_kernel_inject).
- Both config and kwargs (merged, with conflict detection).
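The four usage patterns above can be illustrated with a small, self-contained sketch of the merge step. It assumes that a conflict means the same key appears in both config and kwargs with different values. The function name resolve_inference_config and the DEFAULTS values are hypothetical placeholders, not the vendored code or the real default_inference_config() output.

```python
import json

# Placeholder defaults (assumption); the real values come from default_inference_config().
DEFAULTS = {"mp_size": 1, "replace_with_kernel_inject": False}


def resolve_inference_config(config=None, **kwargs):
    """Hypothetical helper: merge a config (dict or JSON path) with keyword arguments."""
    if config is None:
        config_dict = {}
    elif isinstance(config, str):
        # A string is treated as a path to a JSON file.
        with open(config) as f:
            config_dict = json.load(f)
    elif isinstance(config, dict):
        config_dict = dict(config)  # copy so the caller's dict is not mutated
    else:
        raise ValueError(f"config must be None, a str path, or a dict, got {type(config)}")

    # Conflict detection: a key given in both places must carry the same value.
    for key in config_dict.keys() & kwargs.keys():
        if config_dict[key] != kwargs[key]:
            raise ValueError(
                f"Conflicting argument '{key}': config={config_dict[key]!r}, kwargs={kwargs[key]!r}"
            )

    # Later sources override earlier ones: defaults < config < kwargs.
    return {**DEFAULTS, **config_dict, **kwargs}


print(resolve_inference_config())                            # pattern 1: defaults only
print(resolve_inference_config({"mp_size": 2}))              # pattern 2: config only
print(resolve_inference_config(dtype="half"))                # pattern 3: kwargs only
print(resolve_inference_config({"mp_size": 2}, mp_size=2))   # pattern 4: both, no conflict
```

Calling resolve_inference_config({"mp_size": 2}, mp_size=4) raises ValueError in this sketch, because the two sources disagree on mp_size.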
The module also exports:
- add_config_arguments(): Adds DeepSpeed CLI arguments (--deepspeed, --deepspeed_config, --deepspeed_mpi) to an argument parser.
- default_inference_config(): Returns the default DeepSpeedInferenceConfig as a dictionary.
- Version information: __version__, __git_hash__, __git_branch__.
- Re-exports from sub-packages: checkpointing, DeepSpeedTransformerLayer, PipelineModule, zero, OnDevice, replace_transformer_layer, etc.
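To show how the CLI arguments listed above fit into a script, here is a minimal sketch using only argparse. In a real script one would call deepspeed.add_config_arguments(parser). The stand-in add_config_arguments_sketch below is hypothetical and registers only the three flags named above, with flag types and defaults that are assumptions chosen for illustration.

```python
import argparse


def add_config_arguments_sketch(parser):
    """Hypothetical stand-in for deepspeed.add_config_arguments(parser)."""
    group = parser.add_argument_group("DeepSpeed", "DeepSpeed configurations")
    # Boolean switch that turns DeepSpeed on (assumed store_true semantics).
    group.add_argument("--deepspeed", default=False, action="store_true",
                       help="Enable DeepSpeed")
    # Path to the JSON configuration file.
    group.add_argument("--deepspeed_config", default=None, type=str,
                       help="Path to the DeepSpeed JSON configuration")
    # Boolean switch for MPI-based launches (assumed store_true semantics).
    group.add_argument("--deepspeed_mpi", default=False, action="store_true",
                       help="Launch with MPI")
    return parser


# The script's own arguments coexist with the DeepSpeed ones.
parser = argparse.ArgumentParser(description="hypothetical training script")
parser.add_argument("--epochs", type=int, default=1)
parser = add_config_arguments_sketch(parser)

args = parser.parse_args(["--deepspeed", "--deepspeed_config", "ds_config.json"])
print(args.deepspeed, args.deepspeed_config, args.deepspeed_mpi)  # True ds_config.json False
```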
Usage
This module is the primary entry point for all DeepSpeed functionality within the FlexLLMGen benchmark suite. Training scripts call deepspeed.initialize() and inference scripts call deepspeed.init_inference().
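Because initialize() accepts its configuration either as a dictionary or as a path to a JSON file, a training script can keep one source of truth and pass whichever form is convenient. The sketch below writes a small dictionary to a temporary ds_config.json and reads it back. The keys shown are common DeepSpeed options, but the particular values are illustrative assumptions rather than settings taken from the FlexLLMGen benchmarks.

```python
import json
import os
import tempfile

# Illustrative configuration (values are assumptions, not benchmark settings).
ds_config = {
    "train_batch_size": 8,
    "fp16": {"enabled": True},
    "zero_optimization": {"stage": 1},
}

with tempfile.TemporaryDirectory() as tmp:
    config_path = os.path.join(tmp, "ds_config.json")
    # Form 1: a JSON file whose path could be passed as config=config_path.
    with open(config_path, "w") as f:
        json.dump(ds_config, f, indent=2)
    # Form 2: the dictionary itself could be passed as config=ds_config.
    with open(config_path) as f:
        loaded = json.load(f)

# Both forms carry the same settings.
print(loaded == ds_config)  # True
```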
Code Reference
Source Location
- Repository: FMInference_FlexLLMGen
- File: benchmark/third_party/DeepSpeed/deepspeed/__init__.py
- Lines: 1-313
Signature
def initialize(args=None,
               model: torch.nn.Module = None,
               optimizer: Optional[Union[Optimizer, DeepSpeedOptimizerCallable]] = None,
               model_parameters: Optional[torch.nn.Module] = None,
               training_data: Optional[torch.utils.data.Dataset] = None,
               lr_scheduler: Optional[Union[_LRScheduler, DeepSpeedSchedulerCallable]] = None,
               mpu=None,
               dist_init_required: Optional[bool] = None,
               collate_fn=None,
               config=None,
               config_params=None):
    """Initialize the DeepSpeed Engine."""

def init_inference(model, config=None, **kwargs):
    """Initialize the DeepSpeed InferenceEngine."""

def add_config_arguments(parser):
    """Update the argument parser to enable parsing of DeepSpeed command line arguments."""
Import
import deepspeed
# Or from the vendored location (requires the repository root to be on sys.path):
from benchmark.third_party.DeepSpeed import deepspeed
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| model | torch.nn.Module | Yes | The neural network model to wrap with DeepSpeed. |
| optimizer | Optimizer or Callable | No | User-defined optimizer or callable that returns an optimizer. Overrides JSON config. |
| model_parameters | Iterable[torch.Tensor] | No | Specifies which tensors to optimize. |
| training_data | torch.utils.data.Dataset | No | Training dataset for dataloader creation. |
| lr_scheduler | _LRScheduler or Callable | No | Learning rate scheduler object or factory callable. |
| mpu | object | No | Model parallelism unit implementing get_{model,data}_parallel_{rank,group,world_size}(). |
| config | str or dict | No | DeepSpeed configuration as a JSON file path or dictionary. |
| kwargs | dict | No | For init_inference: additional config parameters (e.g., mp_size, dtype, replace_with_kernel_inject). |
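The mpu row uses brace shorthand for six method names. The sketch below spells them out with a hypothetical single-process stub, SingleProcessMPU, and checks that all six are present. The return values are placeholders: in a real distributed job the group methods would return torch.distributed process groups, and ranks and world sizes would reflect the actual topology.

```python
from itertools import product


class SingleProcessMPU:
    """Hypothetical mpu stub: one process, so every rank is 0 and every world size is 1."""

    def get_model_parallel_rank(self):
        return 0

    def get_model_parallel_group(self):
        return None  # placeholder; normally a process group

    def get_model_parallel_world_size(self):
        return 1

    def get_data_parallel_rank(self):
        return 0

    def get_data_parallel_group(self):
        return None  # placeholder; normally a process group

    def get_data_parallel_world_size(self):
        return 1


# Expand get_{model,data}_parallel_{rank,group,world_size} into explicit names.
expected = [
    f"get_{kind}_parallel_{what}"
    for kind, what in product(("model", "data"), ("rank", "group", "world_size"))
]

mpu = SingleProcessMPU()
missing = [name for name in expected if not callable(getattr(mpu, name, None))]
print(len(expected), missing)  # 6 []
```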
Outputs
| Name | Type | Description |
|---|---|---|
| engine | DeepSpeedEngine or PipelineEngine | For initialize(): the wrapped model engine for distributed training (PipelineEngine when the model is a PipelineModule). |
| optimizer | Optimizer | For initialize(): the wrapped optimizer (or None). |
| training_dataloader | DataLoader | For initialize(): the DeepSpeed dataloader (or None). |
| lr_scheduler | _LRScheduler | For initialize(): the wrapped LR scheduler (or None). |
| engine | InferenceEngine | For init_inference(): the wrapped model for optimized inference. |
Usage Examples
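import deepspeed
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Training initialization (MyModel is a placeholder for any torch.nn.Module)
model = MyModel()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
engine, optimizer, dataloader, scheduler = deepspeed.initialize(
    model=model,
    optimizer=optimizer,
    config="ds_config.json"
)

# Inference initialization
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
engine = deepspeed.init_inference(
    model,
    mp_size=1,
    dtype=torch.half,
    replace_with_kernel_inject=True
)
# The engine forwards to the wrapped model, so it expects token tensors rather than a raw string.
inputs = tokenizer("DeepSpeed is", return_tensors="pt").to("cuda")
output = engine(**inputs)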