
Implementation:Deepspeedai DeepSpeed Initialize

From Leeroopedia


Knowledge Sources
Domains Distributed_Training, Training_Orchestration, Memory_Optimization
Last Updated 2026-02-09 00:00 GMT

Overview

A concrete tool, provided by the DeepSpeed library, for creating a DeepSpeed distributed training engine.

Description

The deepspeed.initialize() function is the main entry point for DeepSpeed training. It takes a PyTorch model, an optional optimizer, and a configuration, and returns a 4-tuple whose first element is a DeepSpeedEngine (or a PipelineEngine for a PipelineModule, or a DeepSpeedHybridEngine when hybrid_engine.enabled=True). It handles:

  • Distributed backend initialization: Calls dist.init_distributed() with the appropriate backend (NCCL, etc.)
  • Config parsing: Creates a DeepSpeedConfig object from the provided config file or dictionary
  • Zero.Init context management: Shuts down any active zero.Init context before engine construction, then restores it afterward
  • Engine type routing:
    • PipelineModule input routes to PipelineEngine
    • hybrid_engine.enabled=True routes to DeepSpeedHybridEngine
    • Otherwise routes to DeepSpeedEngine
  • Mesh device setup: Initializes device mesh for sequence parallelism from mesh_param or config
  • AutoTP integration: Merges tensor parallelism config and sets AutoTP mode if configured
  • Optimizer flag setting: Marks parameters for specialized optimizers (e.g., Muon)
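The engine-type routing above can be sketched as a simple dispatch. The classes below are stand-in stubs, not the real DeepSpeed classes, and select_engine_type is a hypothetical helper that mirrors the three routing rules rather than DeepSpeed's actual implementation.

```python
# Stand-in stubs for the real DeepSpeed classes (illustration only).
class PipelineModule: ...
class DeepSpeedEngine: ...
class PipelineEngine(DeepSpeedEngine): ...
class DeepSpeedHybridEngine(DeepSpeedEngine): ...

def select_engine_type(model, config: dict) -> type:
    """Mirror the routing rules: pipeline model -> PipelineEngine,
    hybrid engine enabled -> DeepSpeedHybridEngine, else DeepSpeedEngine."""
    if isinstance(model, PipelineModule):
        return PipelineEngine
    if config.get("hybrid_engine", {}).get("enabled", False):
        return DeepSpeedHybridEngine
    return DeepSpeedEngine

print(select_engine_type(PipelineModule(), {}).__name__)                            # PipelineEngine
print(select_engine_type(object(), {"hybrid_engine": {"enabled": True}}).__name__)  # DeepSpeedHybridEngine
print(select_engine_type(object(), {}).__name__)                                    # DeepSpeedEngine
```

Note that the PipelineModule check takes precedence: a pipeline model is routed to PipelineEngine regardless of other config settings.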

Usage

Call this function once before the training loop. The model parameter is required; all others are optional. The returned 4-tuple provides all objects needed for the training loop.
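The shape of the resulting training loop can be sketched as follows. _StubEngine here is a hypothetical stand-in so the sketch runs without GPUs or DeepSpeed installed; in real code the engine and dataloader come from deepspeed.initialize() itself.

```python
# Hypothetical stand-in for the engine returned by deepspeed.initialize().
class _StubEngine:
    def __call__(self, batch):      # forward pass goes through the engine
        return sum(batch) * 0.001   # pretend "loss" for illustration
    def backward(self, loss):       # engine owns loss scaling / grad accumulation
        self.last_loss = loss
    def step(self):                 # engine steps the optimizer and zeroes grads
        pass

# In real code: engine, _, dataloader, _ = deepspeed.initialize(model=..., config=...)
engine = _StubEngine()
dataloader = [[1, 2, 3], [4, 5, 6]]  # stand-in for the returned DataLoader

for batch in dataloader:
    loss = engine(batch)     # forward through the engine, not the raw model
    engine.backward(loss)    # not loss.backward(): DeepSpeed manages the backward pass
    engine.step()            # not optimizer.step(): the engine handles stepping
```

The key habit the loop illustrates is calling engine.backward() and engine.step() instead of the plain PyTorch equivalents, so the engine can apply loss scaling, gradient accumulation, and ZeRO partitioning.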

Code Reference

Source Location

  • Repository: DeepSpeed
  • File: deepspeed/__init__.py
  • Lines: 80-252

Signature

def initialize(args=None,
               model: torch.nn.Module = None,
               optimizer: Optional[Union[Optimizer, DeepSpeedOptimizerCallable]] = None,
               model_parameters: Optional[torch.nn.Module] = None,
               training_data: Optional[torch.utils.data.Dataset] = None,
               lr_scheduler: Optional[Union[_LRScheduler, DeepSpeedSchedulerCallable]] = None,
               distributed_port: int = TORCH_DISTRIBUTED_DEFAULT_PORT,
               mpu=None,
               dist_init_required: Optional[bool] = None,
               collate_fn=None,
               config=None,
               mesh_param=None,
               config_params=None):

Import

import deepspeed

engine, optimizer, dataloader, lr_scheduler = deepspeed.initialize(...)

I/O Contract

Inputs

  • model (torch.nn.Module, required): The PyTorch model to wrap with the DeepSpeed engine
  • optimizer (Union[Optimizer, Callable], optional): User-defined optimizer, or a callable that returns one; overrides any optimizer in the JSON config
  • model_parameters (iterable, optional): Iterable of torch.Tensors or dicts specifying which tensors to optimize
  • training_data (torch.utils.data.Dataset, optional): Training dataset; DeepSpeed creates a DataLoader from it when provided
  • lr_scheduler (Union[_LRScheduler, Callable], optional): Learning rate scheduler object, or a callable that takes an optimizer
  • config (Union[str, dict], required unless supplied via args.deepspeed_config): DeepSpeed JSON config file path or dictionary
  • args (object, optional): Object with local_rank and deepspeed_config fields (alternative to the config parameter)
  • distributed_port (int, optional): Master node port for distributed communication (default: 29500)
  • mpu (object, optional): Model parallelism unit implementing get_{model,data}_parallel_{rank,group,world_size}()
  • dist_init_required (Optional[bool], optional): Force or skip torch.distributed initialization (None auto-detects)
  • collate_fn (Callable, optional): Custom collate function for the DataLoader
  • mesh_param (tuple, optional): Mesh parameters for device mesh initialization (data_parallel, sequence_parallel)
  • config_params (Union[str, dict], optional): Same as config; kept for backwards compatibility

Outputs

  • engine (DeepSpeedEngine): The DeepSpeed runtime engine wrapping the model for distributed training
  • optimizer (Optimizer): Wrapped optimizer (user-defined or built from the config); None if none was configured
  • training_dataloader (DataLoader): DeepSpeed DataLoader if training_data was supplied; otherwise None
  • lr_scheduler (_LRScheduler): Wrapped LR scheduler if provided or configured in the JSON; otherwise None
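Since three of the four entries may be None, unpacking is worth a defensive check. The sketch below uses a placeholder tuple with the same shape as the real return value; unpack_initialize_result is a hypothetical helper, not part of the DeepSpeed API.

```python
def unpack_initialize_result(result):
    """Unpack the 4-tuple from deepspeed.initialize(); only the engine
    is guaranteed to be non-None."""
    engine, optimizer, dataloader, lr_scheduler = result
    if engine is None:
        raise ValueError("deepspeed.initialize should always return an engine")
    return engine, optimizer, dataloader, lr_scheduler

# Placeholder standing in for a real return value: no training_data was
# passed and no scheduler was configured, so the last two entries are None.
engine, opt, dl, sched = unpack_initialize_result(("engine", "opt", None, None))
print(dl is None, sched is None)  # True True
```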

Usage Examples

import deepspeed
import torch
import torch.nn as nn

# Define a simple model
model = nn.Linear(1024, 1024)

# Initialize with a config file
engine, optimizer, _, lr_scheduler = deepspeed.initialize(
    model=model,
    config="ds_config.json",
    model_parameters=model.parameters(),
)

# Initialize with a config dictionary
config = {
    "train_batch_size": 32,
    "gradient_accumulation_steps": 4,
    "zero_optimization": {"stage": 2},
    "fp16": {"enabled": True, "initial_scale_power": 16},
    "optimizer": {
        "type": "Adam",
        "params": {"lr": 3e-5, "betas": [0.9, 0.999]}
    }
}
engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    config=config,
    model_parameters=model.parameters(),
)

# With a user-provided optimizer
user_optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    optimizer=user_optimizer,
    config="ds_config.json",
)

# The engine is now the primary interface for training
outputs = engine(input_batch)
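One detail worth checking before calling deepspeed.initialize() with a dictionary config: DeepSpeed requires that train_batch_size equal train_micro_batch_size_per_gpu * gradient_accumulation_steps * world_size. A hedged pre-flight sanity check, assuming a 2-GPU run for illustration:

```python
# Config mirrors the dictionary example above.
config = {
    "train_batch_size": 32,
    "gradient_accumulation_steps": 4,
}
world_size = 2  # assumption: 2 GPUs for this sketch

# Derive the per-GPU micro batch and verify the factorization is exact;
# DeepSpeed raises at initialization time if these numbers do not agree.
micro_batch = config["train_batch_size"] // (
    config["gradient_accumulation_steps"] * world_size
)
assert (
    micro_batch * config["gradient_accumulation_steps"] * world_size
    == config["train_batch_size"]
), "train_batch_size must equal micro_batch * grad_accum_steps * world_size"
print(micro_batch)  # 4 samples per GPU per step
```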

Related Pages

Implements Principle
