Implementation:Deepspeedai DeepSpeed Initialize
| Knowledge Sources | |
|---|---|
| Domains | Distributed_Training, Training_Orchestration, Memory_Optimization |
| Last Updated | 2026-02-09 00:00 GMT |
Overview
Concrete entry point, provided by the DeepSpeed library, for constructing a DeepSpeed distributed training engine.
Description
The deepspeed.initialize() function is the main entry point for DeepSpeed training. It takes a PyTorch model, an optional optimizer, and a configuration, and returns a 4-tuple whose first element is a DeepSpeedEngine (a PipelineEngine when the model is a PipelineModule, or a DeepSpeedHybridEngine when hybrid_engine.enabled=True). It handles:
- Distributed backend initialization: Calls dist.init_distributed() with the appropriate backend (NCCL, etc.)
- Config parsing: Creates a DeepSpeedConfig object from the provided config file or dictionary
- Zero.Init context management: Shuts down any active zero.Init context before engine construction, then restores it afterward
- Engine type routing (see the sketch after this list):
  - A PipelineModule input routes to PipelineEngine
  - hybrid_engine.enabled=True routes to DeepSpeedHybridEngine
  - Otherwise the model routes to DeepSpeedEngine
- Mesh device setup: Initializes device mesh for sequence parallelism from mesh_param or config
- AutoTP integration: Merges tensor parallelism config and sets AutoTP mode if configured
- Optimizer flag setting: Marks parameters for specialized optimizers (e.g., Muon)
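As a concrete illustration of the engine-type routing, the following is a minimal sketch, assuming a launcher-managed distributed environment and a ds_config.json that defines batch sizes and an optimizer; the layer sizes and stage count are illustrative assumptions, not part of the library.

import deepspeed
import torch.nn as nn
from deepspeed.pipe import PipelineModule

deepspeed.init_distributed()  # assumes rank/world-size env vars come from the launcher

# Split a toy MLP into two pipeline stages; sizes are arbitrary for illustration
layers = [nn.Linear(1024, 1024) for _ in range(4)]
pipe_model = PipelineModule(layers=layers, num_stages=2)

# Because the model is a PipelineModule, the returned engine is a PipelineEngine
engine, optimizer, _, _ = deepspeed.initialize(
    model=pipe_model,
    model_parameters=pipe_model.parameters(),
    config="ds_config.json",
)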
Usage
Call this function once before the training loop. The model parameter is required, and a configuration must be supplied either through config or through args.deepspeed_config; all other parameters are optional. The returned 4-tuple provides every object needed for the training loop, as shown in the sketch below.
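A minimal sketch of a typical loop built from the returned objects, assuming a launch through the deepspeed launcher and a ds_config.json that defines batch sizes and an optimizer; the dataset, loss function, and tensor shapes are illustrative assumptions.

import torch
import torch.nn as nn
import deepspeed
from torch.utils.data import TensorDataset

model = nn.Linear(1024, 1024)
train_dataset = TensorDataset(torch.randn(256, 1024), torch.randn(256, 1024))
loss_fn = nn.MSELoss()

engine, optimizer, training_dataloader, lr_scheduler = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    training_data=train_dataset,
    config="ds_config.json",  # assumed to define batch sizes and an optimizer
)

for inputs, targets in training_dataloader:
    inputs, targets = inputs.to(engine.device), targets.to(engine.device)
    loss = loss_fn(engine(inputs), targets)
    engine.backward(loss)  # handles loss scaling and gradient accumulation
    engine.step()          # optimizer step, plus LR scheduler step when configured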
Code Reference
Source Location
- Repository: DeepSpeed
- File: deepspeed/__init__.py
- Lines: 80-252
Signature
def initialize(args=None,
model: torch.nn.Module = None,
optimizer: Optional[Union[Optimizer, DeepSpeedOptimizerCallable]] = None,
model_parameters: Optional[torch.nn.Module] = None,
training_data: Optional[torch.utils.data.Dataset] = None,
lr_scheduler: Optional[Union[_LRScheduler, DeepSpeedSchedulerCallable]] = None,
distributed_port: int = TORCH_DISTRIBUTED_DEFAULT_PORT,
mpu=None,
dist_init_required: Optional[bool] = None,
collate_fn=None,
config=None,
mesh_param=None,
config_params=None):
Import
import deepspeed
engine, optimizer, dataloader, lr_scheduler = deepspeed.initialize(...)
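As an alternative to the config parameter, a hedged sketch of passing args: deepspeed.add_config_arguments registers the --deepspeed and --deepspeed_config command-line flags, and initialize then reads the config path from args.deepspeed_config; the parser setup shown is an assumption about how a launch script might be organized.

import argparse
import deepspeed
import torch.nn as nn

parser = argparse.ArgumentParser()
parser.add_argument("--local_rank", type=int, default=-1)
parser = deepspeed.add_config_arguments(parser)  # adds --deepspeed / --deepspeed_config
args = parser.parse_args()

model = nn.Linear(1024, 1024)

# No config argument here: the path comes from args.deepspeed_config
engine, optimizer, _, _ = deepspeed.initialize(
    args=args,
    model=model,
    model_parameters=model.parameters(),
)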
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| model | torch.nn.Module | Yes | The PyTorch model to wrap with the DeepSpeed engine |
| optimizer | Union[Optimizer, Callable] | No | User-defined optimizer, or a callable that returns one (see the sketch after these tables); overrides the optimizer in the JSON config |
| model_parameters | iterable | No | Iterable of torch.Tensors or dicts specifying which tensors to optimize |
| training_data | torch.utils.data.Dataset | No | Training dataset; DeepSpeed creates a DataLoader if provided |
| lr_scheduler | Union[_LRScheduler, Callable] | No | Learning rate scheduler object or callable that takes an optimizer |
| config | Union[str, dict] | Yes | DeepSpeed JSON config file path or dictionary (or via args.deepspeed_config) |
| args | object | No | Object with local_rank and deepspeed_config fields (alternative to config parameter) |
| distributed_port | int | No | Master node port for distributed communication (default: 29500) |
| mpu | object | No | Model parallelism unit implementing get_{model,data}_parallel_{rank,group,world_size}() |
| dist_init_required | Optional[bool] | No | Force or skip torch.distributed initialization (None for auto-detect) |
| collate_fn | Callable | No | Custom collate function for the DataLoader |
| mesh_param | tuple | No | Mesh parameters for device mesh initialization (data_parallel, sequence_parallel) |
| config_params | Union[str, dict] | No | Same as config, kept for backwards compatibility |
Outputs
| Name | Type | Description |
|---|---|---|
| engine | DeepSpeedEngine | The DeepSpeed runtime engine wrapping the model for distributed training |
| optimizer | Optimizer | Wrapped optimizer (user-defined or from config); None if not configured |
| training_dataloader | DataLoader | DeepSpeed DataLoader if training_data was supplied; otherwise None |
| lr_scheduler | _LRScheduler | Wrapped LR scheduler if provided or configured in JSON; otherwise None |
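The optimizer and lr_scheduler inputs can be callables rather than instances, which lets DeepSpeed construct them against the parameters and optimizer it passes in. A minimal sketch under that assumption; the learning rate, decay schedule, and config values are illustrative.

import torch
import torch.nn as nn
import deepspeed

model = nn.Linear(1024, 1024)

def build_optimizer(params):
    # Called by DeepSpeed with the parameters to optimize
    return torch.optim.AdamW(params, lr=1e-4)

def build_scheduler(optimizer):
    # Called by DeepSpeed with the (possibly wrapped) optimizer
    return torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=lambda step: 0.95 ** step)

engine, optimizer, _, lr_scheduler = deepspeed.initialize(
    model=model,
    optimizer=build_optimizer,
    lr_scheduler=build_scheduler,
    model_parameters=model.parameters(),
    config={"train_batch_size": 16},  # minimal assumed config
)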
Usage Examples
import deepspeed
import torch
import torch.nn as nn
# Define a simple model
model = nn.Linear(1024, 1024)
# Initialize with a config file
engine, optimizer, _, lr_scheduler = deepspeed.initialize(
model=model,
config="ds_config.json",
model_parameters=model.parameters(),
)
# Initialize with a config dictionary
config = {
"train_batch_size": 32,
"gradient_accumulation_steps": 4,
"zero_optimization": {"stage": 2},
"fp16": {"enabled": True, "initial_scale_power": 16},
"optimizer": {
"type": "Adam",
"params": {"lr": 3e-5, "betas": [0.9, 0.999]}
}
}
engine, optimizer, _, _ = deepspeed.initialize(
model=model,
config=config,
model_parameters=model.parameters(),
)
# With a user-provided optimizer
user_optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
engine, optimizer, _, _ = deepspeed.initialize(
model=model,
optimizer=user_optimizer,
config="ds_config.json",
)
# The engine is now the primary interface for training; an illustrative random batch is used here
input_batch = torch.randn(8, 1024).to(engine.device)  # cast to half precision if fp16 is enabled
outputs = engine(input_batch)
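Relating to the zero.Init handling noted in the Description, here is a minimal sketch of building a large model under deepspeed.zero.Init (ZeRO stage 3) before calling initialize; the layer sizes and config values are illustrative assumptions.

import deepspeed
import torch.nn as nn

ds_config = {
    "train_batch_size": 8,
    "zero_optimization": {"stage": 3},
    "optimizer": {"type": "Adam", "params": {"lr": 1e-4}},
    "bf16": {"enabled": True},
}

# Parameters are partitioned across ranks as the model is constructed
with deepspeed.zero.Init(config_dict_or_path=ds_config):
    big_model = nn.Sequential(*[nn.Linear(4096, 4096) for _ in range(8)])

# initialize pauses any still-active zero.Init context while it builds the engine
engine, optimizer, _, _ = deepspeed.initialize(
    model=big_model,
    model_parameters=big_model.parameters(),
    config=ds_config,
)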
Related Pages
Implements Principle