Implementation:Deepspeedai DeepSpeed Initialize For SP

From Leeroopedia


Overview

Concrete tool, provided by the DeepSpeed library, for initializing a DeepSpeed engine with sequence parallelism (SP) support.

Description

Calling deepspeed.initialize() with mesh_param=(dp_size, sp_size) creates a mesh device and passes it to DeepSpeedConfig. The config then takes its world_size from the data-parallel (DP) dimension of the mesh rather than from the global number of ranks, so that batch sizes are calculated correctly. The mpu object returned by register_with_transformers() supplies the SP group used for communication.
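The effect on batch size can be illustrated with a little arithmetic. The sketch below is plain Python and does not call DeepSpeed; it assumes the usual DeepSpeed relation (global batch size = micro batch per GPU x gradient accumulation steps x data-parallel world size) and that the DP size equals the total number of ranks divided by sp_size. The function name effective_train_batch_size is hypothetical.

```python
def effective_train_batch_size(world_size: int,
                               sp_size: int,
                               micro_batch_per_gpu: int,
                               grad_accum_steps: int = 1) -> int:
    """Hypothetical helper: global batch size when SP ranks share each sample.

    Assumption: ranks in one SP group work on slices of the same sequences,
    so only the data-parallel dimension multiplies the batch size.
    """
    if world_size % sp_size != 0:
        raise ValueError("world_size must be divisible by sp_size")
    dp_size = world_size // sp_size  # size of the data-parallel dimension
    return micro_batch_per_gpu * grad_accum_steps * dp_size


if __name__ == "__main__":
    # 8 ranks arranged as (dp_size=2, sp_size=4), micro batch 1, no accumulation
    print(effective_train_batch_size(world_size=8, sp_size=4, micro_batch_per_gpu=1))  # prints 2
```

Using the global rank count (8) instead of the DP size (2) in this example would overstate the batch size by a factor of sp_size.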

When mesh_param is provided, the initialization flow is:

  1. Call dist.initialize_mesh_device(mesh_param, ("data_parallel", "sequence_parallel")) to create the mesh
  2. Pass mesh_device to DeepSpeedConfig, which extracts the data-parallel group's world size via mesh_device.get_group(mesh_dim="data_parallel")
  3. Construct DeepSpeedEngine with the adjusted config, mpu, and mesh_device
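To visualize what the two mesh dimensions mean for rank placement, the following sketch enumerates the groups for a given (dp_size, sp_size). It is plain Python and assumes a row-major rank layout, which is how PyTorch's device mesh arranges ranks; whether a particular DeepSpeed version uses exactly this layout is an assumption here. The helper mesh_groups is hypothetical and not part of DeepSpeed.

```python
from typing import List, Tuple


def mesh_groups(dp_size: int, sp_size: int) -> Tuple[List[List[int]], List[List[int]]]:
    """Hypothetical helper: list the SP and DP rank groups of a 2-D mesh.

    Assumption: ranks fill a (dp_size, sp_size) grid in row-major order.
    """
    grid = [[dp * sp_size + sp for sp in range(sp_size)] for dp in range(dp_size)]
    # Each row holds the ranks that form one sequence-parallel group.
    sp_groups = [list(row) for row in grid]
    # Each column holds the ranks that form one data-parallel group.
    dp_groups = [[grid[dp][sp] for dp in range(dp_size)] for sp in range(sp_size)]
    return sp_groups, dp_groups


if __name__ == "__main__":
    sp_groups, dp_groups = mesh_groups(dp_size=2, sp_size=4)
    print(sp_groups)  # [[0, 1, 2, 3], [4, 5, 6, 7]]
    print(dp_groups)  # [[0, 4], [1, 5], [2, 6], [3, 7]]
```

Under this layout every data-parallel group has dp_size members, which is the value the config reads in step 2.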

Alternatively, if mesh_param is not provided but sequence_parallel_size and data_parallel_size are present in the config dictionary, the mesh is created from those config values instead.
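The precedence between the two ways of specifying the mesh can be sketched as a small function. This is an illustration of the rule described above, not DeepSpeed's actual code; it assumes the config has already been loaded into a dict, and the name resolve_mesh_shape is hypothetical.

```python
from typing import Optional, Tuple


def resolve_mesh_shape(mesh_param: Optional[Tuple[int, int]],
                       config: dict) -> Optional[Tuple[int, int]]:
    """Hypothetical helper: pick the (dp_size, sp_size) mesh shape.

    An explicit mesh_param wins; otherwise both config keys must be present.
    Returns None when no mesh should be created.
    """
    if mesh_param:
        return tuple(mesh_param)
    if "sequence_parallel_size" in config and "data_parallel_size" in config:
        return (config["data_parallel_size"], config["sequence_parallel_size"])
    return None


if __name__ == "__main__":
    cfg = {"data_parallel_size": 2, "sequence_parallel_size": 4}
    print(resolve_mesh_shape(None, cfg))    # (2, 4), taken from the config
    print(resolve_mesh_shape((1, 8), cfg))  # (1, 8), mesh_param takes precedence
```

Note that the config route requires both keys; a config containing only sequence_parallel_size yields no mesh in this sketch.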

Code Reference

Signature

deepspeed.initialize(
    model=model,            # nn.Module, required
    config=config,          # dict or str path, required
    mesh_param=None,        # tuple (dp_size, sp_size), optional
    mpu=None,               # mpu object from register_with_transformers, optional
    optimizer=None,         # optional user-defined optimizer
    model_parameters=None,  # optional parameter groups
    lr_scheduler=None,      # optional scheduler
    # ... other standard parameters
) -> Tuple[DeepSpeedEngine, Optimizer, DataLoader, LRScheduler]

Import

import deepspeed

I/O Contract

Inputs

Parameter | Type | Required | Description
model | torch.nn.Module | Yes | The model (with SP attention already registered)
config | dict or str | Yes | DeepSpeed configuration dictionary or path to JSON config file
mesh_param | tuple | No | (dp_size, sp_size) defining the mesh dimensions
mpu | object | No | Model parallel unit from register_with_transformers()
optimizer | Optimizer | No | User-defined optimizer (overrides config)
model_parameters | iterable | No | Parameters to optimize
lr_scheduler | LRScheduler | No | Learning rate scheduler

Outputs

Output | Type | Description
engine | DeepSpeedEngine | Runtime engine with correct world_size for SP, mesh_device for SP group communication
optimizer | Optimizer | Wrapped optimizer (or None)
training_dataloader | DataLoader | DeepSpeed dataloader (or None if no training_data provided)
lr_scheduler | LRScheduler | Wrapped scheduler (or None)

Usage Example

import deepspeed
from deepspeed.runtime.sequence_parallel.ulysses_sp import UlyssesSPAttentionHF

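# Assumes `model` (a torch.nn.Module) and `ds_config` (a dict or JSON path)
# are defined elsewhere; neither is shown in this example.

# Step 1: Register SP attention (either before or after model instantiation)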
mpu = UlyssesSPAttentionHF.register_with_transformers(
    model_name_or_path="meta-llama/Llama-2-7b-hf",
    core_attn_implementation="flash_attention_2",
    sequence_parallel_size=4,
    micro_batch_size=1,
)

# Step 2: Initialize the engine with mesh_param
engine, optimizer, _, lr_scheduler = deepspeed.initialize(
    model=model,
    config=ds_config,
    mesh_param=(2, 4),  # (dp_size, sp_size): 2 DP replicas x 4 SP ranks each = 8 GPUs
    mpu=mpu,
)

# Alternative: specify in config instead of mesh_param
ds_config_with_sp = {
    "data_parallel_size": 2,
    "sequence_parallel_size": 4,
    "train_micro_batch_size_per_gpu": 1,
    # ... other config
}
engine, _, _, _ = deepspeed.initialize(
    model=model,
    config=ds_config_with_sp,
    mpu=mpu,
)
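Because the mesh must cover every launched rank, a pre-flight check can catch a mismatched mesh_param before any distributed setup runs. The sketch below is plain Python; the helper validate_mesh_param is hypothetical and not part of DeepSpeed, and it assumes the mesh is expected to use all ranks exactly once.

```python
from typing import Tuple


def validate_mesh_param(mesh_param: Tuple[int, int], world_size: int) -> None:
    """Hypothetical pre-flight check for a (dp_size, sp_size) mesh.

    Raises ValueError if either size is not positive or if the mesh does not
    account for exactly world_size ranks.
    """
    dp_size, sp_size = mesh_param
    if dp_size < 1 or sp_size < 1:
        raise ValueError(f"mesh sizes must be positive, got {mesh_param}")
    if dp_size * sp_size != world_size:
        raise ValueError(
            f"mesh {mesh_param} covers {dp_size * sp_size} ranks, "
            f"but world_size is {world_size}"
        )


if __name__ == "__main__":
    validate_mesh_param((2, 4), world_size=8)  # passes silently
    try:
        validate_mesh_param((2, 4), world_size=4)
    except ValueError as err:
        print(err)
```

In a launched job, world_size would typically come from the launcher (for example the WORLD_SIZE environment variable); here it is passed in explicitly so the check stays easy to test.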


Last updated: 2026-02-09 00:00 GMT
