Implementation:Deepspeedai DeepSpeed Initialize Mesh Device
Overview
Concrete tool, provided by the DeepSpeed library, for creating a multi-dimensional device mesh for sequence-parallel training.
Description
deepspeed.comm.initialize_mesh_device() creates a device mesh with named dimensions (typically "data_parallel" and "sequence_parallel"). It uses the communication backend to establish process groups for each dimension. The returned mesh_device object is passed to DeepSpeedConfig to correctly compute the effective world_size for the data-parallel dimension.
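To illustrate why the size of the data-parallel dimension matters, the following sketch works through the batch-size arithmetic for an 8-GPU run with a (2, 4) mesh. It assumes the usual DeepSpeed relation in which the global batch size is the per-GPU micro-batch times the gradient accumulation steps times the data-parallel world size. The helper `effective_dp_world_size` and the batch numbers are hypothetical and are not part of the DeepSpeed API.

```python
import math


def effective_dp_world_size(mesh_shape, mesh_dim_names):
    """Return the size of the 'data_parallel' mesh dimension (hypothetical helper)."""
    return mesh_shape[mesh_dim_names.index("data_parallel")]


mesh_shape = (2, 4)
mesh_dim_names = ("data_parallel", "sequence_parallel")

total_ranks = math.prod(mesh_shape)  # all GPUs covered by the mesh
dp_world_size = effective_dp_world_size(mesh_shape, mesh_dim_names)

# Illustrative values, not taken from any real configuration.
micro_batch_per_gpu = 4
gradient_accumulation_steps = 2

# Ranks in one sequence-parallel group share the same samples, so only the
# data-parallel dimension multiplies the batch size.
train_batch_size = micro_batch_per_gpu * gradient_accumulation_steps * dp_world_size

# Using the full rank count instead would overstate the batch size.
overstated_batch_size = micro_batch_per_gpu * gradient_accumulation_steps * total_ranks

print(total_ranks, dp_world_size, train_batch_size, overstated_batch_size)
```

Under these assumptions the two figures differ by a factor equal to the sequence-parallel size, which is the discrepancy the mesh-aware world_size is meant to avoid.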
The function first asserts that the DeepSpeed communication backend is initialized. It then delegates to the backend's init_device_mesh method if the backend provides one. If the backend does not support mesh device initialization, a warning is logged and None is returned. The function is typically called internally by deepspeed.initialize() when mesh_param is provided, but it can also be invoked directly.
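The control flow described above can be sketched without DeepSpeed installed. This is a minimal stand-in rather than the library's source: `MeshBackend`, `PlainBackend`, and `initialize_mesh_device_sketch` are hypothetical names, and the dictionary returned by the fake backend only stands in for a real mesh object.

```python
import logging
from typing import Optional

logger = logging.getLogger("mesh_sketch")


class MeshBackend:
    """Hypothetical backend that supports mesh creation."""

    def is_initialized(self) -> bool:
        return True

    def init_device_mesh(self, mesh_shape, mesh_dim_names):
        # A real backend would build process groups here; this returns a placeholder.
        return {"shape": tuple(mesh_shape), "dim_names": tuple(mesh_dim_names)}


class PlainBackend:
    """Hypothetical backend without an init_device_mesh method."""

    def is_initialized(self) -> bool:
        return True


def initialize_mesh_device_sketch(backend, mesh_shape, mesh_dim_names) -> Optional[object]:
    # Step 1: the backend must exist and be initialized.
    assert backend is not None and backend.is_initialized(), "backend not initialized"
    # Step 2: delegate to the backend when it supports mesh creation.
    if hasattr(backend, "init_device_mesh"):
        return backend.init_device_mesh(mesh_shape, mesh_dim_names)
    # Step 3: otherwise log a warning and return None.
    logger.warning("backend does not support mesh device initialization")
    return None


names = ("data_parallel", "sequence_parallel")
supported = initialize_mesh_device_sketch(MeshBackend(), (2, 4), names)
unsupported = initialize_mesh_device_sketch(PlainBackend(), (2, 4), names)
```

Callers that invoke the function directly should therefore be prepared for a None result and fall back to a non-mesh configuration.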
Code Reference
- Repository: https://github.com/deepspeedai/DeepSpeed
- File: deepspeed/comm/comm.py
- Lines: L761-773
Signature
def initialize_mesh_device(mesh_shape: tuple, mesh_dim_names: tuple) -> Optional[object]
Import
from deepspeed.comm import initialize_mesh_device
# Or invoked implicitly via:
# deepspeed.initialize(mesh_param=(dp_size, sp_size))
I/O Contract
Inputs
| Parameter | Type | Required | Description |
|---|---|---|---|
| mesh_shape | tuple | Yes | Shape of the mesh, e.g. (dp_size, sp_size) |
| mesh_dim_names | tuple | Yes | Names for each dimension, e.g. ("data_parallel", "sequence_parallel") |
Outputs
| Output | Type | Description |
|---|---|---|
| mesh_device | object or None | The mesh device object, or None if the backend does not support mesh initialization |
Usage Example
import deepspeed

# 8 GPUs: data-parallel degree 2 x sequence-parallel degree 4
# (model and ds_config are assumed to be defined elsewhere)
engine, _, _, _ = deepspeed.initialize(
    model=model,
    config=ds_config,
    mesh_param=(2, 4),  # (dp_size, sp_size)
)

# Or call directly (the communication backend must already be initialized):
from deepspeed.comm import initialize_mesh_device
mesh_device = initialize_mesh_device(
    mesh_shape=(2, 4),
    mesh_dim_names=("data_parallel", "sequence_parallel"),
)
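To make the (2, 4) example concrete, the sketch below enumerates which ranks would share a process group along each named dimension. It assumes ranks are laid out in row-major order over the mesh shape, as PyTorch's DeviceMesh does by default; the actual assignment depends on the backend. The helper `mesh_groups_2d` is hypothetical and handles only two-dimensional meshes.

```python
def mesh_groups_2d(mesh_shape, mesh_dim_names):
    """List the rank groups along each dimension of a 2-D mesh (hypothetical helper)."""
    rows, cols = mesh_shape
    first_name, second_name = mesh_dim_names
    # Row-major grid of ranks: grid[r][c] = r * cols + c (an assumed layout).
    grid = [[r * cols + c for c in range(cols)] for r in range(rows)]
    return {
        # Groups along the first dimension are the columns of the grid.
        first_name: [[grid[r][c] for r in range(rows)] for c in range(cols)],
        # Groups along the second dimension are the rows of the grid.
        second_name: [list(row) for row in grid],
    }


groups = mesh_groups_2d((2, 4), ("data_parallel", "sequence_parallel"))
for name, rank_groups in groups.items():
    print(name, rank_groups)
```

Under this layout each sequence-parallel group holds four consecutive ranks, while each data-parallel group pairs ranks that sit four apart.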
Knowledge Sources
- https://github.com/deepspeedai/DeepSpeed
- https://www.deepspeed.ai/tutorials/ulysses-alst-sequence-parallelism/
Last updated: 2026-02-09 00:00 GMT