Implementation:Deepspeedai DeepSpeed AutoTP Replace
Overview
Concrete tool for automatically detecting and replacing transformer layers with tensor-parallel variants provided by the DeepSpeed library.
Implementation Type
Class (module replacement orchestrator)
Detailed Description
The AutoTP class detects transformer layers using AutoTPConfig presets or custom patterns and replaces nn.Linear modules with TP variants. set_autotp_mode(training=True) sets the global mode flag. During deepspeed.initialize(), the engine calls replace_transformer_layer(), which triggers AutoTP._replace_module() to swap standard linear layers with LinearAllreduce (row-parallel) and LinearLayer (column-parallel) variants.
The AutoTP class provides the following key methods:
- __init__(module, all_reduce_linears, prefix, state_dict, linear_layer_setting, orig_layer_impl, keep_module_on_host, partition_config): Initializes the replacement orchestrator with the target module, the list of row-parallel layer names, and an optional partition config.
- tp_parser(model): Static method that analyzes the model graph to automatically identify which linear layers should be row-parallel (AllReduce). It walks through ModuleList children, finds linear layers, and identifies row-parallel candidates based on their position relative to LayerNorm boundaries and known naming patterns (e.g., o_proj, out_proj, down_proj).
- set_tensor_parallel_config(mp_size, mp_group): In training mode, retrieves the TP group and world size from deepspeed.utils.groups rather than using the passed arguments.
- _replace_module(r_module, prev_name, prev_class_name): Recursively walks the model tree. When partition_config is present, it uses pattern-based routing via _replace_with_config(); otherwise, it uses type-based routing via the linear_policies dictionary.
- _replace(child, name, conv_linear_layer): The legacy replacement function that routes each layer to the appropriate TP variant based on the layer name and model architecture.
- _replace_with_config(child, name): The new pattern-based replacement function that uses partition_config.find_matching_spec() to determine the partition type and calls _create_row_parallel_layer() or _create_column_parallel_layer().
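To illustrate the naming heuristic that tp_parser applies, here is a minimal, hypothetical sketch (not DeepSpeed's actual implementation, which also inspects LayerNorm boundaries in the module graph) that classifies layer names as row- or column-parallel:

```python
# Hypothetical sketch of AutoTP's naming heuristic for row-parallel layers.
# The real tp_parser also uses LayerNorm boundaries; this only shows the
# suffix-matching part mentioned in the description above.
ROW_PARALLEL_SUFFIXES = ("o_proj", "out_proj", "down_proj")

def classify_linear(name: str) -> str:
    """Return 'row' (needs AllReduce) or 'column' for a linear layer name."""
    leaf = name.split(".")[-1]
    return "row" if leaf in ROW_PARALLEL_SUFFIXES else "column"

print(classify_linear("model.layers.0.self_attn.o_proj"))  # row
print(classify_linear("model.layers.0.self_attn.q_proj"))  # column
print(classify_linear("model.layers.0.mlp.down_proj"))     # row
```

Row-parallel layers are the ones whose outputs must be summed across ranks, which is why they map to LinearAllreduce while everything else maps to the column-parallel LinearLayer.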
The set_autotp_mode() function:
- Signature: def set_autotp_mode(training: bool) -> None
- Sets the global DEEPSPEED_AUTOTP_MODE to AUTOTP_MODE.TRAINING or AUTOTP_MODE.INFERENCE.
- This affects TP layer behavior: training mode uses non-inplace operations for autograd compatibility and even partitioning via torch.chunk().
Code Reference
- Repository: https://github.com/deepspeedai/DeepSpeed
- File: deepspeed/module_inject/auto_tp.py (L194-630, AutoTP class)
- File: deepspeed/module_inject/layers.py (L42-50, set_autotp_mode)
- AutoTP.__init__ signature: def __init__(self, module, all_reduce_linears, prefix, state_dict, linear_layer_setting, orig_layer_impl, keep_module_on_host=False, partition_config=None)
- Import: from deepspeed.module_inject.layers import set_autotp_mode
- Import: from deepspeed.module_inject.auto_tp import AutoTP
Parameters
AutoTP constructor:
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| module | torch.nn.Module | Yes | — | The transformer module to process |
| all_reduce_linears | list | Yes | — | Names of linear layers requiring AllReduce (row-parallel) |
| prefix | str | Yes | — | Parameter name prefix for state_dict lookup |
| state_dict | dict | Yes | — | Model state dictionary for weight loading |
| linear_layer_setting | list | Yes | — | Module class types for linear layer detection |
| orig_layer_impl | type | Yes | — | Original transformer layer implementation class |
| keep_module_on_host | bool | No | False | Keep weights on CPU to reduce GPU memory |
| partition_config | AutoTPConfig | No | None | Custom partition configuration with TPLayerSpec rules |
set_autotp_mode:
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| training | bool | Yes | — | If True, sets TRAINING mode; if False, sets INFERENCE mode |
I/O
| Direction | Name | Type | Description |
|---|---|---|---|
| Input | module | torch.nn.Module | Model with standard nn.Linear layers |
| Input | TP config | autotp_size, partition_config | Tensor parallelism settings |
| Output | module | torch.nn.Module | Model with nn.Linear layers replaced by LinearAllreduce/LinearLayer TP variants |
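The row/column split behind the output above follows standard tensor-parallel algebra: a column-parallel layer shards the output dimension (each rank holds the full input and produces a slice of the output), while a row-parallel layer shards the input dimension, so each rank produces a partial sum that an AllReduce completes. A dependency-free sketch of the row-parallel case (pure Python, no torch or deepspeed):

```python
# Row-parallel linear algebra: y = W @ x with W's input (column)
# dimension sharded across two emulated "ranks".
def matvec(W, x):
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in W]

W = [[1, 2, 3, 4],
     [5, 6, 7, 8]]           # shape (out=2, in=4)
x = [1, 1, 1, 1]

# Shard the input dimension: rank 0 owns columns 0-1, rank 1 owns 2-3.
W0 = [row[:2] for row in W]; x0 = x[:2]
W1 = [row[2:] for row in W]; x1 = x[2:]

partial0 = matvec(W0, x0)    # rank 0's partial output
partial1 = matvec(W1, x1)    # rank 1's partial output
allreduced = [a + b for a, b in zip(partial0, partial1)]  # emulated AllReduce

assert allreduced == matvec(W, x)  # matches the unsharded result: [10, 26]
```

This is why LinearAllreduce is the row-parallel variant: the AllReduce is the sum that reassembles the full output, whereas column-parallel LinearLayer needs no communication on the forward matmul itself.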
Usage Example
import deepspeed
import torch
from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")
# AutoTP is triggered automatically via deepspeed.initialize()
engine, _, _, _ = deepspeed.initialize(
model=model,
config={
"tensor_parallel": {"autotp_size": 4},
"train_batch_size": 8,
"bf16": {"enabled": True}
}
)
# model's linear layers are now TP-partitioned
# Using a preset model
engine, _, _, _ = deepspeed.initialize(
model=model,
config={
"tensor_parallel": {
"autotp_size": 4,
"preset_model": "llama"
},
"train_batch_size": 8,
"bf16": {"enabled": True}
}
)
# Using custom partition config
engine, _, _, _ = deepspeed.initialize(
model=model,
config={
"tensor_parallel": {
"autotp_size": 4,
"partition_config": {
"use_default_specs": False,
"layer_specs": [
{"patterns": [".*\\.o_proj\\.weight$"], "partition_type": "row"},
{"patterns": [".*\\.[qkv]_proj\\.weight$"], "partition_type": "column"}
]
}
},
"train_batch_size": 8,
"bf16": {"enabled": True}
}
)
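The partition_config example above routes parameters by regex. While find_matching_spec() is the real method name, its exact semantics are not documented here, so the following first-match-wins lookup is an assumption, sketched with the standard re module:

```python
import re

# Illustrative layer specs mirroring the partition_config example above.
# The first-match-wins lookup is an assumed simplification of
# partition_config.find_matching_spec(), not the DeepSpeed source.
layer_specs = [
    {"patterns": [r".*\.o_proj\.weight$"], "partition_type": "row"},
    {"patterns": [r".*\.[qkv]_proj\.weight$"], "partition_type": "column"},
]

def find_matching_spec(param_name):
    for spec in layer_specs:
        if any(re.match(p, param_name) for p in spec["patterns"]):
            return spec["partition_type"]
    return None  # no spec matched; layer is left unpartitioned

print(find_matching_spec("model.layers.3.self_attn.o_proj.weight"))  # row
print(find_matching_spec("model.layers.3.self_attn.q_proj.weight"))  # column
```

With use_default_specs set to False, any parameter that matches no pattern would fall through unpartitioned, so custom specs should cover every layer you intend to shard.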
Knowledge Sources
Relationships
Principle:Deepspeedai_DeepSpeed_AutoTP_Engine_Init
Metadata
- Workflow: AutoTP_Training
- Type: Implementation
- Last Updated: 2026-02-09 00:00 GMT