Implementation:Deepspeedai DeepSpeed AutoTP Replace
Overview
Concrete tool for automatically detecting and replacing transformer layers with tensor-parallel variants provided by the DeepSpeed library.
Implementation Type
Class (module replacement orchestrator)
Detailed Description
The AutoTP class detects transformer layers using AutoTPConfig presets or custom patterns and replaces nn.Linear modules with TP variants. set_autotp_mode(training=True) sets the global mode flag. During deepspeed.initialize(), the engine calls replace_transformer_layer(), which triggers AutoTP._replace_module() to swap standard linear layers with LinearAllreduce (row-parallel) and LinearLayer (column-parallel) variants.
The AutoTP class provides the following key methods:
- __init__(module, all_reduce_linears, prefix, state_dict, linear_layer_setting, orig_layer_impl, keep_module_on_host, partition_config): Initializes the replacement orchestrator with the target module, the list of row-parallel layer names, and an optional partition config.
- tp_parser(model): Static method that analyzes the model graph to automatically identify which linear layers should be row-parallel (AllReduce). It walks through ModuleList children, finds linear layers, and identifies row-parallel candidates based on their position relative to LayerNorm boundaries and known naming patterns (e.g., o_proj, out_proj, down_proj).
- set_tensor_parallel_config(mp_size, mp_group): In training mode, retrieves the TP group and world size from deepspeed.utils.groups rather than using the passed arguments.
- _replace_module(r_module, prev_name, prev_class_name): Recursively walks the model tree. When partition_config is present, it uses pattern-based routing via _replace_with_config(); otherwise, it uses type-based routing via the linear_policies dictionary.
- _replace(child, name, conv_linear_layer): The legacy replacement function that routes each layer to the appropriate TP variant based on the layer name and model architecture.
- _replace_with_config(child, name): The new pattern-based replacement function that uses partition_config.find_matching_spec() to determine the partition type and calls _create_row_parallel_layer() or _create_column_parallel_layer().
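To illustrate the naming heuristic that tp_parser applies, here is a minimal, hypothetical sketch (not DeepSpeed's actual implementation, which also inspects LayerNorm boundaries in the module graph) that classifies layer names as row- or column-parallel:

```python
# Hypothetical sketch of AutoTP's naming heuristic for row-parallel layers.
# The real tp_parser also uses LayerNorm boundaries; this only shows the
# suffix-matching part mentioned in the description above.
ROW_PARALLEL_SUFFIXES = ("o_proj", "out_proj", "down_proj")

def classify_linear(name: str) -> str:
    """Return 'row' (needs AllReduce) or 'column' for a linear layer name."""
    leaf = name.split(".")[-1]
    return "row" if leaf in ROW_PARALLEL_SUFFIXES else "column"

print(classify_linear("model.layers.0.self_attn.o_proj"))  # row
print(classify_linear("model.layers.0.self_attn.q_proj"))  # column
print(classify_linear("model.layers.0.mlp.down_proj"))     # row
```

Row-parallel layers are the ones whose outputs must be summed across ranks, which is why they map to LinearAllreduce while everything else maps to the column-parallel LinearLayer.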
The set_autotp_mode() function:
- Signature: def set_autotp_mode(training: bool) -> None
- Sets the global DEEPSPEED_AUTOTP_MODE to AUTOTP_MODE.TRAINING or AUTOTP_MODE.INFERENCE.
- This affects TP layer behavior: training mode uses non-inplace operations for autograd compatibility and even partitioning via torch.chunk().
Code Reference
- Repository: https://github.com/deepspeedai/DeepSpeed
- File: deepspeed/module_inject/auto_tp.py (L194-630, AutoTP class)
- File: deepspeed/module_inject/layers.py (L42-50, set_autotp_mode)
- AutoTP.__init__ signature: def __init__(self, module, all_reduce_linears, prefix, state_dict, linear_layer_setting, orig_layer_impl, keep_module_on_host=False, partition_config=None)
- Import: from deepspeed.module_inject.layers import set_autotp_mode
- Import: from deepspeed.module_inject.auto_tp import AutoTP
Parameters
AutoTP constructor:
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| module | torch.nn.Module | Yes | — | The transformer module to process |
| all_reduce_linears | list | Yes | — | Names of linear layers requiring AllReduce (row-parallel) |
| prefix | str | Yes | — | Parameter name prefix for state_dict lookup |
| state_dict | dict | Yes | — | Model state dictionary for weight loading |
| linear_layer_setting | list | Yes | — | Module class types for linear layer detection |
| orig_layer_impl | type | Yes | — | Original transformer layer implementation class |
| keep_module_on_host | bool | No | False | Keep weights on CPU to reduce GPU memory |
| partition_config | AutoTPConfig | No | None | Custom partition configuration with TPLayerSpec rules |
set_autotp_mode:
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| training | bool | Yes | — | If True, sets TRAINING mode; if False, sets INFERENCE mode |
I/O
| Direction | Name | Type | Description |
|---|---|---|---|
| Input | module | torch.nn.Module | Model with standard nn.Linear layers |
| Input | TP config | autotp_size, partition_config | Tensor parallelism settings |
| Output | module | torch.nn.Module | Model with nn.Linear layers replaced by LinearAllreduce/LinearLayer TP variants |
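The row/column split behind the output above follows standard tensor-parallel algebra: a column-parallel layer shards the output dimension (each rank holds the full input and produces a slice of the output), while a row-parallel layer shards the input dimension, so each rank produces a partial sum that an AllReduce completes. A dependency-free sketch of the row-parallel case (pure Python, no torch or deepspeed):

```python
# Row-parallel linear algebra: y = W @ x with W's input (column)
# dimension sharded across two emulated "ranks".
def matvec(W, x):
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in W]

W = [[1, 2, 3, 4],
     [5, 6, 7, 8]]           # shape (out=2, in=4)
x = [1, 1, 1, 1]

# Shard the input dimension: rank 0 owns columns 0-1, rank 1 owns 2-3.
W0 = [row[:2] for row in W]; x0 = x[:2]
W1 = [row[2:] for row in W]; x1 = x[2:]

partial0 = matvec(W0, x0)    # rank 0's partial output
partial1 = matvec(W1, x1)    # rank 1's partial output
allreduced = [a + b for a, b in zip(partial0, partial1)]  # emulated AllReduce

assert allreduced == matvec(W, x)  # matches the unsharded result: [10, 26]
```

This is why LinearAllreduce is the row-parallel variant: the AllReduce is the sum that reassembles the full output, whereas column-parallel LinearLayer needs no communication on the forward matmul itself.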
Usage Example
import deepspeed
import torch
from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")
# AutoTP is triggered automatically via deepspeed.initialize()
engine, _, _, _ = deepspeed.initialize(
model=model,
config={
"tensor_parallel": {"autotp_size": 4},
"train_batch_size": 8,
"bf16": {"enabled": True}
}
)
# model's linear layers are now TP-partitioned
# Using a preset model
engine, _, _, _ = deepspeed.initialize(
model=model,
config={
"tensor_parallel": {
"autotp_size": 4,
"preset_model": "llama"
},
"train_batch_size": 8,
"bf16": {"enabled": True}
}
)
# Using custom partition config
engine, _, _, _ = deepspeed.initialize(
model=model,
config={
"tensor_parallel": {
"autotp_size": 4,
"partition_config": {
"use_default_specs": False,
"layer_specs": [
{"patterns": [".*\\.o_proj\\.weight$"], "partition_type": "row"},
{"patterns": [".*\\.[qkv]_proj\\.weight$"], "partition_type": "column"}
]
}
},
"train_batch_size": 8,
"bf16": {"enabled": True}
}
)
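The partition_config example above routes parameters by regex. While find_matching_spec() is the real method name, its exact semantics are not documented here, so the following first-match-wins lookup is an assumption, sketched with the standard re module:

```python
import re

# Illustrative layer specs mirroring the partition_config example above.
# The first-match-wins lookup is an assumed simplification of
# partition_config.find_matching_spec(), not the DeepSpeed source.
layer_specs = [
    {"patterns": [r".*\.o_proj\.weight$"], "partition_type": "row"},
    {"patterns": [r".*\.[qkv]_proj\.weight$"], "partition_type": "column"},
]

def find_matching_spec(param_name):
    for spec in layer_specs:
        if any(re.match(p, param_name) for p in spec["patterns"]):
            return spec["partition_type"]
    return None  # no spec matched; layer is left unpartitioned

print(find_matching_spec("model.layers.3.self_attn.o_proj.weight"))  # row
print(find_matching_spec("model.layers.3.self_attn.q_proj.weight"))  # column
```

With use_default_specs set to False, any parameter that matches no pattern would fall through unpartitioned, so custom specs should cover every layer you intend to shard.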
Knowledge Sources
Relationships
Principle:Deepspeedai_DeepSpeed_AutoTP_Engine_Init
Metadata
- Workflow: AutoTP_Training
- Type: Implementation
- Last Updated: 2026-02-09 00:00 GMT