
Implementation:Deepspeedai DeepSpeed AutoTP Replace

From Leeroopedia


Overview

A concrete tool in the DeepSpeed library that automatically detects transformer layers and replaces their linear submodules with tensor-parallel variants.

Implementation Type

Class (module replacement orchestrator)

Detailed Description

The AutoTP class detects transformer layers using AutoTPConfig presets or custom patterns and replaces nn.Linear modules with TP variants. set_autotp_mode(training=True) sets the global mode flag. During deepspeed.initialize(), the engine calls replace_transformer_layer() which triggers AutoTP._replace_module() to swap standard linear layers with LinearAllreduce (row-parallel) and LinearLayer (column-parallel).
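The split between LinearAllreduce (row-parallel) and LinearLayer (column-parallel) follows from the arithmetic of sharded matrix multiplication. The sketch below is pure Python with illustrative sizes and no DeepSpeed APIs: it shows that a column-parallel shard yields a disjoint slice of the output with no communication, while a row-parallel shard yields a partial sum that must be all-reduced, which is why layers such as o_proj and down_proj get the AllReduce variant.

```python
# Illustrative only: why column-parallel shards need no reduction on the
# forward output, while row-parallel shards produce partial sums that must
# be summed across ranks (the all-reduce in LinearAllreduce).

def matmul(W, x):
    """y = W @ x for a weight W (out x in) and an input vector x."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in W]

W = [[1, 2, 3, 4],
     [5, 6, 7, 8]]           # 2 outputs, 4 inputs (toy sizes)
x = [1, 1, 1, 1]
full = matmul(W, x)          # reference result on a single device

# Column-parallel (LinearLayer): shard W along the *output* dimension.
# Each rank owns whole rows of W and computes a disjoint slice of y.
rank0_col, rank1_col = [W[0]], [W[1]]
y_col = matmul(rank0_col, x) + matmul(rank1_col, x)   # concatenate, no reduce

# Row-parallel (LinearAllreduce): shard W along the *input* dimension.
# Each rank sees only part of x, so its output is a partial sum; the
# shards must be summed across ranks to recover y.
rank0_row = [row[:2] for row in W]
rank1_row = [row[2:] for row in W]
partial0 = matmul(rank0_row, x[:2])
partial1 = matmul(rank1_row, x[2:])
y_row = [a + b for a, b in zip(partial0, partial1)]   # the "all-reduce"

assert y_col == full and y_row == full
```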

The AutoTP class provides the following key methods:

  • __init__(module, all_reduce_linears, prefix, state_dict, linear_layer_setting, orig_layer_impl, keep_module_on_host, partition_config): Initializes the replacement orchestrator with the target module, list of row-parallel layer names, and optional partition config.
  • tp_parser(model): Static method that analyzes the model graph to automatically identify which linear layers should be row-parallel (AllReduce). It walks through ModuleList children, finds linear layers, and identifies row-parallel candidates based on their position relative to LayerNorm boundaries and known naming patterns (e.g., o_proj, out_proj, down_proj).
  • set_tensor_parallel_config(mp_size, mp_group): In training mode, retrieves the TP group and world size from deepspeed.utils.groups rather than using the passed arguments.
  • _replace_module(r_module, prev_name, prev_class_name): Recursively walks the model tree. When partition_config is present, it uses pattern-based routing via _replace_with_config(). Otherwise, it uses type-based routing via linear_policies dictionary.
  • _replace(child, name, conv_linear_layer): The legacy replacement function that routes each layer to the appropriate TP variant based on the layer name and model architecture.
  • _replace_with_config(child, name): The new pattern-based replacement function that uses partition_config.find_matching_spec() to determine the partition type and calls _create_row_parallel_layer() or _create_column_parallel_layer().
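The pattern-based routing in `_replace_with_config()` can be pictured as a first-match regex lookup. The function below is a hypothetical mimic of `partition_config.find_matching_spec()`; the real AutoTPConfig/TPLayerSpec API may differ in names and return types, and the patterns shown are only examples.

```python
# Hypothetical sketch of pattern-based routing: the first spec whose regex
# matches a parameter name decides the partition type. Not the real
# AutoTPConfig API -- an illustration of the routing logic only.
import re

LAYER_SPECS = [
    {"patterns": [r".*\.o_proj\.weight$", r".*\.down_proj\.weight$"],
     "partition_type": "row"},      # row-parallel -> LinearAllreduce
    {"patterns": [r".*\.[qkv]_proj\.weight$", r".*\.up_proj\.weight$"],
     "partition_type": "column"},   # column-parallel -> LinearLayer
]

def find_matching_spec(param_name, specs=LAYER_SPECS):
    """Return the partition type of the first matching spec, else None."""
    for spec in specs:
        if any(re.match(p, param_name) for p in spec["patterns"]):
            return spec["partition_type"]
    return None  # no spec matched: the layer is left unreplaced

print(find_matching_spec("model.layers.0.self_attn.o_proj.weight"))  # row
print(find_matching_spec("model.layers.0.self_attn.q_proj.weight"))  # column
```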

The set_autotp_mode() function:

  • Signature: def set_autotp_mode(training: bool) -> None
  • Sets the global DEEPSPEED_AUTOTP_MODE to AUTOTP_MODE.TRAINING or AUTOTP_MODE.INFERENCE.
  • This affects TP layer behavior: training mode uses non-inplace operations for autograd compatibility and even partitioning via torch.chunk().
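The even partitioning that training mode performs via torch.chunk() can be sketched in pure Python. This is an analogue over a list standing in for one tensor dimension, with illustrative sizes, not DeepSpeed's actual sharding code.

```python
# Pure-Python analogue of the even split torch.chunk() performs along a
# weight's shard dimension in training mode: mp_size equal pieces, one per
# tensor-parallel rank. Sizes are illustrative.

def chunk(values, mp_size):
    """Split values evenly into mp_size shards, like torch.chunk on one dim."""
    assert len(values) % mp_size == 0, "training mode expects an even split"
    shard = len(values) // mp_size
    return [values[i * shard:(i + 1) * shard] for i in range(mp_size)]

out_features = list(range(12))        # stand-in for a weight's output dim
shards = chunk(out_features, mp_size=4)
assert [len(s) for s in shards] == [3, 3, 3, 3]
assert [v for s in shards for v in s] == out_features  # nothing lost
```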

Code Reference

  • Repository: https://github.com/deepspeedai/DeepSpeed
  • File: deepspeed/module_inject/auto_tp.py (L194-630, AutoTP class)
  • File: deepspeed/module_inject/layers.py (L42-50, set_autotp_mode)
  • AutoTP.__init__ signature: def __init__(self, module, all_reduce_linears, prefix, state_dict, linear_layer_setting, orig_layer_impl, keep_module_on_host=False, partition_config=None)
  • Import: from deepspeed.module_inject.layers import set_autotp_mode
  • Import: from deepspeed.module_inject.auto_tp import AutoTP

Parameters

AutoTP constructor:

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| module | torch.nn.Module | Yes | — | The transformer module to process |
| all_reduce_linears | list | Yes | — | Names of linear layers requiring AllReduce (row-parallel) |
| prefix | str | Yes | — | Parameter name prefix for state_dict lookup |
| state_dict | dict | Yes | — | Model state dictionary for weight loading |
| linear_layer_setting | list | Yes | — | Module class types for linear layer detection |
| orig_layer_impl | type | Yes | — | Original transformer layer implementation class |
| keep_module_on_host | bool | No | False | Keep weights on CPU to reduce GPU memory |
| partition_config | AutoTPConfig | No | None | Custom partition configuration with TPLayerSpec rules |

set_autotp_mode:

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| training | bool | Yes | — | If True, sets TRAINING mode; if False, sets INFERENCE mode |

I/O

| Direction | Name | Type | Description |
|---|---|---|---|
| Input | module | torch.nn.Module | Model with standard nn.Linear layers |
| Input | TP config | autotp_size, partition_config | Tensor parallelism settings |
| Output | module | torch.nn.Module | Model with nn.Linear layers replaced by LinearAllreduce/LinearLayer TP variants |

Usage Example

import deepspeed
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")

# AutoTP is triggered automatically via deepspeed.initialize().
# The three initialize() calls below are alternative configurations,
# shown side by side; each would normally be used on its own.
engine, _, _, _ = deepspeed.initialize(
    model=model,
    config={
        "tensor_parallel": {"autotp_size": 4},
        "train_batch_size": 8,
        "bf16": {"enabled": True}
    }
)
# model's linear layers are now TP-partitioned

# Using a preset model
engine, _, _, _ = deepspeed.initialize(
    model=model,
    config={
        "tensor_parallel": {
            "autotp_size": 4,
            "preset_model": "llama"
        },
        "train_batch_size": 8,
        "bf16": {"enabled": True}
    }
)

# Using custom partition config
engine, _, _, _ = deepspeed.initialize(
    model=model,
    config={
        "tensor_parallel": {
            "autotp_size": 4,
            "partition_config": {
                "use_default_specs": False,
                "layer_specs": [
                    {"patterns": [".*\\.o_proj\\.weight$"], "partition_type": "row"},
                    {"patterns": [".*\\.[qkv]_proj\\.weight$"], "partition_type": "column"}
                ]
            }
        },
        "train_batch_size": 8,
        "bf16": {"enabled": True}
    }
)

Knowledge Sources

Relationships

Principle:Deepspeedai_DeepSpeed_AutoTP_Engine_Init

Metadata

  • Workflow: AutoTP_Training
  • Type: Implementation
  • Last Updated: 2026-02-09 00:00 GMT
