Implementation:Deepspeedai DeepSpeed TPTrainingConfig Init

Overview

Concrete tool, provided by the DeepSpeed library, for configuring automatic tensor parallelism (AutoTP) training parameters.

Implementation Type

Class (Pydantic configuration model and dataclass)

Detailed Description

TPTrainingConfig is a Pydantic model (inheriting from DeepSpeedConfigModel) that validates the AutoTP settings in the tensor_parallel section of the DeepSpeed JSON config. It is instantiated by get_tensor_parallel_config(ds_config), which extracts that section from the config dict and unpacks it as keyword arguments: TPTrainingConfig(**ds_config['tensor_parallel']).
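
The extraction step is simple enough to sketch with the standard library. This is a minimal sketch and not DeepSpeed's code. `SketchTPTrainingConfig` and `sketch_get_tensor_parallel_config` are hypothetical names, a dataclass stands in for the Pydantic model, and only four fields are modeled. Falling back to defaults when the `tensor_parallel` section is absent is also an assumption.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class SketchTPTrainingConfig:
    """Hypothetical stand-in for TPTrainingConfig; models only four fields."""
    autotp_size: int = 0  # tensor parallelism degree; 0 means AutoTP is disabled
    tp_overlap_comm: bool = False
    partition_config: Optional[Dict[str, Any]] = None
    preset_model: Optional[str] = None

    def __post_init__(self) -> None:
        # Minimal check standing in for Pydantic field validation.
        if not isinstance(self.autotp_size, int) or self.autotp_size < 0:
            raise ValueError("autotp_size must be a non-negative int")


def sketch_get_tensor_parallel_config(ds_config: Dict[str, Any]) -> SketchTPTrainingConfig:
    """Unpack ds_config['tensor_parallel'] into the config model."""
    # Assumption: a missing section yields an all-defaults (disabled) config.
    section = ds_config.get("tensor_parallel", {})
    # Keyword unpacking mirrors TPTrainingConfig(**ds_config['tensor_parallel']);
    # in this sketch, a key the dataclass does not define raises TypeError.
    return SketchTPTrainingConfig(**section)


if __name__ == "__main__":
    cfg = sketch_get_tensor_parallel_config(
        {"tensor_parallel": {"autotp_size": 4, "preset_model": "llama"}, "train_batch_size": 8})
    print(cfg.autotp_size, cfg.preset_model)  # 4 llama
```

Only the `tensor_parallel` sub-dict reaches the model. Top-level keys such as `train_batch_size` are ignored by this function.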

AutoTPConfig is a dataclass that stores layer-level sharding patterns as TPLayerSpec rules, which are used to detect and partition transformer layers. It is produced by TPTrainingConfig.get_partition_config_object(), which resolves presets and custom configs as follows:

1. If preset_model is set, loads the preset via AutoTPPresets.get_preset().
2. If a partition_config dict is provided, creates a custom AutoTPConfig via AutoTPConfig.from_dict().
3. If both exist and the custom config's use_default_specs is true, merges them via merge_autotp_configs().
4. Sets tp_size on the resulting config to autotp_size.
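
The four resolution steps can be traced in a stdlib-only sketch. Every name here is hypothetical: `SketchAutoTPConfig`, `SKETCH_PRESETS`, `sketch_merge` and `sketch_get_partition_config_object`. The "llama" spec list is illustrative and is not DeepSpeed's real preset. Several behaviors are assumptions rather than documented facts:
- merge_autotp_configs() places custom specs ahead of preset specs.
- The custom config replaces the preset when use_default_specs is false.
- None is returned when neither source is set.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional


@dataclass
class SketchSpec:
    """Hypothetical, simplified layer spec: regex patterns plus a partition type."""
    patterns: List[str]
    partition_type: str  # "row" or "column"


@dataclass
class SketchAutoTPConfig:
    """Hypothetical, simplified AutoTPConfig."""
    tp_size: int = 1
    layer_specs: List[SketchSpec] = field(default_factory=list)
    use_default_specs: bool = True

    @classmethod
    def from_dict(cls, d: Dict[str, Any]) -> "SketchAutoTPConfig":
        specs = [SketchSpec(**s) for s in d.get("layer_specs", [])]
        return cls(layer_specs=specs, use_default_specs=d.get("use_default_specs", True))


# Hypothetical preset registry. Factories return fresh objects, so setting
# tp_size later never mutates a shared preset. Specs are illustrative only.
SKETCH_PRESETS: Dict[str, Callable[[], SketchAutoTPConfig]] = {
    "llama": lambda: SketchAutoTPConfig(layer_specs=[
        SketchSpec([r".*\.o_proj\.weight$", r".*\.down_proj\.weight$"], "row"),
        SketchSpec([r".*\.[qkv]_proj\.weight$", r".*\.(gate|up)_proj\.weight$"], "column"),
    ]),
}


def sketch_merge(preset: SketchAutoTPConfig, custom: SketchAutoTPConfig) -> SketchAutoTPConfig:
    # Assumed merge rule: custom specs come first so they win on overlap.
    return SketchAutoTPConfig(layer_specs=custom.layer_specs + preset.layer_specs)


def sketch_get_partition_config_object(
        autotp_size: int,
        preset_model: Optional[str],
        partition_config: Optional[Dict[str, Any]]) -> Optional[SketchAutoTPConfig]:
    config: Optional[SketchAutoTPConfig] = None
    if preset_model:  # step 1: load the preset
        config = SKETCH_PRESETS[preset_model]()
    if partition_config:  # step 2: build the custom config
        custom = SketchAutoTPConfig.from_dict(partition_config)
        if config is not None and custom.use_default_specs:
            config = sketch_merge(config, custom)  # step 3: merge both
        else:
            config = custom  # assumption: the custom config is used on its own
    if config is not None:
        config.tp_size = autotp_size  # step 4: propagate the TP degree
    return config
```

Step 3 depends on the flag inside partition_config. That flag belongs to the custom config, not to the preset.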

AutoTPPresets provides static methods returning pre-built AutoTPConfig instances for popular architectures: llama(), bloom(), chatglm(), mixtral(), deepseek_v2(), qwen2(), and phi3().

TPLayerSpec is a dataclass with a matches(param_name, model_type) method that checks if a given parameter name matches any of the spec's regex patterns, optionally filtered by model type.
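
A minimal sketch of the matching logic follows. `SketchLayerSpec` is a hypothetical stand-in, not DeepSpeed's TPLayerSpec. Three details are assumptions:
- the `model_types` field name;
- the use of `re.match` (the real method may use `re.search`);
- skipping the filter when the spec has no model types or the call passes none.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SketchLayerSpec:
    """Hypothetical stand-in for TPLayerSpec, reduced to its matching logic."""
    patterns: List[str]
    partition_type: str = "column"
    model_types: Optional[List[str]] = None  # None: spec applies to every model type
    _compiled: List[re.Pattern] = field(default_factory=list, init=False, repr=False)

    def __post_init__(self) -> None:
        # Compile once, since one spec is checked against many parameter names.
        self._compiled = [re.compile(p) for p in self.patterns]

    def matches(self, param_name: str, model_type: Optional[str] = None) -> bool:
        # Optional model-type filter: a scoped spec ignores other architectures.
        if (self.model_types is not None and model_type is not None
                and model_type not in self.model_types):
            return False
        # re.match anchors at the start of the name, so patterns need a
        # leading ".*" to match fully qualified names like "model.layers.0...".
        return any(rx.match(param_name) for rx in self._compiled)
```

Compiling the patterns in `__post_init__` keeps repeated lookups cheap when a model has thousands of named parameters.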

Code Reference

Repository: https://github.com/deepspeedai/DeepSpeed
File: deepspeed/runtime/tensor_parallel/config.py (L38-146, TPTrainingConfig and get_tensor_parallel_config)
File: deepspeed/module_inject/autotp_config.py (L213-308, AutoTPConfig; L310-509, AutoTPPresets)
TPTrainingConfig key fields: autotp_size (int), dtype (torch.dtype), tp_overlap_comm (bool), partition_config (Optional[Dict]), preset_model (Optional[str])
AutoTPConfig key fields: tp_size (int), layer_specs (List[TPLayerSpec]), use_default_specs (bool), strict_mode (bool)
Import: from deepspeed.runtime.tensor_parallel.config import TPTrainingConfig, get_tensor_parallel_config
Import: from deepspeed.module_inject.autotp_config import AutoTPConfig, AutoTPPresets, TPLayerSpec

Parameters

TPTrainingConfig fields:

Field	Type	Default	Description
autotp_size	int	0	Tensor parallelism degree; 0 means disabled
dtype	torch.dtype	torch.float16	Target model data type for TP operations
tp_overlap_comm	bool	False	Overlap AllReduce communication with computation
tensor_parallel	TPConfig	{}	Nested TP config with tp_size, mpu, tp_group
partition_config	Optional[Dict]	None	Custom layer partitioning rules (layer_specs and related AutoTPConfig options), parsed by AutoTPConfig.from_dict()
preset_model	Optional[str]	None	Built-in preset name (e.g., "llama", "bloom", "mixtral")
keep_module_on_host	bool	False	Keep checkpoint tensors in host memory during loading instead of moving them to the device, avoiding device OOM
replace_with_kernel_inject	bool	False	Enable fused kernel injection (inference)

AutoTPConfig fields:

Field	Type	Default	Description
tp_size	int	1	Tensor parallelism degree
layer_specs	List[TPLayerSpec]	[]	List of layer specification rules
embedding_partition_dim	int	1	Partition dimension for embedding layers
lm_head_patterns	List[str]	["lm_head", "embed_out"]	Patterns identifying the LM head layer
use_default_specs	bool	True	Merge custom specs with built-in defaults
strict_mode	bool	False	Raise error on unmatched linear layers
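
The interplay of `layer_specs`, `lm_head_patterns` and `strict_mode` can be shown with a hypothetical classifier, `sketch_classify`. It works under these assumptions:
- LM-head patterns are plain substrings.
- The first matching spec wins.
- In non-strict mode an unmatched layer is only tagged "unmatched", not given whatever default handling DeepSpeed actually applies.

```python
import re
from typing import Dict, List, Sequence, Tuple


def sketch_classify(
        param_names: Sequence[str],
        specs: Sequence[Tuple[List[str], str]],
        lm_head_patterns: Sequence[str] = ("lm_head", "embed_out"),
        strict_mode: bool = False) -> Dict[str, str]:
    """Map each linear-weight name to "row", "column", "lm_head", or "unmatched"."""
    plan: Dict[str, str] = {}
    for name in param_names:
        # Assumption: LM-head patterns are plain substrings, checked before layer specs.
        if any(p in name for p in lm_head_patterns):
            plan[name] = "lm_head"
            continue
        # Assumption: specs are tried in order and the first match wins.
        for patterns, partition_type in specs:
            if any(re.match(p, name) for p in patterns):
                plan[name] = partition_type
                break
        else:  # for-else: runs only when no spec matched
            if strict_mode:
                raise ValueError(f"no layer spec matches linear layer {name!r}")
            plan[name] = "unmatched"  # placeholder for DeepSpeed's default handling
    return plan


if __name__ == "__main__":
    specs = [([r".*\.o_proj\.weight$"], "row"),
             ([r".*\.[qkv]_proj\.weight$"], "column")]
    names = ["model.layers.0.self_attn.q_proj.weight",
             "model.layers.0.self_attn.o_proj.weight",
             "model.layers.0.mlp.up_proj.weight",
             "lm_head.weight"]
    print(sketch_classify(names, specs))
```

Flipping `strict_mode` to True turns the silent "unmatched" tag for `up_proj` into an error. That is the behavior the table describes for unmatched linear layers.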

I/O

Direction	Name	Type	Description
Input	ds_config	dict	DeepSpeed JSON config with `tensor_parallel` section
Output	TPTrainingConfig	Pydantic model	Validated TP training configuration
Output	AutoTPConfig	dataclass	Layer-level sharding patterns (from get_partition_config_object)

Usage Example

```python
from deepspeed.runtime.tensor_parallel.config import get_tensor_parallel_config

# JSON config usage (shown as the equivalent Python dict)
config = {
    "tensor_parallel": {
        "autotp_size": 4,
        "tp_grain_size": 64,
        "preset_model": "llama"
    },
    "train_batch_size": 8,
    "bf16": {"enabled": True}
}

# Internal usage (how DeepSpeed processes the config)
tp_config = get_tensor_parallel_config(config)
# tp_config.autotp_size == 4

# Get the AutoTPConfig with layer specs
partition_config = tp_config.get_partition_config_object()
# partition_config contains the LLaMA preset specs with tp_size=4

# Custom partition config example
config_custom = {
    "tensor_parallel": {
        "autotp_size": 4,
        "partition_config": {
            "use_default_specs": False,
            "layer_specs": [
                {
                    "patterns": [".*\\.o_proj\\.weight$", ".*\\.down_proj\\.weight$"],
                    "partition_type": "row"
                },
                {
                    "patterns": [".*\\.[qkv]_proj\\.weight$"],
                    "partition_type": "column"
                }
            ]
        }
    },
    "train_batch_size": 8,
    "bf16": {"enabled": True}
}

# No preset_model and use_default_specs=False, so only the two custom rules are used
custom_partition = get_tensor_parallel_config(config_custom).get_partition_config_object()
# custom_partition.tp_size == 4
```
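
This sketch applies the two rules in config_custom to a few LLaMA-style weight names with autotp_size = 4 and reports the per-rank shard shapes. The weight shapes are hypothetical, and `shard_shape` is not a DeepSpeed function. It assumes the Megatron-style convention for nn.Linear weights stored as [out_features, in_features]:
- Column-parallel layers split dim 0.
- Row-parallel layers split dim 1, and their partial outputs need an AllReduce.

```python
import re
from typing import Dict, List, Tuple

# The two rules from config_custom, as (patterns, partition_type) pairs.
LAYER_SPECS: List[Tuple[List[str], str]] = [
    ([r".*\.o_proj\.weight$", r".*\.down_proj\.weight$"], "row"),
    ([r".*\.[qkv]_proj\.weight$"], "column"),
]


def shard_shape(name: str, shape: Tuple[int, int], tp_size: int) -> Tuple[int, int]:
    """Per-rank shape of an nn.Linear weight [out_features, in_features] (hypothetical helper)."""
    out_f, in_f = shape
    for patterns, kind in LAYER_SPECS:
        if not any(re.match(p, name) for p in patterns):
            continue
        if kind == "column":
            # Column parallel: each rank holds out_features / tp_size output rows.
            if out_f % tp_size:
                raise ValueError(f"{name}: out_features {out_f} not divisible by {tp_size}")
            return (out_f // tp_size, in_f)
        # Row parallel: each rank holds in_features / tp_size input columns;
        # the partial outputs are summed across ranks with an AllReduce.
        if in_f % tp_size:
            raise ValueError(f"{name}: in_features {in_f} not divisible by {tp_size}")
        return (out_f, in_f // tp_size)
    # Assumption: a weight matching no rule is left unsharded (replicated).
    return shape


if __name__ == "__main__":
    # Hypothetical LLaMA-7B-like sizes: hidden 4096, MLP intermediate 11008.
    weights: Dict[str, Tuple[int, int]] = {
        "model.layers.0.self_attn.q_proj.weight": (4096, 4096),
        "model.layers.0.self_attn.o_proj.weight": (4096, 4096),
        "model.layers.0.mlp.down_proj.weight": (4096, 11008),
        "model.layers.0.mlp.up_proj.weight": (11008, 4096),
    }
    for name, shape in weights.items():
        print(name, shape, "->", shard_shape(name, shape, tp_size=4))
```

The up_proj row shows that these example rules cover only the attention projections and down_proj. A real LLaMA MLP would likely also need column rules for gate_proj and up_proj, so that down_proj receives an input already split along the intermediate dimension.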

Relationships

Principle:Deepspeedai_DeepSpeed_AutoTP_Configuration

Metadata

Workflow: AutoTP_Training
Type: Implementation
Last Updated: 2026-02-09 00:00 GMT
