Implementation:Deepspeedai DeepSpeed TPTrainingConfig Init
Overview
Concrete tool for configuring automatic tensor parallelism training parameters provided by the DeepSpeed library.
Implementation Type
Class (Pydantic configuration model and dataclass)
Detailed Description
TPTrainingConfig is a Pydantic model (inheriting from DeepSpeedConfigModel) that validates AutoTP settings from the DeepSpeed JSON config's tensor_parallel section. It is instantiated by get_tensor_parallel_config(ds_config) which extracts the tensor_parallel section from the config dict and passes it to TPTrainingConfig(**ds_config['tensor_parallel']).
AutoTPConfig is a dataclass that stores layer-level sharding patterns using TPLayerSpec rules for detecting and partitioning transformer layers. It is created from TPTrainingConfig.get_partition_config_object(), which resolves presets and custom configs:
- If preset_model is set, loads the preset via AutoTPPresets.get_preset().
- If a partition_config dict is provided, creates a custom AutoTPConfig via AutoTPConfig.from_dict().
- If both exist and use_default_specs is true, merges them via merge_autotp_configs().
- Sets tp_size on the resulting config.
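The resolution order above can be sketched in plain Python. This is a minimal illustration that does not import DeepSpeed: SketchConfig, PRESETS, and resolve_partition_config are hypothetical stand-ins, the preset patterns are placeholders, and the merge order, the behavior when use_default_specs is false, and the None return when nothing is configured are all assumptions rather than confirmed library behavior.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class SketchConfig:
    """Hypothetical stand-in for AutoTPConfig (only the fields used here)."""
    tp_size: int = 1
    layer_specs: List[dict] = field(default_factory=list)
    use_default_specs: bool = True


# Placeholder preset table; these patterns are NOT DeepSpeed's real presets.
PRESETS: Dict[str, List[dict]] = {
    "llama": [{"patterns": [r".*\.o_proj\.weight$"], "partition_type": "row"}],
}


def resolve_partition_config(
    autotp_size: int,
    preset_model: Optional[str] = None,
    partition_config: Optional[dict] = None,
) -> Optional[SketchConfig]:
    # Step 1: load the preset if a preset name is given.
    preset = None
    if preset_model is not None:
        preset = SketchConfig(layer_specs=list(PRESETS[preset_model]))

    # Step 2: build a custom config if a partition_config dict is given.
    custom = None
    if partition_config is not None:
        custom = SketchConfig(
            layer_specs=list(partition_config.get("layer_specs", [])),
            use_default_specs=partition_config.get("use_default_specs", True),
        )

    # Step 3: merge when both exist and defaults are requested.
    # Assumption: custom specs come first; otherwise the custom config wins.
    if preset is not None and custom is not None and custom.use_default_specs:
        result = SketchConfig(layer_specs=custom.layer_specs + preset.layer_specs)
    elif custom is not None:
        result = custom
    elif preset is not None:
        result = preset
    else:
        return None  # Assumption: nothing to resolve.

    # Step 4: stamp the tensor-parallel degree onto the result.
    result.tp_size = autotp_size
    return result
```

Calling resolve_partition_config(4, preset_model="llama") in this sketch yields a config whose tp_size is 4 and whose layer_specs come from the placeholder preset.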
AutoTPPresets provides static methods returning pre-built AutoTPConfig instances for popular architectures: llama(), bloom(), chatglm(), mixtral(), deepseek_v2(), qwen2(), and phi3().
TPLayerSpec is a dataclass with a matches(param_name, model_type) method that checks if a given parameter name matches any of the spec's regex patterns, optionally filtered by model type.
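As a rough illustration of the matching behavior described above, the following sketch defines a stand-in dataclass. LayerSpecSketch and its model_types field are hypothetical names, and the use of re.search plus the rule that the model-type filter applies only when both sides are set are assumptions; the real TPLayerSpec may differ on each point.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class LayerSpecSketch:
    """Hypothetical stand-in for TPLayerSpec."""
    patterns: List[str]
    partition_type: str = "column"
    model_types: Optional[List[str]] = None  # hypothetical field name

    def matches(self, param_name: str, model_type: Optional[str] = None) -> bool:
        # Optional model-type filter (assumed to apply only when both are set).
        if (
            self.model_types is not None
            and model_type is not None
            and model_type not in self.model_types
        ):
            return False
        # A spec matches if any of its regex patterns matches the parameter name.
        return any(re.search(p, param_name) for p in self.patterns)


spec = LayerSpecSketch(
    patterns=[r".*\.o_proj\.weight$"],
    partition_type="row",
    model_types=["llama"],
)
print(spec.matches("model.layers.0.self_attn.o_proj.weight", "llama"))
print(spec.matches("model.layers.0.self_attn.o_proj.weight", "bloom"))
```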
Code Reference
- Repository: https://github.com/deepspeedai/DeepSpeed
- File: deepspeed/runtime/tensor_parallel/config.py (L38-146, TPTrainingConfig and get_tensor_parallel_config)
- File: deepspeed/module_inject/autotp_config.py (L213-308, AutoTPConfig; L310-509, AutoTPPresets)
- TPTrainingConfig key fields: autotp_size (int), dtype (torch.dtype), tp_overlap_comm (bool), partition_config (Optional[Dict]), preset_model (Optional[str])
- AutoTPConfig key fields: tp_size (int), layer_specs (List[TPLayerSpec]), use_default_specs (bool), strict_mode (bool)
- Import: from deepspeed.runtime.tensor_parallel.config import TPTrainingConfig, get_tensor_parallel_config
- Import: from deepspeed.module_inject.autotp_config import AutoTPConfig, AutoTPPresets, TPLayerSpec
Parameters
TPTrainingConfig fields:
| Field | Type | Default | Description |
|---|---|---|---|
| autotp_size | int | 0 | Tensor parallelism degree; 0 means disabled |
| dtype | torch.dtype | torch.float16 | Target model data type for TP operations |
| tp_overlap_comm | bool | False | Overlap AllReduce communication with computation |
| tensor_parallel | TPConfig | {} | Nested TP config with tp_size, mpu, tp_group |
| partition_config | Optional[Dict] | None | Custom layer partitioning rules via TPLayerSpec |
| preset_model | Optional[str] | None | Built-in preset name (e.g., "llama", "bloom", "mixtral") |
| keep_module_on_host | bool | False | Keep checkpoint data on host to avoid OOM |
| replace_with_kernel_inject | bool | False | Enable fused kernel injection (inference) |
AutoTPConfig fields:
| Field | Type | Default | Description |
|---|---|---|---|
| tp_size | int | 1 | Tensor parallelism degree |
| layer_specs | List[TPLayerSpec] | [] | List of layer specification rules |
| embedding_partition_dim | int | 1 | Partition dimension for embedding layers |
| lm_head_patterns | List[str] | ["lm_head", "embed_out"] | Patterns identifying the LM head layer |
| use_default_specs | bool | True | Merge custom specs with built-in defaults |
| strict_mode | bool | False | Raise error on unmatched linear layers |
I/O
| Direction | Name | Type | Description |
|---|---|---|---|
| Input | ds_config | dict | DeepSpeed JSON config with tensor_parallel section |
| Output | TPTrainingConfig | Pydantic model | Validated TP training configuration |
| Output | AutoTPConfig | dataclass | Layer-level sharding patterns (from get_partition_config_object) |
Usage Example
# JSON config usage
config = {
    "tensor_parallel": {
        "autotp_size": 4,
        "tp_grain_size": 64,
        "preset_model": "llama"
    },
    "train_batch_size": 8,
    "bf16": {"enabled": True}
}

# Internal usage (how DeepSpeed processes the config)
from deepspeed.runtime.tensor_parallel.config import get_tensor_parallel_config

tp_config = get_tensor_parallel_config(config)
# tp_config.autotp_size == 4

# Get the AutoTPConfig with layer specs
partition_config = tp_config.get_partition_config_object()
# partition_config contains LLaMA preset specs with tp_size=4

# Custom partition config example
config_custom = {
    "tensor_parallel": {
        "autotp_size": 4,
        "partition_config": {
            "use_default_specs": False,
            "layer_specs": [
                {
                    "patterns": [".*\\.o_proj\\.weight$", ".*\\.down_proj\\.weight$"],
                    "partition_type": "row"
                },
                {
                    "patterns": [".*\\.[qkv]_proj\\.weight$"],
                    "partition_type": "column"
                }
            ]
        }
    },
    "train_batch_size": 8,
    "bf16": {"enabled": True}
}
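Before handing a custom partition_config to DeepSpeed, it can help to check which parameter names the patterns actually cover. The sketch below uses only the standard re module; partition_type_for is a hypothetical helper, the parameter names are illustrative LLaMA-style names, and both the first-match-wins rule and the use of re.search are assumptions about matching semantics rather than confirmed DeepSpeed behavior.

```python
import re

# Same layer_specs as in the custom partition config example.
layer_specs = [
    {
        "patterns": [".*\\.o_proj\\.weight$", ".*\\.down_proj\\.weight$"],
        "partition_type": "row",
    },
    {
        "patterns": [".*\\.[qkv]_proj\\.weight$"],
        "partition_type": "column",
    },
]


def partition_type_for(param_name, specs):
    """Return the partition type of the first spec whose pattern matches, else None."""
    for spec in specs:
        if any(re.search(pattern, param_name) for pattern in spec["patterns"]):
            return spec["partition_type"]
    return None


# Illustrative parameter names (not read from a real model).
sample_names = [
    "model.layers.0.self_attn.q_proj.weight",
    "model.layers.0.self_attn.o_proj.weight",
    "model.layers.0.mlp.down_proj.weight",
    "model.layers.0.mlp.gate_proj.weight",
]

for name in sample_names:
    print(name, "->", partition_type_for(name, layer_specs))
```

In this sketch the gate_proj weight matches no pattern. Because the example sets use_default_specs to False, such layers would have no rule at all, which is the situation the strict_mode field is described as flagging.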
Knowledge Sources
Relationships
Principle:Deepspeedai_DeepSpeed_AutoTP_Configuration
Metadata
- Workflow: AutoTP_Training
- Type: Implementation
- Last Updated: 2026-02-09 00:00 GMT