Implementation:Deepspeedai DeepSpeed TPTrainingConfig Init

From Leeroopedia


Overview

A concrete tool, provided by the DeepSpeed library, for configuring automatic tensor parallelism (AutoTP) training parameters.

Implementation Type

Class (Pydantic configuration model and dataclass)

Detailed Description

TPTrainingConfig is a Pydantic model (inheriting from DeepSpeedConfigModel) that validates AutoTP settings from the tensor_parallel section of the DeepSpeed JSON config. It is instantiated by get_tensor_parallel_config(ds_config), which extracts the tensor_parallel section from the config dict and passes it to TPTrainingConfig(**ds_config['tensor_parallel']).
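
The extraction step can be illustrated with a minimal stand-alone sketch. It does not import DeepSpeed: SketchTPTrainingConfig and sketch_get_tensor_parallel_config are hypothetical stand-ins, a plain dataclass replaces the Pydantic model, only a few of the documented fields are shown, and the fallback to defaults when the section is missing is an assumption rather than confirmed library behavior.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class SketchTPTrainingConfig:
    """Hypothetical dataclass stand-in for the Pydantic model (subset of fields)."""
    autotp_size: int = 0
    tp_overlap_comm: bool = False
    partition_config: Optional[Dict[str, Any]] = None
    preset_model: Optional[str] = None


def sketch_get_tensor_parallel_config(ds_config: Dict[str, Any]) -> SketchTPTrainingConfig:
    # Pull out the "tensor_parallel" section and unpack it into the config class.
    # Assumption: an absent section yields a config with default values.
    section = ds_config.get("tensor_parallel", {})
    return SketchTPTrainingConfig(**section)


example = sketch_get_tensor_parallel_config(
    {"tensor_parallel": {"autotp_size": 4, "preset_model": "llama"},
     "train_batch_size": 8}
)
```

Keys outside the tensor_parallel section, such as train_batch_size, never reach the TP config object in this sketch.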

AutoTPConfig is a dataclass that stores layer-level sharding patterns using TPLayerSpec rules for detecting and partitioning transformer layers. It is created from TPTrainingConfig.get_partition_config_object(), which resolves presets and custom configs:

  1. If preset_model is set, loads the preset via AutoTPPresets.get_preset().
  2. If partition_config dict is provided, creates a custom AutoTPConfig via AutoTPConfig.from_dict().
  3. If both exist and use_default_specs is true, merges them via merge_autotp_configs().
  4. Sets the tp_size on the resulting config.
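
The four resolution steps above can be sketched in plain Python. This is an illustration under stated assumptions, not DeepSpeed's code: SketchAutoTPConfig, SKETCH_PRESETS, and resolve_partition_config are hypothetical names, the preset contents are invented placeholders, and the merge order (custom specs first) and the behavior when use_default_specs is false (custom config wins) are guesses.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class SketchAutoTPConfig:
    """Hypothetical stand-in for AutoTPConfig (subset of fields)."""
    tp_size: int = 1
    layer_specs: List[Dict[str, Any]] = field(default_factory=list)
    use_default_specs: bool = True


# Placeholder preset table; the real presets live in AutoTPPresets.
SKETCH_PRESETS = {
    "llama": [{"patterns": [r".*\.o_proj\.weight$"], "partition_type": "row"}],
}


def resolve_partition_config(autotp_size: int,
                             preset_model: Optional[str] = None,
                             partition_config: Optional[Dict[str, Any]] = None
                             ) -> Optional[SketchAutoTPConfig]:
    preset = custom = None
    # Step 1: load the preset when a preset name is given.
    if preset_model is not None:
        preset = SketchAutoTPConfig(layer_specs=list(SKETCH_PRESETS[preset_model]))
    # Step 2: build a custom config from the user-supplied dict.
    if partition_config is not None:
        custom = SketchAutoTPConfig(
            layer_specs=list(partition_config.get("layer_specs", [])),
            use_default_specs=partition_config.get("use_default_specs", True),
        )
    # Step 3: merge when both exist and defaults are requested.
    if preset is not None and custom is not None and custom.use_default_specs:
        # Assumption: custom specs are listed first so they take precedence.
        result = SketchAutoTPConfig(layer_specs=custom.layer_specs + preset.layer_specs)
    else:
        # Assumption: a custom config, if present, wins over the preset.
        result = custom if custom is not None else preset
    # Step 4: stamp the tensor-parallel degree on the result.
    if result is not None:
        result.tp_size = autotp_size
    return result
```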

AutoTPPresets provides static methods returning pre-built AutoTPConfig instances for popular architectures: llama(), bloom(), chatglm(), mixtral(), deepseek_v2(), qwen2(), and phi3().

TPLayerSpec is a dataclass with a matches(param_name, model_type) method that checks if a given parameter name matches any of the spec's regex patterns, optionally filtered by model type.
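
A rough sketch of how such a matches method might work follows. SketchLayerSpec is a hypothetical stand-in, not the real TPLayerSpec: the field names patterns, partition_type, and model_types, the use of re.search, and the rule that a missing model type skips filtering are all assumptions made for illustration.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SketchLayerSpec:
    """Hypothetical stand-in for TPLayerSpec."""
    patterns: List[str]
    partition_type: str = "column"
    model_types: Optional[List[str]] = None  # None means "applies to any model"

    def matches(self, param_name: str, model_type: Optional[str] = None) -> bool:
        # Optional model-type filter: reject only when both sides are specified
        # and the given model type is not in the allowed list.
        if (self.model_types is not None and model_type is not None
                and model_type not in self.model_types):
            return False
        # A spec matches when any of its regex patterns matches the name.
        return any(re.search(p, param_name) for p in self.patterns)


row_spec = SketchLayerSpec(
    patterns=[r".*\.o_proj\.weight$"],
    partition_type="row",
    model_types=["llama"],
)
```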

Code Reference

  • Repository: https://github.com/deepspeedai/DeepSpeed
  • File: deepspeed/runtime/tensor_parallel/config.py (L38-146, TPTrainingConfig and get_tensor_parallel_config)
  • File: deepspeed/module_inject/autotp_config.py (L213-308, AutoTPConfig; L310-509, AutoTPPresets)
  • TPTrainingConfig key fields: autotp_size (int), dtype (torch.dtype), tp_overlap_comm (bool), partition_config (Optional[Dict]), preset_model (Optional[str])
  • AutoTPConfig key fields: tp_size (int), layer_specs (List[TPLayerSpec]), use_default_specs (bool), strict_mode (bool)
  • Import: from deepspeed.runtime.tensor_parallel.config import TPTrainingConfig, get_tensor_parallel_config
  • Import: from deepspeed.module_inject.autotp_config import AutoTPConfig, AutoTPPresets, TPLayerSpec

Parameters

TPTrainingConfig fields:

Field | Type | Default | Description
autotp_size | int | 0 | Tensor parallelism degree; 0 means disabled
dtype | torch.dtype | torch.float16 | Target model data type for TP operations
tp_overlap_comm | bool | False | Overlap AllReduce communication with computation
tensor_parallel | TPConfig | {} | Nested TP config with tp_size, mpu, tp_group
partition_config | Optional[Dict] | None | Custom layer partitioning rules via TPLayerSpec
preset_model | Optional[str] | None | Built-in preset name (e.g., "llama", "bloom", "mixtral")
keep_module_on_host | bool | False | Keep checkpoint data on host to avoid OOM
replace_with_kernel_inject | bool | False | Enable fused kernel injection (inference)

AutoTPConfig fields:

Field | Type | Default | Description
tp_size | int | 1 | Tensor parallelism degree
layer_specs | List[TPLayerSpec] | [] | List of layer specification rules
embedding_partition_dim | int | 1 | Partition dimension for embedding layers
lm_head_patterns | List[str] | ["lm_head", "embed_out"] | Patterns identifying the LM head layer
use_default_specs | bool | True | Merge custom specs with built-in defaults
strict_mode | bool | False | Raise error on unmatched linear layers

I/O

Direction | Name | Type | Description
Input | ds_config | dict | DeepSpeed JSON config with tensor_parallel section
Output | TPTrainingConfig | Pydantic model | Validated TP training configuration
Output | AutoTPConfig | dataclass | Layer-level sharding patterns (from get_partition_config_object)

Usage Example

# JSON config usage
config = {
    "tensor_parallel": {
        "autotp_size": 4,
        "tp_grain_size": 64,
        "preset_model": "llama"
    },
    "train_batch_size": 8,
    "bf16": {"enabled": True}
}

# Internal usage (how DeepSpeed processes the config)
from deepspeed.runtime.tensor_parallel.config import get_tensor_parallel_config
tp_config = get_tensor_parallel_config(config)
# tp_config.autotp_size == 4

# Get the AutoTPConfig with layer specs
partition_config = tp_config.get_partition_config_object()
# partition_config contains LLaMA preset specs with tp_size=4

# Custom partition config example
config_custom = {
    "tensor_parallel": {
        "autotp_size": 4,
        "partition_config": {
            "use_default_specs": False,
            "layer_specs": [
                {
                    "patterns": [".*\\.o_proj\\.weight$", ".*\\.down_proj\\.weight$"],
                    "partition_type": "row"
                },
                {
                    "patterns": [".*\\.[qkv]_proj\\.weight$"],
                    "partition_type": "column"
                }
            ]
        }
    },
    "train_batch_size": 8,
    "bf16": {"enabled": True}
}
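
To see which parameters the custom patterns above would select, they can be tried against sample parameter names using only the standard re module. This is a sketch: classify is a hypothetical helper, first-match-wins ordering and re.search semantics are assumptions, and the parameter names are illustrative LLaMA-style names rather than output from a real model.

```python
import re

# Same patterns as the custom partition config, written as raw strings.
LAYER_SPECS = [
    {"patterns": [r".*\.o_proj\.weight$", r".*\.down_proj\.weight$"],
     "partition_type": "row"},
    {"patterns": [r".*\.[qkv]_proj\.weight$"],
     "partition_type": "column"},
]


def classify(param_name):
    """Return the partition type of the first spec that matches, else None."""
    for spec in LAYER_SPECS:
        if any(re.search(p, param_name) for p in spec["patterns"]):
            return spec["partition_type"]
    return None


samples = [
    "model.layers.0.self_attn.q_proj.weight",
    "model.layers.0.self_attn.o_proj.weight",
    "model.layers.0.mlp.down_proj.weight",
    "model.layers.0.mlp.up_proj.weight",
]
results = {name: classify(name) for name in samples}
```

Under these patterns, up_proj (and likewise gate_proj) matches neither rule, so a config with use_default_specs set to False would leave those layers uncovered unless further patterns are added.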

Knowledge Sources

Relationships

Principle:Deepspeedai_DeepSpeed_AutoTP_Configuration

Metadata

  • Workflow: AutoTP_Training
  • Type: Implementation
  • Last Updated: 2026-02-09 00:00 GMT
