

Implementation:Deepspeedai DeepSpeed TPTrainingConfig Init

From Leeroopedia
Revision as of 14:47, 16 February 2026 by Admin (talk | contribs) (Auto-imported from implementations/Deepspeedai_DeepSpeed_TPTrainingConfig_Init.md)


Overview

Concrete tool, provided by the DeepSpeed library, for configuring automatic tensor parallelism (AutoTP) training parameters.

Implementation Type

Class (Pydantic configuration model and dataclass)

Detailed Description

TPTrainingConfig is a Pydantic model (inheriting from DeepSpeedConfigModel) that validates the AutoTP settings found in the tensor_parallel section of the DeepSpeed JSON config. It is instantiated by get_tensor_parallel_config(ds_config), which extracts the tensor_parallel section from the config dict and unpacks it as keyword arguments: TPTrainingConfig(**ds_config['tensor_parallel']).
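
The extract-and-unpack step described above can be sketched with a plain dataclass. This is a minimal illustration, not DeepSpeed's code: SketchTPTrainingConfig and sketch_get_tensor_parallel_config are hypothetical stand-ins, only a few fields from this page are modeled, and falling back to defaults when the section is absent is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SketchTPTrainingConfig:
    """Hypothetical stand-in; the real TPTrainingConfig is a Pydantic model."""
    autotp_size: int = 0              # 0 means tensor parallelism is disabled
    tp_overlap_comm: bool = False
    partition_config: Optional[dict] = None
    preset_model: Optional[str] = None


def sketch_get_tensor_parallel_config(ds_config: dict) -> SketchTPTrainingConfig:
    """Pull the tensor_parallel section out of the config dict and unpack it."""
    section = ds_config.get("tensor_parallel")
    if section is None:
        # Assumption: a missing section yields an all-defaults config.
        return SketchTPTrainingConfig()
    # Each key in the section becomes a keyword argument of the config class.
    return SketchTPTrainingConfig(**section)


with_section = sketch_get_tensor_parallel_config(
    {"tensor_parallel": {"autotp_size": 4, "preset_model": "llama"}}
)
without_section = sketch_get_tensor_parallel_config({"train_batch_size": 8})
```

Unlike this dataclass, a Pydantic model also validates and coerces field types on construction, which is the reason the real class is built on one.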

AutoTPConfig is a dataclass that stores layer-level sharding patterns using TPLayerSpec rules for detecting and partitioning transformer layers. It is created from TPTrainingConfig.get_partition_config_object(), which resolves presets and custom configs:

  1. If preset_model is set, loads the preset via AutoTPPresets.get_preset().
  2. If partition_config dict is provided, creates a custom AutoTPConfig via AutoTPConfig.from_dict().
  3. If both exist and use_default_specs is true, merges them via merge_autotp_configs().
  4. Sets the tp_size on the resulting config.
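
The four resolution steps above can be sketched as follows. This is an illustration under stated assumptions rather than the library's implementation: SketchAutoTPConfig, resolve_partition_config, and the presets dict are hypothetical stand-ins, the merge order (custom rules ahead of preset rules) is assumed, and so is the behavior of using the custom config alone when use_default_specs is false.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class SketchAutoTPConfig:
    """Hypothetical stand-in for AutoTPConfig; not the DeepSpeed class."""
    tp_size: int = 1
    layer_specs: List[dict] = field(default_factory=list)
    use_default_specs: bool = True


def resolve_partition_config(
    preset_model: Optional[str],
    partition_config: Optional[dict],
    autotp_size: int,
    presets: Dict[str, SketchAutoTPConfig],
) -> Optional[SketchAutoTPConfig]:
    preset = None
    custom = None
    # Step 1: load the named preset, copying its rules so the registry is untouched.
    if preset_model is not None:
        preset = SketchAutoTPConfig(layer_specs=list(presets[preset_model].layer_specs))
    # Step 2: build a custom config from the partition_config dict.
    if partition_config is not None:
        custom = SketchAutoTPConfig(
            layer_specs=list(partition_config.get("layer_specs", [])),
            use_default_specs=partition_config.get("use_default_specs", True),
        )
    # Step 3: merge only when both exist and the custom config opts in.
    if preset is not None and custom is not None and custom.use_default_specs:
        # Assumption: custom rules are listed before preset rules.
        resolved = SketchAutoTPConfig(layer_specs=custom.layer_specs + preset.layer_specs)
    else:
        # Assumption: otherwise the custom config wins, then the preset.
        resolved = custom if custom is not None else preset
    # Step 4: stamp the tensor-parallel degree on the result.
    if resolved is not None:
        resolved.tp_size = autotp_size
    return resolved


presets = {
    "llama": SketchAutoTPConfig(
        layer_specs=[{"patterns": [r".*\.o_proj\.weight$"], "partition_type": "row"}]
    )
}
custom_dict = {
    "layer_specs": [{"patterns": [r".*\.gate\.weight$"], "partition_type": "column"}]
}
merged = resolve_partition_config("llama", custom_dict, 4, presets)
custom_only = resolve_partition_config(
    "llama", {**custom_dict, "use_default_specs": False}, 4, presets
)
nothing = resolve_partition_config(None, None, 4, presets)
```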

AutoTPPresets provides static methods returning pre-built AutoTPConfig instances for popular architectures: llama(), bloom(), chatglm(), mixtral(), deepseek_v2(), qwen2(), and phi3().

TPLayerSpec is a dataclass with a matches(param_name, model_type) method that checks if a given parameter name matches any of the spec's regex patterns, optionally filtered by model type.
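
A minimal sketch of this kind of regex-based matching is shown below. SketchLayerSpec is a hypothetical stand-in: the field names patterns, partition_type, and model_types, the use of re.search, and the treatment of a missing model type are all assumptions made for illustration, not details confirmed from the DeepSpeed source.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SketchLayerSpec:
    """Hypothetical stand-in for TPLayerSpec; field names are assumptions."""
    patterns: List[str]
    partition_type: str
    model_types: Optional[List[str]] = None  # None means "applies to any model type"

    def matches(self, param_name: str, model_type: Optional[str] = None) -> bool:
        # Optional filter: a spec restricted to certain model types rejects all others.
        if self.model_types is not None and model_type not in self.model_types:
            return False
        # The spec matches if any one of its regex patterns hits the parameter name.
        return any(re.search(pattern, param_name) for pattern in self.patterns)


row_spec = SketchLayerSpec(
    patterns=[r".*\.o_proj\.weight$", r".*\.down_proj\.weight$"],
    partition_type="row",
    model_types=["llama"],
)
hit = row_spec.matches("model.layers.0.self_attn.o_proj.weight", "llama")
miss_name = row_spec.matches("model.layers.0.self_attn.q_proj.weight", "llama")
miss_type = row_spec.matches("model.layers.0.self_attn.o_proj.weight", "bloom")
```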

Code Reference

  • Repository: https://github.com/deepspeedai/DeepSpeed
  • File: deepspeed/runtime/tensor_parallel/config.py (L38-146, TPTrainingConfig and get_tensor_parallel_config)
  • File: deepspeed/module_inject/autotp_config.py (L213-308, AutoTPConfig; L310-509, AutoTPPresets)
  • TPTrainingConfig key fields: autotp_size (int), dtype (torch.dtype), tp_overlap_comm (bool), partition_config (Optional[Dict]), preset_model (Optional[str])
  • AutoTPConfig key fields: tp_size (int), layer_specs (List[TPLayerSpec]), use_default_specs (bool), strict_mode (bool)
  • Import: from deepspeed.runtime.tensor_parallel.config import TPTrainingConfig, get_tensor_parallel_config
  • Import: from deepspeed.module_inject.autotp_config import AutoTPConfig, AutoTPPresets, TPLayerSpec

Parameters

TPTrainingConfig fields:

| Field | Type | Default | Description |
|---|---|---|---|
| autotp_size | int | 0 | Tensor parallelism degree; 0 means disabled |
| dtype | torch.dtype | torch.float16 | Target model data type for TP operations |
| tp_overlap_comm | bool | False | Overlap AllReduce communication with computation |
| tensor_parallel | TPConfig | {} | Nested TP config with tp_size, mpu, tp_group |
| partition_config | Optional[Dict] | None | Custom layer partitioning rules via TPLayerSpec |
| preset_model | Optional[str] | None | Built-in preset name (e.g., "llama", "bloom", "mixtral") |
| keep_module_on_host | bool | False | Keep checkpoint data on host to avoid OOM |
| replace_with_kernel_inject | bool | False | Enable fused kernel injection (inference) |

AutoTPConfig fields:

| Field | Type | Default | Description |
|---|---|---|---|
| tp_size | int | 1 | Tensor parallelism degree |
| layer_specs | List[TPLayerSpec] | [] | List of layer specification rules |
| embedding_partition_dim | int | 1 | Partition dimension for embedding layers |
| lm_head_patterns | List[str] | ["lm_head", "embed_out"] | Patterns identifying the LM head layer |
| use_default_specs | bool | True | Merge custom specs with built-in defaults |
| strict_mode | bool | False | Raise error on unmatched linear layers |

I/O

| Direction | Name | Type | Description |
|---|---|---|---|
| Input | ds_config | dict | DeepSpeed JSON config with tensor_parallel section |
| Output | TPTrainingConfig | Pydantic model | Validated TP training configuration |
| Output | AutoTPConfig | dataclass | Layer-level sharding patterns (from get_partition_config_object) |

Usage Example

# JSON config usage
config = {
    "tensor_parallel": {
        "autotp_size": 4,
        "tp_grain_size": 64,
        "preset_model": "llama"
    },
    "train_batch_size": 8,
    "bf16": {"enabled": True}
}

# Internal usage (how DeepSpeed processes the config)
from deepspeed.runtime.tensor_parallel.config import get_tensor_parallel_config
tp_config = get_tensor_parallel_config(config)
# tp_config.autotp_size == 4

# Get the AutoTPConfig with layer specs
partition_config = tp_config.get_partition_config_object()
# partition_config contains LLaMA preset specs with tp_size=4

# Custom partition config example
config_custom = {
    "tensor_parallel": {
        "autotp_size": 4,
        "partition_config": {
            "use_default_specs": False,
            "layer_specs": [
                {
                    "patterns": [".*\\.o_proj\\.weight$", ".*\\.down_proj\\.weight$"],
                    "partition_type": "row"
                },
                {
                    "patterns": [".*\\.[qkv]_proj\\.weight$"],
                    "partition_type": "column"
                }
            ]
        }
    },
    "train_batch_size": 8,
    "bf16": {"enabled": True}
}
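
To make the custom rules above concrete, the sketch below applies the same two layer_specs to a few LLaMA-style parameter names and computes the per-rank weight shape for autotp_size 4. It runs without DeepSpeed and rests on assumptions: the helper names are hypothetical, the first matching rule wins, weights use the PyTorch (out_features, in_features) layout, "column" splits the output dimension and "row" splits the input dimension, and unmatched parameters are left whole.

```python
import re
from typing import Optional, Tuple

# The two rules from the custom partition_config above, written as raw strings.
LAYER_SPECS = [
    {"patterns": [r".*\.o_proj\.weight$", r".*\.down_proj\.weight$"], "partition_type": "row"},
    {"patterns": [r".*\.[qkv]_proj\.weight$"], "partition_type": "column"},
]


def partition_type_for(param_name: str) -> Optional[str]:
    """Return the partition type of the first rule that matches, else None."""
    for spec in LAYER_SPECS:
        if any(re.search(pattern, param_name) for pattern in spec["patterns"]):
            return spec["partition_type"]
    return None


def shard_shape(shape: Tuple[int, int], partition_type: Optional[str], tp_size: int) -> Tuple[int, int]:
    """Per-rank shape of an (out_features, in_features) weight under the assumed split."""
    out_features, in_features = shape
    if partition_type == "column":
        return (out_features // tp_size, in_features)  # split the output dimension
    if partition_type == "row":
        return (out_features, in_features // tp_size)  # split the input dimension
    return shape  # unmatched: kept whole in this sketch


examples = {
    "model.layers.0.self_attn.q_proj.weight": (4096, 4096),
    "model.layers.0.self_attn.o_proj.weight": (4096, 4096),
    "model.layers.0.mlp.down_proj.weight": (4096, 11008),
    "model.layers.0.mlp.gate_proj.weight": (11008, 4096),
}
shards = {
    name: shard_shape(shape, partition_type_for(name), tp_size=4)
    for name, shape in examples.items()
}
```

Note that gate_proj is not covered by either rule, so with use_default_specs set to False it stays unsharded in this sketch; a real configuration would normally add a rule for it.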


Relationships

Principle:Deepspeedai_DeepSpeed_AutoTP_Configuration

Metadata

  • Workflow: AutoTP_Training
  • Type: Implementation
  • Last Updated: 2026-02-09 00:00 GMT
