Implementation:Deepspeedai DeepSpeed DeepSpeedConfig Init
Knowledge Sources
| Field | Value |
|---|---|
| Domains | Distributed_Training, Configuration_Management, Memory_Optimization |
| Last Updated | 2026-02-09 00:00 GMT |
Overview
A concrete tool, provided by the DeepSpeed library, for parsing and validating DeepSpeed JSON configuration.
Description
The DeepSpeedConfig class parses a JSON file path, dictionary, or base64-encoded string into a validated configuration object. It resolves world size considering model parallelism and sequence parallelism, handles elasticity configuration, and exposes all training parameters as attributes. Key responsibilities include:
- Config parsing: Accepts a file path (JSON/HJSON), Python dictionary, or base64-encoded string
- World size resolution: Accounts for model parallelism (mpu), sequence parallelism (mesh_device), and standard data parallelism
- Elasticity handling: Computes elastic batch size configuration for dynamic GPU counts
- Batch size computation: Derives train_batch_size, micro_batch_per_gpu, and gradient_accumulation_steps from the config and world size
- Parameter validation: Ensures all required fields are present and consistent
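The batch-size computation above follows a single invariant: train_batch_size must equal train_micro_batch_size_per_gpu × gradient_accumulation_steps × world_size. A minimal sketch of deriving a missing value (a hypothetical helper for illustration, not DeepSpeed's actual code):

```python
def resolve_batch_sizes(world_size, train_batch_size=None,
                        micro_batch_per_gpu=None, grad_accum_steps=None):
    """Derive whichever of the three batch parameters is missing,
    given the resolved world size (hypothetical sketch)."""
    if grad_accum_steps is None:
        grad_accum_steps = train_batch_size // (micro_batch_per_gpu * world_size)
    elif micro_batch_per_gpu is None:
        micro_batch_per_gpu = train_batch_size // (grad_accum_steps * world_size)
    elif train_batch_size is None:
        train_batch_size = micro_batch_per_gpu * grad_accum_steps * world_size
    # Consistency check mirroring the validation DeepSpeedConfig performs
    assert train_batch_size == micro_batch_per_gpu * grad_accum_steps * world_size, \
        "batch parameters are inconsistent with the world size"
    return train_batch_size, micro_batch_per_gpu, grad_accum_steps

print(resolve_batch_sizes(world_size=4, train_batch_size=32, grad_accum_steps=4))
# → (32, 2, 4)
```

With train_batch_size=32, gradient_accumulation_steps=4, and 4 data-parallel ranks, the per-GPU micro batch resolves to 2, matching the example config later in this page.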
Usage
Typically instantiated internally by deepspeed.initialize(). Can also be instantiated directly for config inspection or validation before training begins.
Code Reference
Source Location
- Repository: DeepSpeed
- File: deepspeed/runtime/config.py
- Lines: 651-710
Signature
class DeepSpeedConfig(object):

    def __init__(self, config: Union[str, dict], mpu=None, mesh_device=None):
        super(DeepSpeedConfig, self).__init__()
        if isinstance(config, dict):
            self._param_dict = config
        elif os.path.exists(config):
            self._param_dict = hjson.load(
                open(config, "r"),
                object_pairs_hook=dict_raise_error_on_duplicate_keys
            )
        else:
            try:
                config_decoded = base64.urlsafe_b64decode(config).decode('utf-8')
                self._param_dict = hjson.loads(config_decoded)
            except (UnicodeDecodeError, AttributeError):
                raise ValueError(
                    f"Expected a string path to an existing deepspeed config, "
                    f"or a dictionary or a valid base64. Received: {config}"
                )
        # ... world size resolution, elasticity, batch size computation
Import
from deepspeed.runtime.config import DeepSpeedConfig
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| config | Union[str, dict] | Yes | JSON file path, Python dictionary, or base64-encoded configuration string |
| mpu | object | No | Model parallelism unit providing get_data_parallel_world_size() or get_sequence_parallel_world_size() |
| mesh_device | object | No | Device mesh for sequence parallelism, providing get_group(mesh_dim="data_parallel") |
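The mpu and mesh_device inputs are duck-typed: any object exposing the listed methods works. A hypothetical minimal stub for the mpu interface (class name and usage assumed for illustration, not part of DeepSpeed):

```python
class StubMPU:
    """Hypothetical model-parallel unit exposing the method DeepSpeedConfig
    queries to resolve the data-parallel world size."""

    def __init__(self, data_parallel_world_size: int):
        self._dp_world_size = data_parallel_world_size

    def get_data_parallel_world_size(self) -> int:
        return self._dp_world_size


mpu = StubMPU(data_parallel_world_size=8)
print(mpu.get_data_parallel_world_size())  # 8
# Passing mpu=mpu to DeepSpeedConfig would make it validate batch sizes
# against a world size of 8 rather than the torch.distributed world size.
```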
Outputs
| Name | Type | Description |
|---|---|---|
| DeepSpeedConfig | object | Parsed configuration object with attributes: zero_optimization_stage, optimizer_name, fp16_enabled, bf16_enabled, gradient_accumulation_steps, train_batch_size, train_micro_batch_size_per_gpu, world_size, global_rank, and many more |
Usage Examples
from deepspeed.runtime.config import DeepSpeedConfig

# From a JSON file path
config = DeepSpeedConfig("ds_config.json")

# From a Python dictionary
config = DeepSpeedConfig({
    "zero_optimization": {"stage": 2},
    "fp16": {"enabled": True, "loss_scale": 0, "initial_scale_power": 16},
    "optimizer": {
        "type": "Adam",
        "params": {"lr": 1e-4, "betas": [0.9, 0.999], "eps": 1e-8}
    },
    "train_batch_size": 32,
    "gradient_accumulation_steps": 4,
    "train_micro_batch_size_per_gpu": 2,
})

# Access parsed attributes
print(config.zero_optimization_stage)      # 2
print(config.fp16_enabled)                 # True
print(config.train_batch_size)             # 32
print(config.gradient_accumulation_steps)  # 4
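The constructor also accepts a base64-encoded string, which is convenient when a launcher passes the config through an environment variable or command-line argument rather than a shared file. The round trip can be sketched without DeepSpeed installed; the decode step mirrors what `__init__` does for a string that is not an existing path:

```python
import base64
import json

cfg = {"train_batch_size": 32, "gradient_accumulation_steps": 4,
       "train_micro_batch_size_per_gpu": 2}

# Encode the dict the way a launcher might before handing it to
# DeepSpeedConfig(encoded); urlsafe variant matches urlsafe_b64decode
# in the constructor.
encoded = base64.urlsafe_b64encode(json.dumps(cfg).encode("utf-8")).decode("utf-8")

# What the constructor effectively does with a non-path string:
decoded = json.loads(base64.urlsafe_b64decode(encoded).decode("utf-8"))
assert decoded == cfg
print(decoded["train_batch_size"])  # 32
```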