Implementation:Deepspeedai DeepSpeed DeepSpeedConfig Init
Knowledge Sources
| Field | Value |
|---|---|
| Domains | Distributed_Training, Configuration_Management, Memory_Optimization |
| Last Updated | 2026-02-09 00:00 GMT |
Overview
A concrete tool, provided by the DeepSpeed library, for parsing and validating DeepSpeed JSON configuration.
Description
The DeepSpeedConfig class parses a JSON file path, dictionary, or base64-encoded string into a validated configuration object. It resolves world size considering model parallelism and sequence parallelism, handles elasticity configuration, and exposes all training parameters as attributes. Key responsibilities include:
- Config parsing: Accepts a file path (JSON/HJSON), Python dictionary, or base64-encoded string
- World size resolution: Accounts for model parallelism (mpu), sequence parallelism (mesh_device), and standard data parallelism
- Elasticity handling: Computes elastic batch size configuration for dynamic GPU counts
- Batch size computation: Derives train_batch_size, micro_batch_per_gpu, and gradient_accumulation_steps from the config and world size
- Parameter validation: Ensures all required fields are present and consistent
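The batch-size computation above follows a single invariant: train_batch_size must equal train_micro_batch_size_per_gpu × gradient_accumulation_steps × world_size. A minimal sketch of deriving a missing value (a hypothetical helper for illustration, not DeepSpeed's actual code):

```python
def resolve_batch_sizes(world_size, train_batch_size=None,
                        micro_batch_per_gpu=None, grad_accum_steps=None):
    """Derive whichever of the three batch parameters is missing,
    given the resolved world size (hypothetical sketch)."""
    if grad_accum_steps is None:
        grad_accum_steps = train_batch_size // (micro_batch_per_gpu * world_size)
    elif micro_batch_per_gpu is None:
        micro_batch_per_gpu = train_batch_size // (grad_accum_steps * world_size)
    elif train_batch_size is None:
        train_batch_size = micro_batch_per_gpu * grad_accum_steps * world_size
    # Consistency check mirroring the validation DeepSpeedConfig performs
    assert train_batch_size == micro_batch_per_gpu * grad_accum_steps * world_size, \
        "batch parameters are inconsistent with the world size"
    return train_batch_size, micro_batch_per_gpu, grad_accum_steps

print(resolve_batch_sizes(world_size=4, train_batch_size=32, grad_accum_steps=4))
# → (32, 2, 4)
```

With train_batch_size=32, gradient_accumulation_steps=4, and 4 data-parallel ranks, the per-GPU micro batch resolves to 2, matching the example config later in this page.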
Usage
Typically instantiated internally by deepspeed.initialize(). Can also be instantiated directly for config inspection or validation before training begins.
Code Reference
Source Location
- Repository: DeepSpeed
- File: deepspeed/runtime/config.py
- Lines: 651-710
Signature
class DeepSpeedConfig(object):

    def __init__(self, config: Union[str, dict], mpu=None, mesh_device=None):
        super(DeepSpeedConfig, self).__init__()
        if isinstance(config, dict):
            self._param_dict = config
        elif os.path.exists(config):
            self._param_dict = hjson.load(
                open(config, "r"),
                object_pairs_hook=dict_raise_error_on_duplicate_keys
            )
        else:
            try:
                config_decoded = base64.urlsafe_b64decode(config).decode('utf-8')
                self._param_dict = hjson.loads(config_decoded)
            except (UnicodeDecodeError, AttributeError):
                raise ValueError(
                    f"Expected a string path to an existing deepspeed config, "
                    f"or a dictionary or a valid base64. Received: {config}"
                )
        # ... world size resolution, elasticity, batch size computation
Import
from deepspeed.runtime.config import DeepSpeedConfig
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| config | Union[str, dict] | Yes | JSON file path, Python dictionary, or base64-encoded configuration string |
| mpu | object | No | Model parallelism unit providing get_data_parallel_world_size() or get_sequence_parallel_world_size() |
| mesh_device | object | No | Device mesh for sequence parallelism, providing get_group(mesh_dim="data_parallel") |
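The mpu and mesh_device inputs are duck-typed: any object exposing the listed methods works. A hypothetical minimal stub for the mpu interface (class name and usage assumed for illustration, not part of DeepSpeed):

```python
class StubMPU:
    """Hypothetical model-parallel unit exposing the method DeepSpeedConfig
    queries to resolve the data-parallel world size."""

    def __init__(self, data_parallel_world_size: int):
        self._dp_world_size = data_parallel_world_size

    def get_data_parallel_world_size(self) -> int:
        return self._dp_world_size


mpu = StubMPU(data_parallel_world_size=8)
print(mpu.get_data_parallel_world_size())  # 8
# Passing mpu=mpu to DeepSpeedConfig would make it validate batch sizes
# against a world size of 8 rather than the torch.distributed world size.
```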
Outputs
| Name | Type | Description |
|---|---|---|
| DeepSpeedConfig | object | Parsed configuration object with attributes: zero_optimization_stage, optimizer_name, fp16_enabled, bf16_enabled, gradient_accumulation_steps, train_batch_size, train_micro_batch_size_per_gpu, world_size, global_rank, and many more |
Usage Examples
from deepspeed.runtime.config import DeepSpeedConfig

# From a JSON file path
config = DeepSpeedConfig("ds_config.json")

# From a Python dictionary
config = DeepSpeedConfig({
    "zero_optimization": {"stage": 2},
    "fp16": {"enabled": True, "loss_scale": 0, "initial_scale_power": 16},
    "optimizer": {
        "type": "Adam",
        "params": {"lr": 1e-4, "betas": [0.9, 0.999], "eps": 1e-8}
    },
    "train_batch_size": 32,
    "gradient_accumulation_steps": 4,
    "train_micro_batch_size_per_gpu": 2,
})

# Access parsed attributes
print(config.zero_optimization_stage)      # 2
print(config.fp16_enabled)                 # True
print(config.train_batch_size)             # 32
print(config.gradient_accumulation_steps)  # 4
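The constructor also accepts a base64-encoded string, which is convenient when a launcher passes the config through an environment variable or command-line argument rather than a shared file. The round trip can be sketched without DeepSpeed installed; the decode step mirrors what `__init__` does for a string that is not an existing path:

```python
import base64
import json

cfg = {"train_batch_size": 32, "gradient_accumulation_steps": 4,
       "train_micro_batch_size_per_gpu": 2}

# Encode the dict the way a launcher might before handing it to
# DeepSpeedConfig(encoded); urlsafe variant matches urlsafe_b64decode
# in the constructor.
encoded = base64.urlsafe_b64encode(json.dumps(cfg).encode("utf-8")).decode("utf-8")

# What the constructor effectively does with a non-path string:
decoded = json.loads(base64.urlsafe_b64decode(encoded).decode("utf-8"))
assert decoded == cfg
print(decoded["train_batch_size"])  # 32
```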