Principle:Microsoft DeepSpeedExamples DeepSpeed CLI Integration
Metadata
| Field | Value |
|---|---|
| Page Type | Principle |
| Repository | Microsoft/DeepSpeedExamples |
| Title | DeepSpeed_CLI_Integration |
| Sources | Doc: DeepSpeed Getting Started |
| Domains | Infrastructure, Configuration |
| Related Implementation | Implementation:Microsoft_DeepSpeedExamples_Add_Argument_CIFAR |
Overview
A configuration pattern for integrating DeepSpeed command-line arguments into existing PyTorch training scripts.
Description
Migrating from standard PyTorch to DeepSpeed requires establishing a configuration bridge between user-specified command-line arguments and the DeepSpeed runtime. This principle covers the argument integration pattern that enables a single training script to support both standard PyTorch training and DeepSpeed-enhanced distributed training with mixed precision, ZeRO optimization, and Mixture of Experts.
The integration involves three layers:
Layer 1: Custom Application Arguments
The training script defines its own domain-specific arguments using Python's argparse.ArgumentParser. These control experiment-level settings such as:
- Number of training epochs
- Data type selection (fp16, bf16, fp32)
- ZeRO optimization stage (0, 1, 2, 3)
- MoE configuration (number of experts, top-k routing, expert parallelism)
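As a minimal sketch of Layer 1, the parser below declares a few of the settings listed above using only argparse. The function name `build_app_parser` and the default values are assumptions made for illustration, and the MoE flags are left out for brevity.

```python
import argparse


def build_app_parser() -> argparse.ArgumentParser:
    """Build a parser holding only the script's own (Layer 1) arguments."""
    parser = argparse.ArgumentParser(description="Training script (sketch)")
    # Default values below are illustrative, not taken from the example repo.
    parser.add_argument("--epochs", type=int, default=30,
                        help="number of training epochs")
    parser.add_argument("--dtype", type=str, default="fp16",
                        choices=["fp16", "bf16", "fp32"],
                        help="training data type")
    parser.add_argument("--stage", type=int, default=0,
                        choices=[0, 1, 2, 3],
                        help="ZeRO optimization stage")
    return parser


# Parse an explicit list so the sketch does not depend on sys.argv.
args = build_app_parser().parse_args(["--dtype", "bf16", "--stage", "2"])
print(args.epochs, args.dtype, args.stage)
```

Using `choices` lets argparse reject an unsupported data type or ZeRO stage before any training code runs.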
Layer 2: DeepSpeed Config Arguments
deepspeed.add_config_arguments(parser) injects DeepSpeed-specific arguments into the existing parser. These include:
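- --deepspeed -- Enable the DeepSpeed engine
- --deepspeed_config -- Path to the JSON configuration file

The related --local_rank argument (the local rank for distributed training, set by the launcher) is typically declared by the training script itself rather than injected by this call.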
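The sketch below illustrates how injected arguments end up in the same parser as the script's own. To stay runnable without DeepSpeed installed, it uses a hypothetical stand-in, `add_config_arguments_stand_in`, that only mimics the two core flags; a real script would call `deepspeed.add_config_arguments(parser)` at that point. The file name `ds_config.json` is a placeholder, and declaring `--local_rank` in the script is an assumption about how the example is organized.

```python
import argparse


def add_config_arguments_stand_in(parser):
    """Hypothetical stand-in for deepspeed.add_config_arguments(parser).

    It only mimics the two core flags so the sketch runs without DeepSpeed.
    """
    group = parser.add_argument_group("DeepSpeed", "DeepSpeed configurations")
    group.add_argument("--deepspeed", default=False, action="store_true",
                       help="enable the DeepSpeed engine")
    group.add_argument("--deepspeed_config", default=None, type=str,
                       help="path to a DeepSpeed JSON configuration file")
    return parser


parser = argparse.ArgumentParser(description="Training script (sketch)")
parser.add_argument("--epochs", type=int, default=30)
# Assumed to be declared by the script; the launcher fills it in per process.
parser.add_argument("--local_rank", type=int, default=-1,
                    help="local rank passed from the distributed launcher")
# In a real script: parser = deepspeed.add_config_arguments(parser)
parser = add_config_arguments_stand_in(parser)

ds_args = parser.parse_args(
    ["--deepspeed", "--deepspeed_config", "ds_config.json", "--epochs", "5"]
)
print(ds_args)
```

Application and DeepSpeed flags land in one namespace object, which is what makes the later hand-off to the engine a single argument.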
Layer 3: JSON Configuration
DeepSpeed uses a JSON configuration dictionary (or file) that defines the runtime behavior. It is separate from the CLI arguments but works in conjunction with them. The JSON config specifies:
- Optimizer type and hyperparameters
- Learning rate scheduler
- Mixed precision settings (fp16/bf16)
- ZeRO optimization stages and parameters
- Gradient clipping and accumulation
- Batch size and micro-batch size
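A configuration dictionary covering the items above might look like the following sketch. The key names follow the DeepSpeed configuration schema as commonly documented, while every numeric value and the single-GPU `world_size` are assumptions chosen for illustration.

```python
import json

# Assumed values for this sketch; adjust to the actual hardware and model.
world_size = 1       # number of GPUs taking part in training
micro_batch = 16     # samples per GPU per forward/backward pass
grad_accum = 1       # micro-batches accumulated before an optimizer step

ds_config = {
    # Global batch size is micro-batch x accumulation steps x GPU count.
    "train_batch_size": micro_batch * grad_accum * world_size,
    "train_micro_batch_size_per_gpu": micro_batch,
    "gradient_accumulation_steps": grad_accum,
    "optimizer": {
        "type": "Adam",
        "params": {"lr": 1e-3, "betas": [0.8, 0.999], "eps": 1e-8},
    },
    "scheduler": {
        "type": "WarmupLR",
        "params": {"warmup_min_lr": 0, "warmup_max_lr": 1e-3,
                   "warmup_num_steps": 1000},
    },
    "gradient_clipping": 1.0,
    "fp16": {"enabled": True},
    "bf16": {"enabled": False},
    "zero_optimization": {"stage": 0},
}

# The same content can be stored as a JSON file and passed by path.
config_text = json.dumps(ds_config, indent=2)
print(config_text)
```

Keeping the three batch-size keys consistent with each other avoids a mismatch between the global batch size and its per-GPU components.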
Theoretical Basis
Configuration Layering Pattern
DeepSpeed employs a layered configuration pattern where:
CLI Arguments (highest priority)
|
v
JSON Config File / Dictionary
|
v
DeepSpeed Defaults (lowest priority)
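CLI arguments take precedence over JSON config values when the training script applies them to the configuration before initialization; the override is carried out by the script rather than automatically by DeepSpeed. The JSON config provides the bulk of the runtime configuration, while CLI arguments handle deployment-specific settings (like local rank) and user-facing toggles (like enabling MoE).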
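The priority order can be sketched as a recursive dictionary merge in which later layers win. The helper `deep_merge` and the three sample dictionaries are hypothetical; the sketch assumes the training script performs this merge itself before handing the result to DeepSpeed.

```python
import copy


def deep_merge(base: dict, override: dict) -> dict:
    """Return a copy of base with override applied; nested dicts merge key by key."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Lowest priority: assumed defaults.
defaults = {"zero_optimization": {"stage": 0}, "fp16": {"enabled": False}}
# Middle priority: values loaded from the JSON config file.
json_config = {"zero_optimization": {"stage": 1, "overlap_comm": True}}
# Highest priority: values derived from CLI arguments such as --stage 2.
cli_overrides = {"zero_optimization": {"stage": 2}}

effective = deep_merge(deep_merge(defaults, json_config), cli_overrides)
print(effective)
```

Only the keys named by a higher layer change; settings that appear solely in a lower layer, such as `overlap_comm` here, carry through untouched.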
Argument Namespacing
The deepspeed.add_config_arguments() function adds arguments to the same parser namespace, allowing the combined argument namespace to be passed directly to deepspeed.initialize(args=args, ...). This design avoids the need for separate configuration objects and keeps the integration surface minimal.
Distributed Launcher Integration
The --local_rank argument is critical for distributed training. The DeepSpeed launcher (deepspeed CLI command) automatically sets this argument for each process it spawns:
deepspeed launcher
|
+-- Process 0: --local_rank=0
+-- Process 1: --local_rank=1
+-- Process N: --local_rank=N
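On the script side, the rank handed out by the launcher can be read as in the sketch below. The helper `resolve_local_rank` is hypothetical, and the fallback to a `LOCAL_RANK` environment variable assumes a launcher that exports it alongside (or instead of) the CLI flag.

```python
import argparse
import os


def resolve_local_rank(args, environ=os.environ) -> int:
    """Prefer --local_rank from the launcher, then LOCAL_RANK, then rank 0."""
    if args.local_rank is not None and args.local_rank >= 0:
        return args.local_rank
    return int(environ.get("LOCAL_RANK", "0"))


parser = argparse.ArgumentParser()
parser.add_argument("--local_rank", type=int, default=-1,
                    help="local rank passed from the distributed launcher")

# Simulate a process spawned by the launcher and a plain single-process run.
launched = parser.parse_args(["--local_rank=1"])
plain = parser.parse_args([])

print(resolve_local_rank(launched, {}))
print(resolve_local_rank(plain, {"LOCAL_RANK": "3"}))
print(resolve_local_rank(plain, {}))
```

Defaulting to rank 0 keeps the same script usable when it is started directly with `python` rather than through the launcher.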
Configuration Categories
| Category | CLI Arguments | Purpose |
|---|---|---|
| Training | --epochs, --log-interval | Control training duration and logging frequency |
| Precision | --dtype {fp16,bf16,fp32} | Select mixed precision data type |
| ZeRO | --stage {0,1,2,3} | Select ZeRO optimization stage |
| MoE | --moe, --num-experts, --top-k, --ep-world-size, --min-capacity, --noisy-gate-policy, --mlp-type, --moe-param-group | Configure Mixture of Experts |
| DeepSpeed | --deepspeed, --deepspeed_config, --local_rank | Core DeepSpeed runtime settings |
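The MoE row is the largest category, so a sketch of how those flags might be declared as one argparse group may help. The flag names come from the table; the types, defaults, and the helper name `add_moe_arguments` are assumptions and may differ from the actual example script.

```python
import argparse


def add_moe_arguments(parser: argparse.ArgumentParser) -> argparse.ArgumentParser:
    """Attach the MoE flags as one argument group (types and defaults assumed)."""
    group = parser.add_argument_group("MoE", "Mixture of Experts settings")
    group.add_argument("--moe", action="store_true", default=False,
                       help="enable Mixture of Experts layers")
    group.add_argument("--ep-world-size", type=int, default=1,
                       help="expert-parallel world size")
    group.add_argument("--num-experts", type=int, nargs="+", default=[1],
                       help="number of experts, one value per MoE layer")
    group.add_argument("--mlp-type", type=str, default="standard",
                       help="MLP variant used inside the experts")
    group.add_argument("--top-k", type=int, default=1,
                       help="number of experts each token is routed to")
    group.add_argument("--min-capacity", type=int, default=0,
                       help="minimum capacity per expert")
    group.add_argument("--noisy-gate-policy", type=str, default=None,
                       help="noisy gating policy, for example RSample")
    group.add_argument("--moe-param-group", action="store_true", default=False,
                       help="create separate optimizer groups for expert params")
    return parser


moe_args = add_moe_arguments(argparse.ArgumentParser()).parse_args(
    ["--moe", "--ep-world-size", "2", "--num-experts", "2",
     "--top-k", "1", "--noisy-gate-policy", "RSample", "--moe-param-group"]
)
print(moe_args.num_experts, moe_args.ep_world_size, moe_args.noisy_gate_policy)
```

argparse converts the hyphens in flag names to underscores, so `--ep-world-size` is read back as `moe_args.ep_world_size`.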
Pattern: CLI-to-Config Mapping
The CLI arguments flow into the JSON configuration dictionary at initialization time. For example, in the CIFAR-10 example:
# CLI arg --dtype controls JSON config keys
ds_config["bf16"]["enabled"] = (args.dtype == "bf16")
ds_config["fp16"]["enabled"] = (args.dtype == "fp16")
# CLI arg --stage controls ZeRO config
ds_config["zero_optimization"]["stage"] = args.stage
This mapping pattern allows the same JSON config template to be reused across different runs by parameterizing key settings through CLI arguments.
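A self-contained version of this mapping could look like the sketch below, which builds the dictionary from a parsed namespace. The function name `get_ds_config`, the fixed batch size, and the use of `SimpleNamespace` in place of real parsed arguments are assumptions for illustration.

```python
from types import SimpleNamespace


def get_ds_config(args) -> dict:
    """Build a DeepSpeed config dict from parsed CLI arguments (sketch)."""
    return {
        "train_batch_size": 16,  # illustrative fixed value
        # At most one precision block is enabled; fp32 disables both.
        "fp16": {"enabled": args.dtype == "fp16"},
        "bf16": {"enabled": args.dtype == "bf16"},
        "zero_optimization": {"stage": args.stage},
    }


# SimpleNamespace stands in for the result of parser.parse_args().
bf16_config = get_ds_config(SimpleNamespace(dtype="bf16", stage=2))
fp32_config = get_ds_config(SimpleNamespace(dtype="fp32", stage=0))
print(bf16_config)
print(fp32_config)
```

In a real script the resulting dictionary would then be handed to the engine at initialization, for example through the `config` parameter of `deepspeed.initialize`.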
Usage Patterns
Standard DeepSpeed Run
# Basic DeepSpeed launch (uses all available GPUs)
deepspeed cifar10_deepspeed.py --deepspeed
# With specific dtype and ZeRO stage
deepspeed cifar10_deepspeed.py --deepspeed --dtype bf16 --stage 2
# Via the provided shell script
bash run_ds.sh --dtype=fp16 --stage=0
MoE Run
# DeepSpeed with Mixture of Experts
deepspeed --num_nodes=1 --num_gpus=2 cifar10_deepspeed.py \
--deepspeed \
--moe \
--ep-world-size 2 \
--num-experts 2 \
--top-k 1 \
--noisy-gate-policy 'RSample' \
--moe-param-group
Related Pages
- Implementation:Microsoft_DeepSpeedExamples_Add_Argument_CIFAR -- Concrete argument parser implementation
- Principle:Microsoft_DeepSpeedExamples_DeepSpeed_Engine_Init -- Engine initialization that consumes these arguments
- Principle:Microsoft_DeepSpeedExamples_Baseline_PyTorch_Training -- Baseline pattern before CLI integration