Principle:Microsoft DeepSpeedExamples DeepSpeed CLI Integration

Metadata

Field	Value
Page Type	Principle
Repository	Microsoft/DeepSpeedExamples
Title	DeepSpeed_CLI_Integration
Sources	Doc: DeepSpeed Getting Started
Domains	Infrastructure, Configuration
Related Implementation	Implementation:Microsoft_DeepSpeedExamples_Add_Argument_CIFAR

Overview

A configuration pattern for integrating DeepSpeed command-line arguments into existing PyTorch training scripts.

Description

Migrating from standard PyTorch to DeepSpeed requires establishing a configuration bridge between user-specified command-line arguments and the DeepSpeed runtime. This principle covers the argument integration pattern that enables a single training script to support both standard PyTorch training and DeepSpeed-enhanced distributed training with mixed precision, ZeRO optimization, and Mixture of Experts.

The integration involves three layers:

Layer 1: Custom Application Arguments

The training script defines its own domain-specific arguments using Python's argparse.ArgumentParser. These control experiment-level settings such as:

Number of training epochs
Data type selection (fp16, bf16, fp32)
ZeRO optimization stage (0, 1, 2, 3)
MoE configuration (number of experts, top-k routing, expert parallelism)
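
A minimal sketch of this layer, assuming a parser built with argparse only. The flag names follow the Configuration Categories table on this page; the defaults, types, and the helper name build_app_parser are illustrative assumptions, not the CIFAR example's exact definitions.

```python
import argparse


def build_app_parser() -> argparse.ArgumentParser:
    """Application-level arguments only; nothing here depends on DeepSpeed."""
    parser = argparse.ArgumentParser(description="CIFAR")
    # Training duration
    parser.add_argument("--epochs", type=int, default=30, help="number of training epochs")
    # Mixed-precision data type
    parser.add_argument("--dtype", type=str, default="fp16", choices=["fp16", "bf16", "fp32"])
    # ZeRO optimization stage
    parser.add_argument("--stage", type=int, default=0, choices=[0, 1, 2, 3])
    # Mixture of Experts settings (defaults are illustrative)
    parser.add_argument("--moe", action="store_true", help="use MoE layers")
    parser.add_argument("--num-experts", type=int, default=1)
    parser.add_argument("--top-k", type=int, default=1)
    parser.add_argument("--ep-world-size", type=int, default=1)
    return parser


# Parse an explicit argument list so the sketch runs without a real command line.
args = build_app_parser().parse_args(["--dtype", "bf16", "--stage", "2", "--moe"])
print(args.dtype, args.stage, args.moe, args.num_experts)
```

argparse converts dashes in flag names to underscores, so --num-experts is read back as args.num_experts.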

Layer 2: DeepSpeed Config Arguments

deepspeed.add_config_arguments(parser) injects DeepSpeed-specific arguments into the existing parser:

--deepspeed -- Enable the DeepSpeed engine
--deepspeed_config -- Path to a JSON configuration file

A third argument belongs to this layer but is typically declared by the training script itself rather than injected by add_config_arguments():

--local_rank -- Local rank for distributed training (set by the launcher)
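
The sketch below shows the injection step on a small parser. It calls the real deepspeed.add_config_arguments when DeepSpeed is importable; otherwise it falls back to a stand-in that registers only --deepspeed and --deepspeed_config. The stand-in is an assumption made so the example runs anywhere, not DeepSpeed's own code.

```python
import argparse

try:
    import deepspeed  # real library, if installed
    add_config_arguments = deepspeed.add_config_arguments
except ImportError:
    # Stand-in (assumption): mimics only the two flags discussed on this page.
    def add_config_arguments(parser: argparse.ArgumentParser) -> argparse.ArgumentParser:
        group = parser.add_argument_group("DeepSpeed", "DeepSpeed configurations")
        group.add_argument("--deepspeed", default=False, action="store_true")
        group.add_argument("--deepspeed_config", default=None, type=str)
        return parser


parser = argparse.ArgumentParser(description="demo")
parser.add_argument("--epochs", type=int, default=30)  # application argument
parser = add_config_arguments(parser)                   # DeepSpeed arguments

# Application and DeepSpeed flags end up in one namespace.
args = parser.parse_args(["--deepspeed", "--deepspeed_config", "ds_config.json"])
print(args.epochs, args.deepspeed, args.deepspeed_config)
```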

Layer 3: JSON Configuration

DeepSpeed reads its runtime behavior from a JSON configuration, supplied either as a file or as an in-memory dictionary. The configuration is separate from the CLI arguments but works together with them. It specifies:

Optimizer type and hyperparameters
Learning rate scheduler
Mixed precision settings (fp16/bf16)
ZeRO optimization stages and parameters
Gradient clipping and accumulation
Batch size and micro-batch size
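
As an illustration, a configuration covering the categories above could look like the following Python dictionary. The keys are standard DeepSpeed configuration keys; every value is a placeholder chosen for the sketch, not a recommendation and not the CIFAR example's actual settings.

```python
import json

ds_config = {
    # Batch sizes: train_batch_size must equal
    # micro-batch size * gradient accumulation steps * number of GPUs.
    "train_batch_size": 16,
    "train_micro_batch_size_per_gpu": 4,
    # Optimizer type and hyperparameters
    "optimizer": {
        "type": "Adam",
        "params": {"lr": 0.001, "betas": [0.8, 0.999], "eps": 1e-8, "weight_decay": 3e-7},
    },
    # Learning rate scheduler
    "scheduler": {
        "type": "WarmupLR",
        "params": {"warmup_min_lr": 0, "warmup_max_lr": 0.001, "warmup_num_steps": 1000},
    },
    # Gradient clipping
    "gradient_clipping": 1.0,
    # Mixed precision: enable at most one of fp16 / bf16
    "fp16": {"enabled": True},
    "bf16": {"enabled": False},
    # ZeRO optimization
    "zero_optimization": {"stage": 0},
}

# The same content serialized as a JSON document, as it would appear in a config file.
config_text = json.dumps(ds_config, indent=2)
print(config_text)
```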

Theoretical Basis

Configuration Layering Pattern

DeepSpeed employs a layered configuration pattern where:

CLI Arguments (highest priority)
    |
    v
JSON Config File / Dictionary
    |
    v
DeepSpeed Defaults (lowest priority)

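CLI arguments take precedence over JSON config values, but only because the training script copies them into the configuration before initialization (see Pattern: CLI-to-Config Mapping); DeepSpeed does not merge arbitrary CLI flags into the config on its own. The JSON config provides the bulk of the runtime configuration, while CLI arguments handle deployment-specific settings (like local rank) and user-facing toggles (like enabling MoE).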
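
The precedence order can be sketched as a recursive merge performed by the training script. The helper deep_merge and all sample values are hypothetical and exist only to illustrate the ordering; they are not DeepSpeed APIs.

```python
from copy import deepcopy


def deep_merge(base: dict, override: dict) -> dict:
    """Return a copy of `base` with `override` applied recursively."""
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


# Lowest priority: defaults (illustrative values).
defaults = {"zero_optimization": {"stage": 0}, "fp16": {"enabled": False}, "gradient_clipping": 0.0}
# Middle priority: values from the JSON config file or dictionary.
json_config = {"fp16": {"enabled": True}, "gradient_clipping": 1.0}
# Highest priority: values derived from CLI arguments such as --stage 2.
cli_overrides = {"zero_optimization": {"stage": 2}}

effective = deep_merge(deep_merge(defaults, json_config), cli_overrides)
print(effective)
```

Each later layer replaces only the keys it sets, so a CLI override of the ZeRO stage leaves the JSON-provided precision and clipping settings intact.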

Argument Namespacing

The deepspeed.add_config_arguments() function adds arguments to the same parser namespace, allowing the combined argument namespace to be passed directly to deepspeed.initialize(args=args, ...). This design avoids the need for separate configuration objects and keeps the integration surface minimal.
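
A minimal sketch of handing the combined namespace to the engine follows. The wrapper init_engine is hypothetical; it assumes a PyTorch model and a configuration dictionary prepared elsewhere. DeepSpeed is imported lazily so the function can be defined without it installed.

```python
import argparse


def init_engine(args: argparse.Namespace, model, ds_config: dict):
    """Pass the parsed namespace and a config dict to deepspeed.initialize."""
    import deepspeed  # imported lazily; requires DeepSpeed and PyTorch at call time

    # Supply the config either here via config= or on the command line via
    # --deepspeed_config, not both: DeepSpeed rejects two config sources.
    engine, optimizer, _dataloader, _lr_scheduler = deepspeed.initialize(
        args=args,
        model=model,
        model_parameters=[p for p in model.parameters() if p.requires_grad],
        config=ds_config,
    )
    return engine, optimizer
```

Because the application flags and the DeepSpeed flags live in one namespace, the script needs no adapter object between argument parsing and initialization.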

Distributed Launcher Integration

The --local_rank argument is critical for distributed training. The DeepSpeed launcher (deepspeed CLI command) automatically sets this argument for each process it spawns:

deepspeed launcher
    |
    +-- Process 0: --local_rank=0
    +-- Process 1: --local_rank=1
    +-- Process N: --local_rank=N
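
On the script side, each spawned process needs to read the rank it was given. The sketch below is one way to do that: it declares --local_rank with a default of -1 and, as an assumption about the launcher environment, falls back to a LOCAL_RANK environment variable. The function name resolve_local_rank is hypothetical.

```python
import argparse
import os


def resolve_local_rank(argv=None, environ=None) -> int:
    """Return the local rank from --local_rank, else LOCAL_RANK, else -1."""
    environ = os.environ if environ is None else environ
    parser = argparse.ArgumentParser()
    parser.add_argument("--local_rank", type=int, default=-1,
                        help="local rank passed from the distributed launcher")
    # parse_known_args ignores the script's other flags in this sketch.
    args, _unknown = parser.parse_known_args(argv)
    if args.local_rank >= 0:
        return args.local_rank
    return int(environ.get("LOCAL_RANK", -1))


print(resolve_local_rank(["--local_rank=1", "--deepspeed"], {}))
print(resolve_local_rank([], {"LOCAL_RANK": "3"}))
print(resolve_local_rank([], {}))
```

A result of -1 indicates the script was started without a distributed launcher.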

Configuration Categories

Category	CLI Arguments	Purpose
Training	`--epochs`, `--log-interval`	Control training duration and logging frequency
Precision	`--dtype {fp16,bf16,fp32}`	Select mixed precision data type
ZeRO	`--stage {0,1,2,3}`	Select ZeRO optimization stage
MoE	`--moe`, `--num-experts`, `--top-k`, `--ep-world-size`, `--min-capacity`, `--noisy-gate-policy`, `--mlp-type`, `--moe-param-group`	Configure Mixture of Experts
DeepSpeed	`--deepspeed`, `--deepspeed_config`, `--local_rank`	Core DeepSpeed runtime settings

Pattern: CLI-to-Config Mapping

The CLI arguments flow into the JSON configuration dictionary at initialization time. For example, in the CIFAR-10 example:

# CLI arg --dtype controls JSON config keys
ds_config["bf16"]["enabled"] = (args.dtype == "bf16")
ds_config["fp16"]["enabled"] = (args.dtype == "fp16")

# CLI arg --stage controls ZeRO config
ds_config["zero_optimization"]["stage"] = args.stage

This mapping pattern allows the same JSON config template to be reused across different runs by parameterizing key settings through CLI arguments.
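
One way to package this mapping is a function that builds a fresh configuration dictionary per run. The name build_ds_config and the batch size are assumptions for illustration; only the dtype and stage assignments come from the snippet above.

```python
import argparse


def build_ds_config(args: argparse.Namespace) -> dict:
    """Fill a config template from parsed CLI arguments."""
    return {
        "train_batch_size": 16,  # placeholder value
        # --dtype selects at most one reduced-precision mode; fp32 disables both.
        "bf16": {"enabled": args.dtype == "bf16"},
        "fp16": {"enabled": args.dtype == "fp16"},
        # --stage selects the ZeRO stage.
        "zero_optimization": {"stage": args.stage},
    }


bf16_config = build_ds_config(argparse.Namespace(dtype="bf16", stage=2))
fp32_config = build_ds_config(argparse.Namespace(dtype="fp32", stage=0))
print(bf16_config)
print(fp32_config)
```

Returning a new dictionary on every call keeps runs independent: changing --dtype or --stage never mutates a shared template.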

Usage Patterns

Standard DeepSpeed Run

# Basic DeepSpeed launch (uses all available GPUs)
deepspeed cifar10_deepspeed.py --deepspeed

# With specific dtype and ZeRO stage
deepspeed cifar10_deepspeed.py --deepspeed --dtype bf16 --stage 2

# Via the provided shell script
bash run_ds.sh --dtype=fp16 --stage=0

MoE Run

# DeepSpeed with Mixture of Experts
deepspeed --num_nodes=1 --num_gpus=2 cifar10_deepspeed.py \
    --deepspeed \
    --moe \
    --ep-world-size 2 \
    --num-experts 2 \
    --top-k 1 \
    --noisy-gate-policy 'RSample' \
    --moe-param-group

Related Pages

Implementation:Microsoft_DeepSpeedExamples_Add_Argument_CIFAR -- Concrete argument parser implementation
Principle:Microsoft_DeepSpeedExamples_DeepSpeed_Engine_Init -- Engine initialization that consumes these arguments
Principle:Microsoft_DeepSpeedExamples_Baseline_PyTorch_Training -- Baseline pattern before CLI integration
