

Principle:Microsoft DeepSpeedExamples DeepSpeed CLI Integration

From Leeroopedia


Metadata

Field                    Value
Page Type                Principle
Repository               Microsoft/DeepSpeedExamples
Title                    DeepSpeed_CLI_Integration
Sources                  Doc: DeepSpeed Getting Started
Domains                  Infrastructure, Configuration
Related Implementation   Implementation:Microsoft_DeepSpeedExamples_Add_Argument_CIFAR

Overview

A configuration pattern for integrating DeepSpeed command-line arguments into existing PyTorch training scripts.

Description

Migrating from standard PyTorch to DeepSpeed requires establishing a configuration bridge between user-specified command-line arguments and the DeepSpeed runtime. This principle covers the argument integration pattern that enables a single training script to support both standard PyTorch training and DeepSpeed-enhanced distributed training with mixed precision, ZeRO optimization, and Mixture of Experts.

The integration involves three layers:

Layer 1: Custom Application Arguments

The training script defines its own domain-specific arguments using Python's argparse.ArgumentParser. These control experiment-level settings such as:

  • Number of training epochs
  • Data type selection (fp16, bf16, fp32)
  • ZeRO optimization stage (0, 1, 2, 3)
  • MoE configuration (number of experts, top-k routing, expert parallelism)
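The Layer 1 flags can be declared with plain argparse. The sketch below is modeled on the flag names used by the CIFAR-10 example. The helper name add_application_arguments and the default values are illustrative assumptions, not the example's exact code.

```python
import argparse


def add_application_arguments(parser: argparse.ArgumentParser) -> argparse.ArgumentParser:
    """Declare experiment-level (Layer 1) flags.

    Hypothetical helper: flag names follow the CIFAR-10 example,
    defaults are illustrative.
    """
    # Training duration and logging frequency
    parser.add_argument("-e", "--epochs", type=int, default=30)
    parser.add_argument("--log-interval", type=int, default=2000)
    # Precision: exactly one of bf16, fp16, fp32
    parser.add_argument("--dtype", type=str, default="fp16", choices=["bf16", "fp16", "fp32"])
    # ZeRO optimization stage
    parser.add_argument("--stage", type=int, default=0, choices=[0, 1, 2, 3])
    # Mixture of Experts settings
    parser.add_argument("--moe", action="store_true")
    parser.add_argument("--num-experts", type=int, nargs="+", default=[1])
    parser.add_argument("--top-k", type=int, default=1)
    parser.add_argument("--ep-world-size", type=int, default=1)
    parser.add_argument("--min-capacity", type=int, default=0)
    parser.add_argument("--noisy-gate-policy", type=str, default=None)
    parser.add_argument("--mlp-type", type=str, default="standard")
    parser.add_argument("--moe-param-group", action="store_true")
    return parser


parser = add_application_arguments(argparse.ArgumentParser(description="CIFAR"))
# argparse turns dashes into underscores: --num-experts -> args.num_experts
args = parser.parse_args(["--dtype", "bf16", "--stage", "2", "--moe", "--num-experts", "2"])
print(args.dtype, args.stage, args.moe, args.num_experts)  # bf16 2 True [2]
```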

Layer 2: DeepSpeed Config Arguments

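deepspeed.add_config_arguments(parser) injects DeepSpeed-specific arguments into the existing parser. These include:

  • --deepspeed -- Enable DeepSpeed (a helper flag that user code checks; it does not configure the backend on its own)
  • --deepspeed_config -- Path to the JSON configuration file

The --local_rank argument (the process's local rank, set by the launcher) is not added by add_config_arguments(). The training script declares it itself, next to its Layer 1 arguments, as the CIFAR-10 example does with an integer --local_rank defaulting to -1.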
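The way Layers 1 and 2 share one parser can be sketched as follows. So that the sketch runs without DeepSpeed installed, it uses a hypothetical stand-in, add_deepspeed_flags_standin, that declares only the two flags listed above; a real script would call deepspeed.add_config_arguments(parser) in its place. The script declares --local_rank itself, as in the CIFAR-10 example.

```python
import argparse


def add_deepspeed_flags_standin(parser: argparse.ArgumentParser) -> argparse.ArgumentParser:
    """Hypothetical stand-in for deepspeed.add_config_arguments(parser).

    Declares only --deepspeed and --deepspeed_config; replace with the
    real DeepSpeed call in an actual training script.
    """
    group = parser.add_argument_group("DeepSpeed", "DeepSpeed configurations")
    group.add_argument("--deepspeed", action="store_true", default=False)
    group.add_argument("--deepspeed_config", type=str, default=None)
    return parser


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="CIFAR")
    # Layer 1: an application flag (shortened for the sketch)
    parser.add_argument("--dtype", default="fp16", choices=["bf16", "fp16", "fp32"])
    # The launcher passes --local_rank=<n>, so the script must accept it.
    parser.add_argument("--local_rank", type=int, default=-1)
    # Layer 2: inject the DeepSpeed flags into the same parser.
    return add_deepspeed_flags_standin(parser)


args = build_parser().parse_args(["--deepspeed", "--local_rank=1", "--dtype", "bf16"])
print(args.deepspeed, args.deepspeed_config, args.local_rank, args.dtype)  # True None 1 bf16
```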

Layer 3: JSON Configuration

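DeepSpeed uses a JSON configuration dictionary (or file) that defines the runtime behavior. It is separate from the CLI arguments but used together with them: the script hands it to deepspeed.initialize() either as a file path (via --deepspeed_config) or as a Python dictionary (via the config parameter). The JSON config specifies: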

  • Optimizer type and hyperparameters
  • Learning rate scheduler
  • Mixed precision settings (fp16/bf16)
  • ZeRO optimization stages and parameters
  • Gradient clipping and accumulation
  • Batch size and micro-batch size
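A config covering each of the categories above might look like the dictionary below. The values are illustrative, loosely modeled on the CIFAR-10 example, and should be tuned per job. The helper check_batch_sizes is hypothetical; it mirrors DeepSpeed's rule that train_batch_size must equal the micro-batch size per GPU times the gradient accumulation steps times the data-parallel world size.

```python
import json
import tempfile
from pathlib import Path

# Illustrative values only; tune for your job.
ds_config = {
    "train_batch_size": 16,               # global batch across all data-parallel ranks
    "train_micro_batch_size_per_gpu": 4,  # per-GPU batch per forward/backward pass
    "gradient_accumulation_steps": 2,
    "gradient_clipping": 1.0,
    "optimizer": {
        "type": "Adam",
        "params": {"lr": 0.001, "betas": [0.8, 0.999], "eps": 1e-8, "weight_decay": 3e-7},
    },
    "scheduler": {
        "type": "WarmupLR",
        "params": {"warmup_min_lr": 0, "warmup_max_lr": 0.001, "warmup_num_steps": 1000},
    },
    "fp16": {"enabled": True},
    "bf16": {"enabled": False},
    "zero_optimization": {"stage": 0},
}


def check_batch_sizes(cfg: dict, world_size: int) -> None:
    """Hypothetical pre-flight check: batch = micro_batch * accumulation * world_size."""
    expected = (cfg["train_micro_batch_size_per_gpu"]
                * cfg["gradient_accumulation_steps"] * world_size)
    if cfg["train_batch_size"] != expected:
        raise ValueError(
            f"train_batch_size={cfg['train_batch_size']} but micro*accum*world={expected}")


check_batch_sizes(ds_config, world_size=2)  # 4 * 2 * 2 == 16, so no error

# Saving the dict produces a file that could be passed as --deepspeed_config.
path = Path(tempfile.gettempdir()) / "ds_config.json"
path.write_text(json.dumps(ds_config, indent=2))
```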

Theoretical Basis

Configuration Layering Pattern

The integration follows a layered configuration pattern, listed from highest to lowest priority:

CLI Arguments (highest priority)
    |
    v
JSON Config File / Dictionary
    |
    v
DeepSpeed Defaults (lowest priority)

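CLI arguments take precedence because the training script writes selected values (such as dtype and ZeRO stage) into the config dictionary before calling deepspeed.initialize(); DeepSpeed does not map application flags such as --dtype into the config on its own. Keys the config omits fall back to DeepSpeed's defaults. The config carries the bulk of the runtime settings, while CLI arguments handle deployment-specific settings (such as the local rank) and user-facing toggles (such as enabling MoE).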
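The precedence order can be sketched as a recursive dictionary merge applied from the lowest layer to the highest. This is a minimal sketch: deep_merge and resolve_config are hypothetical helpers, and the defaults dictionary is only for illustration, since DeepSpeed fills in its own defaults internally and a real script would usually merge just the JSON template and the CLI overrides.

```python
import copy


def deep_merge(base: dict, override: dict) -> dict:
    """Return a new dict in which keys from `override` win over `base`, recursively."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


def resolve_config(defaults: dict, json_config: dict, cli_overrides: dict) -> dict:
    """Apply layers from lowest to highest priority: defaults < JSON config < CLI."""
    return deep_merge(deep_merge(defaults, json_config), cli_overrides)


defaults = {"zero_optimization": {"stage": 0}, "gradient_clipping": 0.0}
json_config = {"zero_optimization": {"stage": 1, "overlap_comm": True}, "gradient_clipping": 1.0}
cli_overrides = {"zero_optimization": {"stage": 2}}  # e.g. derived from --stage 2

print(resolve_config(defaults, json_config, cli_overrides))
# {'zero_optimization': {'stage': 2, 'overlap_comm': True}, 'gradient_clipping': 1.0}
```

Merging recursively, rather than replacing whole sections, keeps JSON keys such as overlap_comm that the CLI does not mention.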

Argument Namespacing

The deepspeed.add_config_arguments() function adds arguments to the same parser namespace, allowing the combined argument namespace to be passed directly to deepspeed.initialize(args=args, ...). This design avoids the need for separate configuration objects and keeps the integration surface minimal.
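Because one namespace carries both argument sets, a single script can switch between plain PyTorch and DeepSpeed on the --deepspeed flag. The sketch below follows the call shape of deepspeed.initialize() in the CIFAR-10 example; build_engine is a hypothetical helper, and the DeepSpeed branch assumes DeepSpeed is installed and the script was started by the launcher. We also assume, per recent DeepSpeed releases, that initialize() rejects a config passed both through --deepspeed_config and through config=.

```python
import argparse


def build_engine(args: argparse.Namespace, model, ds_config: dict):
    """Hypothetical helper: return a DeepSpeed engine if --deepspeed is set, else the model."""
    if not getattr(args, "deepspeed", False):
        return model  # standard PyTorch path; the caller builds its own optimizer
    import deepspeed  # imported lazily so the plain path needs no DeepSpeed install

    # The whole parsed namespace is passed as `args`. Since the config goes in
    # `config`, --deepspeed_config should be left unset (assumption noted above).
    model_engine, _optimizer, _dataloader, _scheduler = deepspeed.initialize(
        args=args,
        model=model,
        model_parameters=[p for p in model.parameters() if p.requires_grad],
        config=ds_config,
    )
    return model_engine


# Plain PyTorch path with a placeholder model: no DeepSpeed import happens.
placeholder_model = object()
assert build_engine(argparse.Namespace(deepspeed=False), placeholder_model, {}) is placeholder_model
```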

Distributed Launcher Integration

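The --local_rank argument tells each process which GPU on its node to use, so it is essential for distributed training. The DeepSpeed launcher (the deepspeed CLI command) spawns one process per GPU on each node and passes --local_rank to each one; it also exports the LOCAL_RANK environment variable:

deepspeed launcher (per node, N GPUs)
    |
    +-- Process 0: --local_rank=0
    +-- Process 1: --local_rank=1
    +-- ...
    +-- Process N-1: --local_rank=N-1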
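A script can resolve its local rank from the flag first and from the LOCAL_RANK environment variable second, which keeps it working whether or not the launcher passes the flag. The helper resolve_local_rank and its fallback to 0 are assumptions for illustration; a real script would then typically call torch.cuda.set_device(local_rank).

```python
import argparse
import os
from typing import Mapping


def resolve_local_rank(args: argparse.Namespace, environ: Mapping[str, str] = os.environ) -> int:
    """Hypothetical helper: prefer --local_rank, then LOCAL_RANK, else 0 (single process)."""
    rank = getattr(args, "local_rank", -1)
    if rank is not None and rank >= 0:
        return rank  # value passed by the launcher on the command line
    return int(environ.get("LOCAL_RANK", "0"))


print(resolve_local_rank(argparse.Namespace(local_rank=-1), {"LOCAL_RANK": "2"}))  # 2
```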

Configuration Categories

Category      CLI Arguments                                      Purpose
Training      --epochs, --log-interval                           Control training duration and logging frequency
Precision     --dtype {fp16,bf16,fp32}                           Select the mixed precision data type
ZeRO          --stage {0,1,2,3}                                  Select the ZeRO optimization stage
MoE           --moe, --num-experts, --top-k, --ep-world-size,    Configure Mixture of Experts
              --min-capacity, --noisy-gate-policy, --mlp-type,
              --moe-param-group
Distributed   --local_rank                                       Local rank; declared by the script, set by the launcher
DeepSpeed     --deepspeed, --deepspeed_config                    Core DeepSpeed flags added by add_config_arguments()
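The MoE flags interact, so it can help to check them before building the model. The checks below are a sketch under assumptions: we believe DeepSpeed's MoE layer and its process-group setup enforce similar conditions (the world size and each expert count divisible by the expert-parallel size, and a noisy gate policy of None, Jitter, or RSample), but validate_moe_args is a hypothetical helper, not part of DeepSpeed.

```python
import argparse


def validate_moe_args(args: argparse.Namespace, world_size: int) -> None:
    """Hypothetical pre-flight checks for the MoE flags."""
    if not args.moe:
        return  # MoE disabled: the other MoE flags are ignored
    # Expert parallelism splits the ranks into groups of size ep_world_size.
    if world_size % args.ep_world_size != 0:
        raise ValueError(
            f"world size {world_size} is not divisible by --ep-world-size {args.ep_world_size}")
    for n in args.num_experts:
        # Experts are spread evenly over the ranks of an expert-parallel group.
        if n % args.ep_world_size != 0:
            raise ValueError(
                f"--num-experts {n} is not divisible by --ep-world-size {args.ep_world_size}")
    if args.noisy_gate_policy not in (None, "None", "Jitter", "RSample"):
        raise ValueError(f"unknown --noisy-gate-policy {args.noisy_gate_policy!r}")


# Mirrors the two-GPU MoE run shown under Usage Patterns.
moe_args = argparse.Namespace(moe=True, ep_world_size=2, num_experts=[2], noisy_gate_policy="RSample")
validate_moe_args(moe_args, world_size=2)  # passes silently
```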

Pattern: CLI-to-Config Mapping

The script copies CLI argument values into the configuration dictionary before it calls deepspeed.initialize(). In the CIFAR-10 example, the mapping amounts to:

# CLI arg --dtype controls JSON config keys
ds_config["bf16"]["enabled"] = (args.dtype == "bf16")
ds_config["fp16"]["enabled"] = (args.dtype == "fp16")

# CLI arg --stage controls ZeRO config
ds_config["zero_optimization"]["stage"] = args.stage

This mapping pattern allows the same JSON config template to be reused across different runs by parameterizing key settings through CLI arguments.
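Reusing one template across runs works safely only if each run gets its own copy, so that setting --stage for one run cannot leak into the next. A minimal sketch, assuming a hypothetical TEMPLATE and helper config_for_run; fp32 is expressed by disabling both fp16 and bf16.

```python
import copy
import itertools

# Hypothetical shared template; a real one would carry optimizer, scheduler, etc.
TEMPLATE = {
    "train_batch_size": 16,
    "fp16": {"enabled": False},
    "bf16": {"enabled": False},
    "zero_optimization": {"stage": 0},
}


def config_for_run(template: dict, dtype: str, stage: int) -> dict:
    """Return a per-run copy of the template with --dtype and --stage applied."""
    cfg = copy.deepcopy(template)  # never mutate the shared template
    cfg["bf16"]["enabled"] = dtype == "bf16"
    cfg["fp16"]["enabled"] = dtype == "fp16"  # fp32: both flags stay False
    cfg["zero_optimization"]["stage"] = stage
    return cfg


# One config per (dtype, stage) combination, all derived from the same template.
runs = {
    (dtype, stage): config_for_run(TEMPLATE, dtype, stage)
    for dtype, stage in itertools.product(["fp16", "bf16", "fp32"], [0, 2])
}
print(runs[("bf16", 2)]["bf16"], runs[("bf16", 2)]["zero_optimization"])
# {'enabled': True} {'stage': 2}
```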

Usage Patterns

Standard DeepSpeed Run

# Basic DeepSpeed launch (uses all available GPUs)
deepspeed cifar10_deepspeed.py --deepspeed

# With specific dtype and ZeRO stage
deepspeed cifar10_deepspeed.py --deepspeed --dtype bf16 --stage 2

# Via the provided shell script
bash run_ds.sh --dtype=fp16 --stage=0

MoE Run

# DeepSpeed with Mixture of Experts
deepspeed --num_nodes=1 --num_gpus=2 cifar10_deepspeed.py \
    --deepspeed \
    --moe \
    --ep-world-size 2 \
    --num-experts 2 \
    --top-k 1 \
    --noisy-gate-policy 'RSample' \
    --moe-param-group
