Workflow:Microsoft DeepSpeedExamples CIFAR10 Getting Started

Knowledge Sources	DeepSpeedExamples DeepSpeed Docs
Domains	Computer_Vision, Distributed_Training, Getting_Started
Last Updated	2026-02-07 13:00 GMT

Overview

Introductory tutorial demonstrating how to migrate a vanilla PyTorch training script to DeepSpeed, using CIFAR-10 image classification as a simple example with optional Mixture of Experts (MoE) support.

Description

This workflow walks through the standard pattern for integrating DeepSpeed into an existing PyTorch training loop. Starting from a baseline PyTorch CNN training script for CIFAR-10, it demonstrates the minimal changes required to enable DeepSpeed distributed training with features like mixed precision, ZeRO optimization, and Mixture of Experts.

Goal: A working DeepSpeed-accelerated training pipeline that classifies CIFAR-10 images, serving as a template for integrating DeepSpeed into any PyTorch project.

Scope: Covers the complete migration from vanilla PyTorch to DeepSpeed, including argument parsing integration, engine initialization, mixed precision configuration, and optional MoE layer injection.

Strategy: Demonstrates a side-by-side comparison between vanilla PyTorch and DeepSpeed-enabled training, highlighting the minimal code changes needed. Optionally adds Mixture of Experts layers to demonstrate expert parallelism.

Usage

Execute this workflow when you are new to DeepSpeed and want to understand the basic integration pattern. This is the recommended starting point for learning how to use DeepSpeed with any PyTorch model. It demonstrates core concepts (engine initialization, configuration, distributed training) on a small, fast-to-train model before applying them to larger projects.

Execution Steps

Step 1: Baseline PyTorch Training

Understand the vanilla PyTorch training script that serves as the starting point. This establishes the baseline pattern that will be modified.

What happens:

Define a simple CNN model (2 conv layers + 3 fully connected layers)
Load CIFAR-10 dataset with standard torchvision transforms
Set up SGD optimizer and CrossEntropyLoss
Run a standard training loop with forward pass, loss computation, and backward pass
Evaluate per-class accuracy on the test set
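
The baseline described above might look roughly like the following. This is a minimal sketch, not the repository's script: the layer sizes are assumed (borrowed from the classic PyTorch CIFAR-10 tutorial network, since the text only specifies 2 conv + 3 fully connected layers), the learning rate and momentum are illustrative, and one random batch stands in for the CIFAR-10 loader so the sketch runs without a download.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class Net(nn.Module):
    """Small CNN: 2 conv layers followed by 3 fully connected layers."""

    def __init__(self):
        super().__init__()
        # Layer sizes are assumptions (classic PyTorch tutorial values).
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))  # 3x32x32 -> 6x14x14
        x = self.pool(F.relu(self.conv2(x)))  # 6x14x14 -> 16x5x5
        x = torch.flatten(x, 1)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        return self.fc3(x)


net = Net()
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(net.parameters(), lr=0.001, momentum=0.9)

# Stand-in for one CIFAR-10 batch: 4 RGB images of 32x32 and their labels.
inputs = torch.randn(4, 3, 32, 32)
labels = torch.randint(0, 10, (4,))

# One iteration of the standard training loop.
optimizer.zero_grad()
outputs = net(inputs)
loss = criterion(outputs, labels)
loss.backward()
optimizer.step()
```

In a real run the random batch would be replaced by a DataLoader over torchvision's CIFAR10 dataset, and the single iteration by a loop over epochs and batches.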

Step 2: DeepSpeed Argument Integration

Add DeepSpeed argument parsing to the training script to accept DeepSpeed configuration via command line.

Key considerations:

Add DeepSpeed's standard command-line arguments by calling deepspeed.add_config_arguments(parser)
Accept the local_rank argument that the distributed launcher passes to each process
Add arguments for MoE configuration (number of experts, top-k, expert parallelism)
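
The argument setup above could be sketched as follows. The MoE flag names and defaults here are illustrative assumptions, not necessarily those of the example script; deepspeed.add_config_arguments is DeepSpeed's own helper, and the import is guarded only so that the sketch also runs where DeepSpeed is not installed.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(description="CIFAR-10 with DeepSpeed")
    # Rank of this process on its node; supplied by the distributed launcher.
    parser.add_argument("--local_rank", type=int, default=-1)
    # MoE options (flag names and defaults are illustrative assumptions).
    parser.add_argument("--moe", action="store_true", help="enable MoE layers")
    parser.add_argument("--num-experts", type=int, default=1)
    parser.add_argument("--top-k", type=int, default=1)
    parser.add_argument("--ep-world-size", type=int, default=1)
    try:
        import deepspeed

        # Adds DeepSpeed's own flags, such as --deepspeed and --deepspeed_config.
        parser = deepspeed.add_config_arguments(parser)
    except ImportError:
        pass  # lets the sketch run without DeepSpeed installed
    return parser


# Parse an explicit argument list instead of sys.argv for demonstration.
args = build_parser().parse_args(["--moe", "--num-experts", "4", "--top-k", "2"])
```

Keeping the parser in a function makes it easy to reuse the same arguments in both the baseline and the DeepSpeed variant of the script.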

Step 3: DeepSpeed Engine Initialization

Replace PyTorch's manual optimizer and scheduler setup with DeepSpeed's engine initialization.

What happens:

Replace manual optimizer creation with a call to deepspeed.initialize()
The call returns the engine, which wraps the model, together with the optimizer, training dataloader, and learning rate scheduler
DeepSpeed handles distributed training setup (process groups, gradient synchronization)
Configuration provided via JSON file or dictionary specifying ZeRO stage, precision, batch size, and optimizer settings
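
As a rough illustration of the initialization step, the sketch below builds a configuration dictionary and passes it to deepspeed.initialize(). All configuration values are illustrative assumptions, and get_ds_config and init_engine are hypothetical helper names. Calling init_engine requires DeepSpeed and is normally done from a script started by the distributed launcher, so the sketch only defines it.

```python
import json


def get_ds_config(dtype="fp16", zero_stage=0):
    """Build a DeepSpeed config dictionary (all values are illustrative)."""
    return {
        "train_batch_size": 16,
        "steps_per_print": 2000,
        "optimizer": {
            "type": "Adam",
            "params": {"lr": 0.001, "betas": [0.8, 0.999], "eps": 1e-8},
        },
        "scheduler": {
            "type": "WarmupLR",
            "params": {
                "warmup_min_lr": 0,
                "warmup_max_lr": 0.001,
                "warmup_num_steps": 1000,
            },
        },
        "fp16": {"enabled": dtype == "fp16"},
        "bf16": {"enabled": dtype == "bf16"},
        "zero_optimization": {"stage": zero_stage},
    }


def init_engine(net, trainset, args, ds_config):
    """Wrap the model in a DeepSpeed engine (hypothetical helper)."""
    import deepspeed  # imported lazily so the config part runs on its own

    params = [p for p in net.parameters() if p.requires_grad]
    # Returns (engine, optimizer, training dataloader, lr scheduler).
    return deepspeed.initialize(
        args=args,
        model=net,
        model_parameters=params,
        training_data=trainset,
        config=ds_config,
    )


ds_config = get_ds_config(dtype="bf16", zero_stage=2)
print(json.dumps(ds_config, indent=2))  # the same content a JSON config file would hold
```

Because the optimizer and scheduler are declared in the configuration, the script no longer constructs them by hand; switching precision or ZeRO stage becomes a configuration change rather than a code change.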

Step 4: Training with DeepSpeed

Execute the modified training loop using the DeepSpeed engine API.

What happens:

Replace loss.backward() with model_engine.backward(loss) for gradient computation
Replace optimizer.step() with model_engine.step() for parameter updates
DeepSpeed automatically handles gradient accumulation, loss scaling for mixed precision, and gradient communication between processes
Optional: Add MoE layers to the model for expert parallelism demonstration
Configure data types (fp16, bf16, fp32) and ZeRO optimization stages (0-3) via JSON config
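
The modified loop might be sketched as below. train_one_epoch is a hypothetical helper; the device is passed in explicitly and the fp16 cast of the inputs is an assumption about how half-precision runs are fed. The only engine calls it relies on are the forward call, backward(loss), and step().

```python
def train_one_epoch(model_engine, trainloader, criterion, device, use_fp16=False):
    """Run one epoch through a DeepSpeed engine and return the mean loss."""
    running_loss = 0.0
    num_batches = 0
    for inputs, labels in trainloader:
        inputs = inputs.to(device)
        labels = labels.to(device)
        if use_fp16:
            # Assumption: inputs must match the half-precision model weights.
            inputs = inputs.half()

        outputs = model_engine(inputs)   # forward pass through the engine
        loss = criterion(outputs, labels)

        model_engine.backward(loss)      # replaces loss.backward()
        model_engine.step()              # replaces optimizer.step()

        running_loss += loss.item()
        num_batches += 1
    return running_loss / max(num_batches, 1)
```

There is no explicit optimizer.zero_grad() call in this loop; gradient zeroing, like gradient accumulation, is left to the engine.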

Step 5: Evaluation

Test the trained model on the CIFAR-10 test set with per-class accuracy reporting.

What happens:

Load the test dataset and run inference
Compute overall accuracy and per-class accuracy for all 10 CIFAR-10 classes
Compare results between baseline PyTorch and DeepSpeed-accelerated training
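
The accuracy bookkeeping above can be sketched in plain Python. per_class_accuracy is a hypothetical helper that works on lists of integer class indices; in the actual script the predictions would come from taking the argmax of the model outputs over the test loader. The short class names follow the common tutorial convention and are an assumption.

```python
CLASSES = ("plane", "car", "bird", "cat", "deer",
           "dog", "frog", "horse", "ship", "truck")


def per_class_accuracy(predicted, labels, num_classes=10):
    """Return (overall accuracy, list of per-class accuracies) as fractions."""
    correct = [0] * num_classes
    total = [0] * num_classes
    for pred, label in zip(predicted, labels):
        total[label] += 1
        if pred == label:
            correct[label] += 1
    overall = sum(correct) / max(sum(total), 1)
    # Classes with no test samples are reported as None instead of dividing by zero.
    per_class = [c / t if t else None for c, t in zip(correct, total)]
    return overall, per_class


# Toy example: four predictions against four true labels.
overall, per_class = per_class_accuracy([0, 1, 1, 3], [0, 1, 2, 3])
for name, acc in zip(CLASSES, per_class):
    if acc is not None:
        print(f"Accuracy of {name:>5s}: {100 * acc:.0f} %")
```

Running the same function on the predictions of the baseline and of the DeepSpeed-trained model gives directly comparable numbers.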
