Workflow:Microsoft DeepSpeedExamples CIFAR10 Getting Started

From Leeroopedia


Knowledge Sources
Domains Computer_Vision, Distributed_Training, Getting_Started
Last Updated 2026-02-07 13:00 GMT

Overview

Introductory tutorial demonstrating how to migrate a vanilla PyTorch training script to DeepSpeed, using CIFAR-10 image classification as a simple example with optional Mixture of Experts (MoE) support.

Description

This workflow walks through the standard pattern for integrating DeepSpeed into an existing PyTorch training loop. Starting from a baseline PyTorch CNN training script for CIFAR-10, it demonstrates the minimal changes required to enable DeepSpeed distributed training with features like mixed precision, ZeRO optimization, and Mixture of Experts.

Goal: A working DeepSpeed-accelerated training pipeline that classifies CIFAR-10 images, serving as a template for integrating DeepSpeed into any PyTorch project.

Scope: Covers the complete migration from vanilla PyTorch to DeepSpeed, including argument parsing integration, engine initialization, mixed precision configuration, and optional MoE layer injection.

Strategy: Demonstrates a side-by-side comparison between vanilla PyTorch and DeepSpeed-enabled training, highlighting the minimal code changes needed. Optionally adds Mixture of Experts layers to demonstrate expert parallelism.

Usage

Execute this workflow when you are new to DeepSpeed and want to understand the basic integration pattern. This is the recommended starting point for learning how to use DeepSpeed with any PyTorch model. It demonstrates core concepts (engine initialization, configuration, distributed training) on a small, fast-to-train model before applying them to larger projects.

Execution Steps

Step 1: Baseline PyTorch Training

Understand the vanilla PyTorch training script that serves as the starting point. This establishes the baseline pattern that will be modified.

What happens:

  • Define a simple CNN model (2 conv layers + 3 fully connected layers)
  • Load CIFAR-10 dataset with standard torchvision transforms
  • Set up SGD optimizer and CrossEntropyLoss
  • Run a standard training loop with forward pass, loss computation, and backward pass
  • Evaluate per-class accuracy on the test set
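The baseline described above can be sketched roughly as follows. The layer sizes and SGD hyperparameters are assumptions borrowed from the common PyTorch CIFAR-10 tutorial, not values confirmed by this workflow, and random tensors stand in for a real torchvision CIFAR-10 batch so the sketch runs without downloading data.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim


class Net(nn.Module):
    """Small CNN: 2 conv layers followed by 3 fully connected layers."""

    def __init__(self):
        super().__init__()
        # Layer sizes are assumed (PyTorch CIFAR-10 tutorial values).
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)  # 10 CIFAR-10 classes

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))  # 3x32x32 -> 6x14x14
        x = self.pool(F.relu(self.conv2(x)))  # 6x14x14 -> 16x5x5
        x = torch.flatten(x, 1)               # keep the batch dimension
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        return self.fc3(x)


def train_step(net, optimizer, criterion, inputs, labels):
    """One vanilla PyTorch update: forward, loss, backward, step."""
    optimizer.zero_grad()
    outputs = net(inputs)
    loss = criterion(outputs, labels)
    loss.backward()
    optimizer.step()
    return loss.item()


net = Net()
criterion = nn.CrossEntropyLoss()
# Assumed hyperparameters, for illustration only.
optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)

# Random tensors stand in for one CIFAR-10 batch of 4 images.
inputs = torch.randn(4, 3, 32, 32)
labels = torch.randint(0, 10, (4,))
loss_value = train_step(net, optimizer, criterion, inputs, labels)
```

The `loss.backward()` and `optimizer.step()` calls inside `train_step` are the two lines the later steps replace with DeepSpeed engine calls.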

Step 2: DeepSpeed Argument Integration

Add DeepSpeed argument parsing to the training script to accept DeepSpeed configuration via command line.

Key considerations:

  • Add DeepSpeed-specific arguments using the add_config_arguments helper
  • Accept the local_rank argument that the distributed launcher passes to each process
  • Additional arguments for MoE configuration (number of experts, top-k, expert parallelism)
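A minimal sketch of the argument setup described above. The MoE flag names (`--moe`, `--num-experts`, `--top-k`, `--ep-world-size`) and all defaults are assumptions for illustration and may differ from the actual script. The `try`/`except` around the DeepSpeed import is there only so the sketch runs on a machine without DeepSpeed installed; a real script would import it unconditionally.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(description="CIFAR-10 with DeepSpeed")

    # The distributed launcher passes the local rank to each process.
    parser.add_argument("--local_rank", type=int, default=-1,
                        help="local rank passed in by the distributed launcher")

    # Hypothetical MoE flags: names and defaults are assumptions.
    parser.add_argument("--moe", action="store_true",
                        help="enable Mixture of Experts layers")
    parser.add_argument("--num-experts", type=int, default=1,
                        help="number of experts per MoE layer")
    parser.add_argument("--top-k", type=int, default=1,
                        help="number of experts each token is routed to")
    parser.add_argument("--ep-world-size", type=int, default=1,
                        help="expert-parallel world size")

    try:
        import deepspeed
        # Adds DeepSpeed's own flags such as --deepspeed and --deepspeed_config.
        parser = deepspeed.add_config_arguments(parser)
    except ImportError:
        pass  # sketch only: skip the DeepSpeed flags when it is not installed

    return parser


args = build_parser().parse_args(
    ["--local_rank", "0", "--moe", "--num-experts", "4", "--top-k", "2"]
)
```

Note that argparse converts dashes in flag names to underscores, so `--num-experts` is read back as `args.num_experts`.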

Step 3: DeepSpeed Engine Initialization

Replace PyTorch's manual optimizer and scheduler setup with DeepSpeed's engine initialization.

What happens:

  • Replace manual optimizer creation with deepspeed.initialize()
  • deepspeed.initialize() returns an engine that wraps the model, together with the optimizer, a training dataloader, and a learning rate scheduler
  • DeepSpeed handles distributed training setup (process groups, gradient synchronization)
  • Configuration is provided as a JSON file or a dictionary that specifies the ZeRO stage, precision, batch size, and optimizer settings
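The configuration and initialization described above might look like the following sketch. The helper names `build_ds_config` and `build_engine` are hypothetical, and every numeric value (batch size, learning rate, warmup steps) is an illustrative assumption rather than a setting taken from this workflow. `build_engine` imports DeepSpeed lazily, so the configuration part can be run and inspected without DeepSpeed installed.

```python
import json


def build_ds_config(batch_size=16, zero_stage=0, dtype="fp16"):
    """Build a DeepSpeed config dictionary (illustrative values only)."""
    if dtype not in ("fp16", "bf16", "fp32"):
        raise ValueError(f"unsupported dtype: {dtype}")
    if zero_stage not in (0, 1, 2, 3):
        raise ValueError(f"unsupported ZeRO stage: {zero_stage}")
    return {
        "train_batch_size": batch_size,
        "optimizer": {"type": "Adam", "params": {"lr": 0.001}},
        "scheduler": {
            "type": "WarmupLR",
            "params": {
                "warmup_min_lr": 0,
                "warmup_max_lr": 0.001,
                "warmup_num_steps": 1000,
            },
        },
        # fp32 training leaves both reduced-precision modes disabled.
        "fp16": {"enabled": dtype == "fp16"},
        "bf16": {"enabled": dtype == "bf16"},
        "zero_optimization": {"stage": zero_stage},
    }


def build_engine(net, trainset, ds_config):
    """Wrap a model and dataset with DeepSpeed (requires DeepSpeed installed)."""
    import deepspeed  # imported lazily so the config code runs without it

    model_engine, optimizer, trainloader, lr_scheduler = deepspeed.initialize(
        model=net,
        model_parameters=[p for p in net.parameters() if p.requires_grad],
        training_data=trainset,
        config=ds_config,
    )
    return model_engine, optimizer, trainloader, lr_scheduler


ds_config = build_ds_config(batch_size=16, zero_stage=1, dtype="bf16")
config_json = json.dumps(ds_config, indent=2)  # same content as a JSON config file
```

Because the optimizer and scheduler are declared in the configuration, the manual `optim.SGD(...)` construction from the baseline script is no longer needed in this variant.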

Step 4: Training with DeepSpeed

Execute the modified training loop using the DeepSpeed engine API.

What happens:

  • Replace loss.backward() with model_engine.backward(loss) for gradient computation
  • Replace optimizer.step() with model_engine.step() for parameter updates; the engine also zeroes gradients, so the explicit optimizer.zero_grad() call can be dropped
  • DeepSpeed automatically handles gradient accumulation, loss scaling for mixed precision, and gradient communication
  • Optional: Add MoE layers to the model for expert parallelism demonstration
  • Configure data types (fp16, bf16, fp32) and ZeRO optimization stages (0-3) via JSON config
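The modified loop can be sketched as below. The function name `train_one_epoch` and its `device` parameter are illustrative assumptions. The function relies only on the engine calls named above, so it needs no imports of its own.

```python
def train_one_epoch(model_engine, trainloader, criterion, device):
    """Run one epoch with a DeepSpeed engine and return the mean batch loss."""
    running_loss = 0.0
    num_batches = 0
    for inputs, labels in trainloader:
        inputs, labels = inputs.to(device), labels.to(device)

        outputs = model_engine(inputs)      # forward pass through the wrapped model
        loss = criterion(outputs, labels)

        model_engine.backward(loss)         # replaces loss.backward()
        model_engine.step()                 # replaces optimizer.step()

        running_loss += loss.item()
        num_batches += 1
    return running_loss / max(num_batches, 1)
```

When fp16 or bf16 is enabled in the configuration, the inputs may also need to be cast to the matching dtype before the forward pass.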

Step 5: Evaluation

Test the trained model on the CIFAR-10 test set with per-class accuracy reporting.

What happens:

  • Load the test dataset and run inference
  • Compute overall accuracy and per-class accuracy for all 10 CIFAR-10 classes
  • Compare results between baseline PyTorch and DeepSpeed-accelerated training
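The accuracy bookkeeping described above can be sketched in plain Python. The helper name `per_class_accuracy` is hypothetical, and it assumes predictions and labels have already been gathered as lists of integer class indices (for example, by taking the argmax of the model outputs).

```python
CIFAR10_CLASSES = (
    "airplane", "automobile", "bird", "cat", "deer",
    "dog", "frog", "horse", "ship", "truck",
)


def per_class_accuracy(predictions, labels, class_names=CIFAR10_CLASSES):
    """Return (overall accuracy, {class name: accuracy or None if unseen})."""
    correct = [0] * len(class_names)
    total = [0] * len(class_names)
    for pred, label in zip(predictions, labels):
        total[label] += 1
        if pred == label:
            correct[label] += 1

    overall = sum(correct) / max(sum(total), 1)
    per_class = {
        name: (correct[i] / total[i] if total[i] else None)
        for i, name in enumerate(class_names)
    }
    return overall, per_class


# Tiny made-up example: four samples, one automobile misclassified as airplane.
overall, per_class = per_class_accuracy([0, 0, 1, 3], [0, 1, 1, 3])
```

Running the same function on predictions from the baseline and the DeepSpeed-trained model gives directly comparable numbers for the comparison in the last bullet.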

GitHub URL

Workflow Repository