

Environment:Microsoft DeepSpeedExamples CIFAR10 Training Environment

From Leeroopedia


Knowledge Sources
Domains Deep_Learning, Computer_Vision, Getting_Started
Last Updated 2026-02-07 13:00 GMT

Overview

Minimal Python environment with PyTorch, torchvision, and DeepSpeed for training a simple CNN on CIFAR-10, supporting both CPU and GPU execution.

Description

This environment provides the minimal dependencies needed to run the CIFAR-10 getting-started example, which demonstrates DeepSpeed integration with a basic convolutional neural network. It supports CPU-only execution as a fallback, making it the most lightweight environment in the repository. The example covers DeepSpeed ZeRO stages 0-3, mixed precision (fp16/bf16), and, optionally, Mixture of Experts (MoE) and PR-MoE layers.
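To make the ZeRO stage and mixed-precision options more concrete, the sketch below builds a minimal DeepSpeed configuration dictionary. The helper name `make_ds_config` and the default batch size are illustrative assumptions and are not taken from the example script; the keys `train_batch_size`, `zero_optimization`, `fp16`, and `bf16` are standard DeepSpeed configuration fields.

```python
import json


def make_ds_config(zero_stage: int = 0, dtype: str = "fp32", batch_size: int = 16) -> dict:
    """Build a minimal DeepSpeed config dict (hypothetical helper, for illustration only)."""
    if zero_stage not in (0, 1, 2, 3):
        raise ValueError("ZeRO stage must be 0, 1, 2, or 3")
    if dtype not in ("fp32", "fp16", "bf16"):
        raise ValueError("dtype must be 'fp32', 'fp16', or 'bf16'")
    return {
        "train_batch_size": batch_size,              # assumed value
        "zero_optimization": {"stage": zero_stage},  # ZeRO stage 0-3
        "fp16": {"enabled": dtype == "fp16"},        # at most one of fp16/bf16 is enabled
        "bf16": {"enabled": dtype == "bf16"},
    }


if __name__ == "__main__":
    # Print a ZeRO stage 2 configuration with bf16 mixed precision.
    print(json.dumps(make_ds_config(zero_stage=2, dtype="bf16"), indent=2))
```

A dictionary of this shape can be passed to `deepspeed.initialize` through its `config` argument; the real example script may set additional fields (optimizer, scheduler, and so on) that this sketch leaves out.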

Usage

Use this environment to learn DeepSpeed fundamentals with the CIFAR-10 tutorial. It is a required prerequisite for the Net_Tutorial, Add_Argument_CIFAR, DeepSpeed_Initialize_CIFAR, Net_DeepSpeed, and Test_Function_CIFAR implementations.

System Requirements

| Category | Requirement | Notes |
|---|---|---|
| OS | Linux, macOS, or Windows | Cross-platform support |
| Hardware | Any CPU or NVIDIA GPU | GPU optional; code falls back to CPU |
| Disk | 200 MB | For CIFAR-10 dataset download |

Dependencies

Python Packages

  • `torch` (PyTorch)
  • `torchvision` == 0.4.0
  • `pillow` >= 7.1.0
  • `matplotlib`
  • `deepspeed` (for DeepSpeed-enabled variant)

Credentials

No credentials are required. The CIFAR-10 dataset is downloaded automatically from public sources.

Quick Install

# Install all required packages
pip install torch torchvision pillow matplotlib deepspeed
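After installing, it can be useful to confirm which of the packages are actually present. The following is a minimal sketch using only the standard library; the helper name `installed_versions` is hypothetical, and the package names are the PyPI distribution names from the dependency list (note that `pillow` is imported as `PIL`, which is why the check uses distribution metadata rather than imports).

```python
from importlib import metadata

# PyPI distribution names from the dependency list above.
PACKAGES = ["torch", "torchvision", "pillow", "matplotlib", "deepspeed"]


def installed_versions(packages=PACKAGES) -> dict:
    """Map each distribution name to its installed version, or None if it is missing."""
    report = {}
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = None
    return report


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name:12s} {version or 'NOT INSTALLED'}")
```

A missing `deepspeed` entry only matters for the DeepSpeed-enabled variant; the plain tutorial script does not need it.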

Code Evidence

Requirements from `training/cifar/requirements.txt`:

torchvision==0.4.0
pillow>=7.1.0
matplotlib

Device detection from `training/cifar/cifar10_deepspeed.py:10,284`:

from deepspeed.accelerator import get_accelerator
# ...
get_accelerator().set_device(_local_rank)
# ...
local_device = get_accelerator().device_name(model_engine.local_rank)

CPU fallback from `training/cifar/cifar10_tutorial.py`:

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
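As a rough illustration of how the selected device is then used, the sketch below moves a model and a batch to it before running a forward pass. The small `nn.Sequential` model and the random batch are stand-ins chosen for brevity; they are assumptions and not the network or data pipeline defined in the tutorial.

```python
import torch
import torch.nn as nn

# Same fallback expression as in the tutorial script.
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# Stand-in model (hypothetical): flattens a 3x32x32 image and maps it to 10 classes.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10)).to(device)

# Fake batch shaped like CIFAR-10 input: 4 images, 3 channels, 32x32 pixels.
images = torch.randn(4, 3, 32, 32).to(device)

# Model and inputs must live on the same device for the forward pass to succeed.
with torch.no_grad():
    logits = model(images)

print(tuple(logits.shape), logits.device.type)
```

The same code path runs unchanged on a machine without a GPU, which is what makes the CPU fallback convenient for quick tests.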

Common Errors

| Error Message | Cause | Solution |
|---|---|---|
| `BrokenPipeError` | `DataLoader` with `num_workers > 0` on Windows | Set `num_workers=0` in the `DataLoader` |
| `RuntimeError: CUDA error` | No CUDA-capable device | Run on CPU or install CUDA drivers |
| `ModuleNotFoundError: deepspeed` | DeepSpeed not installed | `pip install deepspeed` (for the DeepSpeed variant only) |

Compatibility Notes

  • CPU Training: Fully supported via PyTorch CPU fallback; useful for testing without GPU
  • Windows: Set `num_workers=0` in DataLoader to avoid BrokenPipeError
  • MoE/PR-MoE: Requires multi-GPU setup for proper expert parallelism
  • torchvision pinned: The requirements file pins `torchvision==0.4.0`, whereas the Quick Install command installs the latest release; newer versions are likely compatible
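The Windows note above can be handled without editing the script per platform. The following is a small sketch, assuming a hypothetical helper `default_num_workers` and a default of 2 worker processes on other systems; neither the helper nor the default is taken from the example code.

```python
import platform


def default_num_workers(system=None, workers=2):
    """Return 0 on Windows (avoids BrokenPipeError), otherwise `workers`.

    `system` defaults to the current platform name; it is a parameter only
    so the behavior can be checked on any machine.
    """
    system = system or platform.system()
    return 0 if system == "Windows" else workers


if __name__ == "__main__":
    print(f"num_workers = {default_num_workers()}")
```

The returned value would then be passed as the `num_workers` argument when constructing the `DataLoader`.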
