
Implementation:Microsoft DeepSpeedExamples Net Tutorial

From Leeroopedia


Metadata

  • Page Type: Implementation
  • Repository: Microsoft/DeepSpeedExamples
  • Title: Net_Tutorial
  • Type: Pattern Doc (reference implementation)
  • Source File: training/cifar/cifar10_tutorial.py
  • Lines: 130-197
  • Implements: Principle:Microsoft_DeepSpeedExamples_Baseline_PyTorch_Training

Overview

Reference PyTorch CNN implementation for CIFAR-10 that serves as the baseline before DeepSpeed migration.

Description

The Net class in cifar10_tutorial.py is the canonical baseline CNN for the CIFAR-10 Getting Started workflow. It is a small convolutional network: two convolutional layers, each followed by ReLU and 2x2 max pooling, then three fully connected layers. Together with the surrounding training loop, optimizer setup, and evaluation code, this model forms the complete baseline pattern that the DeepSpeed migration builds on.

The implementation follows standard PyTorch conventions:

  • Layers are defined in __init__ as module attributes
  • The forward method chains these layers with activation functions and pooling
  • The model is instantiated and moved to a device explicitly
  • An external optimizer (optim.SGD) and loss function (nn.CrossEntropyLoss) are created separately

The training loop (lines 173-197) spells out the explicit three-call pattern (optimizer.zero_grad(), loss.backward(), optimizer.step()) that the DeepSpeed version later absorbs into its engine.

Code Reference

File: training/cifar/cifar10_tutorial.py, Lines 130-197

Model Definition (Lines 130-147)

import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 16 * 5 * 5)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x
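As a quick sanity check on model size, the sketch below re-declares Net (so that it runs standalone) and counts trainable parameters per layer. count_parameters is a hypothetical helper for illustration, not part of the tutorial.

```python
import torch.nn as nn
import torch.nn.functional as F


class Net(nn.Module):
    # Same layer layout as the baseline Net above.
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 16 * 5 * 5)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        return self.fc3(x)


def count_parameters(model: nn.Module) -> dict:
    """Hypothetical helper: trainable parameter count per named child module."""
    return {
        name: sum(p.numel() for p in module.parameters() if p.requires_grad)
        for name, module in model.named_children()
    }


net = Net()
per_layer = count_parameters(net)
total = sum(per_layer.values())
for name, n in per_layer.items():
    print(f"{name:>6}: {n}")
print(f" total: {total}")
```

The total comes to 62,006 parameters; fc1 (400 x 120 weights plus 120 biases) holds roughly 78% of them, while the pooling layer has none.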

Optimizer and Loss Setup (Lines 160-163)

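import torch.nn as nn
import torch.optim as optim

# net is the Net instance created (and moved to device) earlier in the script
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)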
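To make momentum=0.9 concrete, here is a minimal sketch that compares optim.SGD against a hand-written heavy-ball update on a toy quadratic loss. It assumes PyTorch's default SGD settings (dampening=0, nesterov=False, no weight decay), under which each step computes v = momentum * v + grad and then p = p - lr * v. The single toy tensor and the loss are illustrative stand-ins for net.parameters() and the real criterion.

```python
import torch
import torch.optim as optim

torch.manual_seed(0)

lr, momentum = 0.001, 0.9  # same hyperparameters as the baseline optimizer

# A single toy parameter tensor stands in for net.parameters().
p_torch = torch.randn(5, requires_grad=True)
p_manual = p_torch.detach().clone()
velocity = torch.zeros_like(p_manual)

opt = optim.SGD([p_torch], lr=lr, momentum=momentum)

for step in range(3):
    # Built-in optimizer on a toy quadratic loss sum(p^2); its gradient is 2 * p.
    opt.zero_grad()
    loss = (p_torch ** 2).sum()
    loss.backward()
    opt.step()

    # Manual heavy-ball update on the same loss.
    grad = 2 * p_manual
    velocity = momentum * velocity + grad
    p_manual = p_manual - lr * velocity

print(torch.allclose(p_torch.detach(), p_manual, atol=1e-6))
```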

Training Loop (Lines 173-197)

for epoch in range(2):  # loop over the dataset multiple times

    running_loss = 0.0
    for i, data in enumerate(trainloader, 0):
        # get the inputs; data is a list of [inputs, labels]
        inputs, labels = data[0].to(device), data[1].to(device)

        # zero the parameter gradients
        optimizer.zero_grad()

        # forward + backward + optimize
        outputs = net(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        # print statistics
        running_loss += loss.item()
        if i % 2000 == 1999:  # print every 2000 mini-batches
            print('[%d, %5d] loss: %.3f' %
                  (epoch + 1, i + 1, running_loss / 2000))
            running_loss = 0.0

print('Finished Training')
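The description above mentions evaluation code that is not reproduced on this page. A typical accuracy loop for a model like Net looks roughly like the sketch below; evaluate is a hypothetical helper name, and the synthetic TensorDataset with a toy linear model merely stands in for testloader and Net.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset


def evaluate(net: nn.Module, loader: DataLoader, device: torch.device) -> float:
    """Hypothetical helper: top-1 accuracy (in %) of `net` over `loader`."""
    net.eval()  # Net has no dropout/batch norm, but this is the idiomatic switch
    correct, total = 0, 0
    with torch.no_grad():  # gradients are not needed for evaluation
        for inputs, labels in loader:
            inputs, labels = inputs.to(device), labels.to(device)
            predicted = net(inputs).argmax(dim=1)  # index of the largest logit
            total += labels.size(0)
            correct += (predicted == labels).sum().item()
    net.train()  # restore training mode for any further training
    return 100.0 * correct / total


# Usage with synthetic data: labels are taken from the model's own predictions,
# so the accuracy should be 100%.
torch.manual_seed(0)
device = torch.device("cpu")
toy_net = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
images = torch.randn(16, 3, 32, 32)
with torch.no_grad():
    labels = toy_net(images).argmax(dim=1)
loader = DataLoader(TensorDataset(images, labels), batch_size=4, shuffle=False)
print(f"accuracy: {evaluate(toy_net, loader, device):.1f}%")
```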

Signature

class Net(nn.Module):
    def __init__(self) -> None:
        # Conv2d(3, 6, 5) -> ReLU -> MaxPool2d(2, 2) -> Conv2d(6, 16, 5) -> ReLU -> MaxPool2d(2, 2)
        # FC(16*5*5, 120) -> ReLU -> FC(120, 84) -> ReLU -> FC(84, 10)
        ...

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        ...

I/O Contract

  • Input x: torch.Tensor of shape (B, 3, 32, 32). A batch of normalized CIFAR-10 images with values in [-1, 1].
  • Output logits: torch.Tensor of shape (B, 10). Raw, unnormalized class scores (logits) for the 10 CIFAR-10 classes.
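The [-1, 1] input range is what ToTensor() followed by Normalize with mean 0.5 and std 0.5 per channel produces, as in the standard PyTorch CIFAR-10 tutorial; that cifar10_tutorial.py uses exactly these transforms is an assumption here. The pure-PyTorch sketch below reproduces that mapping without torchvision; to_model_input is a hypothetical helper.

```python
import torch


def to_model_input(images_uint8: torch.Tensor) -> torch.Tensor:
    """Hypothetical helper: (B, 32, 32, 3) uint8 -> (B, 3, 32, 32) float in [-1, 1].

    Mirrors ToTensor() (scale to [0, 1], HWC -> CHW) followed by
    Normalize(mean=0.5, std=0.5) on every channel: x -> (x - 0.5) / 0.5.
    """
    x = images_uint8.permute(0, 3, 1, 2).float() / 255.0  # [0, 1], channels first
    return (x - 0.5) / 0.5  # [-1, 1]


batch = torch.randint(0, 256, (4, 32, 32, 3), dtype=torch.uint8)
batch[0, 0, 0, :] = 0    # force one black pixel
batch[0, 0, 1, :] = 255  # and one white pixel
x = to_model_input(batch)
print(x.shape, x.min().item(), x.max().item())
```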

Training I/O:

  • trainloader (DataLoader): batch_size=4, shuffle=True, num_workers=2
  • testloader (DataLoader): batch_size=4, shuffle=False, num_workers=2
  • criterion (nn.CrossEntropyLoss): default settings (mean reduction)
  • optimizer (optim.SGD): lr=0.001, momentum=0.9
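Because Net returns raw logits and applies no softmax in forward, nn.CrossEntropyLoss is responsible for the log-softmax. The sketch below checks that, with the default mean reduction, the loss equals the negative log-probability of the true class averaged over the batch. The logits and labels are made up for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 10)           # batch_size=4, 10 classes, raw scores
labels = torch.tensor([3, 0, 9, 1])   # class indices, not one-hot vectors

criterion = nn.CrossEntropyLoss()     # default reduction="mean"
builtin = criterion(logits, labels)

# Same quantity by hand: per-sample negative log-probability of the true class,
# computed with log_softmax, then averaged over the batch.
log_probs = F.log_softmax(logits, dim=1)
per_sample = -log_probs[torch.arange(labels.size(0)), labels]
manual = per_sample.mean()

print(builtin.item(), manual.item())
```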

Architecture Diagram

Input: (B, 3, 32, 32)
        |
   [Conv2d(3, 6, 5)]  --> ReLU --> [MaxPool2d(2, 2)]
        |                              Output: (B, 6, 14, 14)
   [Conv2d(6, 16, 5)] --> ReLU --> [MaxPool2d(2, 2)]
        |                              Output: (B, 16, 5, 5)
   [Flatten]                           Output: (B, 400)
        |
   [Linear(400, 120)]  --> ReLU        Output: (B, 120)
        |
   [Linear(120, 84)]   --> ReLU        Output: (B, 84)
        |
   [Linear(84, 10)]                    Output: (B, 10) -- raw logits
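The shapes in the diagram follow from the standard output-size formula for Conv2d and MaxPool2d with dilation 1 and floor rounding: (size + 2 * padding - kernel) // stride + 1. The plain-Python sketch below (function names are hypothetical) traces a 32x32 input through the network and shows why the hard-coded 16 * 5 * 5 in forward ties Net to 32x32 inputs.

```python
def conv2d_out(size: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Spatial output size of Conv2d / MaxPool2d (dilation 1, floor mode)."""
    return (size + 2 * padding - kernel) // stride + 1


def trace_net_shapes(size: int = 32) -> list:
    """Hypothetical helper: per-sample shapes after each conv stage and the flatten."""
    shapes = []
    size = conv2d_out(size, kernel=5)            # conv1 (kernel 5, no padding)
    size = conv2d_out(size, kernel=2, stride=2)  # MaxPool2d(2, 2)
    shapes.append((6, size, size))
    size = conv2d_out(size, kernel=5)            # conv2 (kernel 5, no padding)
    size = conv2d_out(size, kernel=2, stride=2)  # MaxPool2d(2, 2)
    shapes.append((16, size, size))
    shapes.append((16 * size * size,))           # flattened feature vector
    return shapes


print(trace_net_shapes())    # 32x32 CIFAR-10 input
print(trace_net_shapes(64))  # a larger input, for comparison
```

For a 64x64 input the flattened size would be 2,704 rather than 400, so x.view(-1, 16 * 5 * 5) would either fail or silently regroup the batch, and fc1 would need a different input width.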

Usage Example

Run the baseline tutorial script directly (it lives in training/cifar/):

python cifar10_tutorial.py

Programmatic usage, assuming the Net class defined above is in scope:

import torch
import torch.nn as nn            # needed by the Net definition
import torch.nn.functional as F  # needed by the Net definition

net = Net()
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
net.to(device)

# Single forward pass (no gradients needed for inference)
sample_input = torch.randn(4, 3, 32, 32, device=device)
with torch.no_grad():
    logits = net(sample_input)  # Shape: (4, 10)
predicted_classes = torch.argmax(logits, dim=1)  # Shape: (4,)
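To turn predicted_classes into readable labels, the sketch below applies softmax to the logits and maps indices to names. The CLASSES tuple follows the ordering used in the standard PyTorch CIFAR-10 tutorial and torchvision's CIFAR10 dataset; that cifar10_tutorial.py uses the same order is an assumption, and decode_predictions is a hypothetical helper.

```python
import torch
import torch.nn.functional as F

# Assumed class order (standard PyTorch CIFAR-10 tutorial).
CLASSES = ('plane', 'car', 'bird', 'cat', 'deer',
           'dog', 'frog', 'horse', 'ship', 'truck')


def decode_predictions(logits: torch.Tensor) -> list:
    """Hypothetical helper: map (B, 10) logits to (class_name, probability) pairs."""
    probs = F.softmax(logits, dim=1)      # logits -> probabilities per row
    top_prob, top_idx = probs.max(dim=1)  # most likely class per sample
    return [(CLASSES[i], p) for i, p in zip(top_idx.tolist(), top_prob.tolist())]


logits = torch.zeros(2, 10)
logits[0, 3] = 5.0  # sample 0 strongly favours index 3 ("cat")
logits[1, 8] = 2.0  # sample 1 mildly favours index 8 ("ship")
result = decode_predictions(logits)
for name, prob in result:
    print(f"{name}: {prob:.3f}")
```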

Key Differences from DeepSpeed Version

  • Constructor: __init__(self) with no arguments (baseline, Net_Tutorial) vs. __init__(self, args), which accepts an argument namespace (DeepSpeed, Net_DeepSpeed)
  • Final layer: self.fc3 = nn.Linear(84, 10) (baseline) vs. optionally replaced with an MoE layer plus fc4 (DeepSpeed)
  • Optimizer: created manually with optim.SGD (baseline) vs. created internally by deepspeed.initialize() (DeepSpeed)
  • DataLoader: built manually (baseline) vs. created by DeepSpeed with distributed sampling (DeepSpeed)
  • Training loop: optimizer.zero_grad() / loss.backward() / optimizer.step() (baseline) vs. model_engine.backward(loss) / model_engine.step() (DeepSpeed)
  • Device management: manual .to(device) (baseline) vs. handled by the DeepSpeed engine via local_rank (DeepSpeed)
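The training-loop difference is the main API change. To illustrate the idea of an engine absorbing zero_grad(), here is a toy wrapper in plain PyTorch. It is a hypothetical sketch, not DeepSpeed's implementation or API: ToyEngine only mimics the call shape engine(inputs), engine.backward(loss), engine.step(), with a tiny linear model and random data standing in for Net and trainloader.

```python
import torch
import torch.nn as nn
import torch.optim as optim


class ToyEngine:
    """Hypothetical stand-in for an engine that owns the optimizer.

    NOT DeepSpeed's implementation: it only shows how zero_grad() can be
    folded into step(), so the loop needs just backward(loss) and step().
    """

    def __init__(self, model: nn.Module, lr: float = 0.001, momentum: float = 0.9):
        self.module = model
        self.optimizer = optim.SGD(model.parameters(), lr=lr, momentum=momentum)

    def __call__(self, *args):
        return self.module(*args)  # forward pass delegates to the wrapped model

    def backward(self, loss: torch.Tensor) -> None:
        loss.backward()

    def step(self) -> None:
        self.optimizer.step()
        self.optimizer.zero_grad()  # clear gradients so the next backward starts fresh


torch.manual_seed(0)
engine = ToyEngine(nn.Linear(8, 3), lr=0.1)
criterion = nn.CrossEntropyLoss()
inputs, labels = torch.randn(16, 8), torch.randint(0, 3, (16,))

losses = []
for _ in range(20):
    loss = criterion(engine(inputs), labels)
    engine.backward(loss)  # replaces loss.backward()
    engine.step()          # replaces optimizer.step() + optimizer.zero_grad()
    losses.append(loss.item())
print(f"first loss {losses[0]:.3f}, last loss {losses[-1]:.3f}")
```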
