Implementation:Microsoft DeepSpeedExamples Net Tutorial
Metadata
| Field | Value |
|---|---|
| Page Type | Implementation |
| Repository | Microsoft/DeepSpeedExamples |
| Title | Net_Tutorial |
| Type | Pattern Doc (reference implementation) |
| Source File | training/cifar/cifar10_tutorial.py |
| Lines | 130-197 |
| Implements | Principle:Microsoft_DeepSpeedExamples_Baseline_PyTorch_Training |
Overview
Reference PyTorch CNN implementation for CIFAR-10 that serves as the baseline before DeepSpeed migration.
Description
The Net class in cifar10_tutorial.py is the canonical baseline CNN for the CIFAR-10 Getting Started workflow. It implements a straightforward convolutional neural network with two convolutional layers followed by three fully connected layers. This model, combined with the surrounding training loop, optimizer setup, and evaluation code, constitutes the complete baseline pattern that DeepSpeed migration builds upon.
The implementation follows standard PyTorch conventions:
- Layers are defined in __init__ as module attributes
- The forward method chains these layers with activation functions and pooling
- The model is instantiated and moved to a device explicitly
- An external optimizer (optim.SGD) and loss function (nn.CrossEntropyLoss) are created separately
The training loop at lines 173-197 demonstrates the explicit three-call pattern (zero_grad / backward / step) that DeepSpeed will later absorb into its engine.
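The zero_grad call belongs to the pattern because PyTorch adds each new gradient onto the existing .grad buffers rather than overwriting them. The following is a minimal sketch of that behavior, not code from the tutorial: it assumes a single hypothetical linear layer and random data in place of Net and CIFAR-10 so that it runs quickly on CPU.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)

# Hypothetical stand-in for Net: one small linear layer keeps the demo fast.
model = nn.Linear(4, 2)
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.001, momentum=0.9)

# Random stand-in data: 8 samples, 4 features, 2 classes.
inputs = torch.randn(8, 4)
labels = torch.randint(0, 2, (8,))

# First backward pass: .grad holds the gradient of this loss.
criterion(model(inputs), labels).backward()
first_grad = model.weight.grad.clone()

# Second backward pass without zero_grad(): the new gradient is added on top.
criterion(model(inputs), labels).backward()
accumulated_grad = model.weight.grad.clone()

# Clearing the gradients first restores the single-pass value.
# optimizer.step() is never called here, so the weights stay unchanged.
optimizer.zero_grad()
criterion(model(inputs), labels).backward()
fresh_grad = model.weight.grad.clone()

print(torch.allclose(accumulated_grad, 2 * first_grad))
print(torch.allclose(fresh_grad, first_grad))
```

Skipping zero_grad in the baseline loop would therefore make every step use the sum of all gradients seen so far, which is why the call appears at the top of each iteration.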
Code Reference
File: training/cifar/cifar10_tutorial.py, Lines 130-197
Model Definition (Lines 130-147)
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 16 * 5 * 5)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x
Optimizer and Loss Setup (Lines 160-163)
import torch.optim as optim
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)
Training Loop (Lines 173-197)
for epoch in range(2):  # loop over the dataset multiple times
    running_loss = 0.0
    for i, data in enumerate(trainloader, 0):
        # get the inputs; data is a list of [inputs, labels]
        inputs, labels = data[0].to(device), data[1].to(device)
        # zero the parameter gradients
        optimizer.zero_grad()
        # forward + backward + optimize
        outputs = net(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        # print statistics
        running_loss += loss.item()
        if i % 2000 == 1999:  # print every 2000 mini-batches
            print('[%d, %5d] loss: %.3f' %
                  (epoch + 1, i + 1, running_loss / 2000))
            running_loss = 0.0
print('Finished Training')
Signature
class Net(nn.Module):
    def __init__(self):
        # Conv2d(3, 6, 5) -> MaxPool2d(2, 2) -> Conv2d(6, 16, 5) -> MaxPool2d(2, 2)
        # FC(16*5*5, 120) -> FC(120, 84) -> FC(84, 10)
        ...

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        ...
I/O Contract
| Direction | Name | Type | Description |
|---|---|---|---|
| Input | x | torch.Tensor (B, 3, 32, 32) | Batch of normalized CIFAR-10 images, range [-1, 1] |
| Output | logits | torch.Tensor (B, 10) | Raw class scores (logits) for 10 CIFAR-10 classes |
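The [-1, 1] input range can be checked with one line of arithmetic. The sketch below assumes the preprocessing is a per-channel (pixel - mean) / std step with mean 0.5 and std 0.5 applied to pixels already scaled to [0, 1]; those values are an assumption here, since this page does not show the transform. The normalize helper is hypothetical.

```python
def normalize(pixel, mean=0.5, std=0.5):
    """Map a pixel value in [0, 1] to [-1, 1] via (pixel - mean) / std.

    mean=0.5 and std=0.5 are assumed values, not taken from this page.
    """
    return (pixel - mean) / std


# Darkest, middle, and brightest possible pixel values.
lo = normalize(0.0)
mid = normalize(0.5)
hi = normalize(1.0)
print(lo, mid, hi)
```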
Training I/O:
| Component | Type | Configuration |
|---|---|---|
| trainloader | DataLoader | batch_size=4, shuffle=True, num_workers=2 |
| testloader | DataLoader | batch_size=4, shuffle=False, num_workers=2 |
| criterion | nn.CrossEntropyLoss | Default (mean reduction) |
| optimizer | optim.SGD | lr=0.001, momentum=0.9 |
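To make the criterion and optimizer rows concrete, the following plain-Python sketch spells out the arithmetic behind them: cross-entropy averaged over the batch, and one SGD-with-momentum update on a single scalar parameter. The helper names cross_entropy_mean and sgd_momentum_step are hypothetical, and the update rule assumes the other optim.SGD options (dampening, weight decay, Nesterov) are left at their defaults.

```python
import math


def cross_entropy_mean(logits_batch, labels):
    """Mean over the batch of -log(softmax(logits)[label])."""
    total = 0.0
    for logits, label in zip(logits_batch, labels):
        peak = max(logits)  # subtract the max for numerical stability
        log_sum_exp = peak + math.log(sum(math.exp(z - peak) for z in logits))
        total += log_sum_exp - logits[label]
    return total / len(labels)


def sgd_momentum_step(param, grad, velocity, lr=0.001, momentum=0.9):
    """One scalar update: v <- momentum * v + grad, then p <- p - lr * v."""
    velocity = momentum * velocity + grad
    return param - lr * velocity, velocity


# A batch of 4 samples with identical logits for all 10 classes:
# every class gets probability 1/10, so the loss is ln(10).
uniform_loss = cross_entropy_mean([[0.0] * 10 for _ in range(4)], [3, 1, 4, 1])
print(round(uniform_loss, 4))

# Two updates with a constant gradient of 2.0, starting from p = 1.0.
p, v = 1.0, 0.0
p, v = sgd_momentum_step(p, grad=2.0, velocity=v)
p, v = sgd_momentum_step(p, grad=2.0, velocity=v)
print(p, v)
```

The ln(10) value, roughly 2.303, is a useful sanity check: a 10-class model that has learned nothing yet should report a running loss near that number.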
Architecture Diagram
Input: (B, 3, 32, 32)
|
[Conv2d(3, 6, 5)] --> ReLU --> [MaxPool2d(2, 2)]
| Output: (B, 6, 14, 14)
[Conv2d(6, 16, 5)] --> ReLU --> [MaxPool2d(2, 2)]
| Output: (B, 16, 5, 5)
[Flatten] Output: (B, 400)
|
[Linear(400, 120)] --> ReLU Output: (B, 120)
|
[Linear(120, 84)] --> ReLU Output: (B, 84)
|
[Linear(84, 10)] Output: (B, 10) -- raw logits
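The shapes in the diagram follow from the standard output-size formula for unpadded, stride-1 convolutions and 2x2 max pooling. The sketch below reproduces them in plain Python and also counts the trainable parameters; the helper names (conv_out, pool_out, conv_params, linear_params) are hypothetical and not part of the tutorial.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution along one dimension."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_out(size, kernel=2, stride=2):
    """Spatial output size of max pooling along one dimension."""
    return (size - kernel) // stride + 1


def conv_params(c_in, c_out, kernel):
    """Weights plus one bias per output channel for a square kernel."""
    return c_in * c_out * kernel * kernel + c_out


def linear_params(n_in, n_out):
    """Weights plus one bias per output unit."""
    return n_in * n_out + n_out


side = 32
side = pool_out(conv_out(side, 5))   # conv1 + pool: 32 -> 28 -> 14
after_block1 = (6, side, side)
side = pool_out(conv_out(side, 5))   # conv2 + pool: 14 -> 10 -> 5
after_block2 = (16, side, side)
flat = 16 * side * side              # features fed into fc1

total_params = (
    conv_params(3, 6, 5)
    + conv_params(6, 16, 5)
    + linear_params(flat, 120)
    + linear_params(120, 84)
    + linear_params(84, 10)
)

print(after_block1, after_block2, flat, total_params)
```

The flattened size of 400 is why fc1 is declared as nn.Linear(16 * 5 * 5, 120); changing the input resolution or kernel sizes would require recomputing it.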
Usage Example
# Run the baseline tutorial directly (shell command)
python cifar10_tutorial.py
# Programmatic usage (Python; assumes the Net class shown above is already defined in scope)
import torch
import torch.nn as nn
import torch.nn.functional as F
net = Net()
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
net.to(device)
# Single forward pass on a random batch of 4 images
sample_input = torch.randn(4, 3, 32, 32).to(device)
logits = net(sample_input)  # Shape: (4, 10)
predicted_classes = torch.argmax(logits, dim=1)  # Shape: (4,)
Key Differences from DeepSpeed Version
| Aspect | Baseline (Net_Tutorial) | DeepSpeed (Net_DeepSpeed) |
|---|---|---|
| Constructor | __init__(self) -- no arguments | __init__(self, args) -- accepts argument namespace |
| Final layer | self.fc3 = nn.Linear(84, 10) | Optionally replaced with MoE layer + fc4 |
| Optimizer | Manual optim.SGD | Created internally by deepspeed.initialize() |
| DataLoader | Manual DataLoader | Created by DeepSpeed with distributed sampling |
| Training loop | zero_grad() / backward() / step() | model_engine.backward() / model_engine.step() |
| Device management | Manual .to(device) | Managed by DeepSpeed engine via local_rank |
Related Pages
- Principle:Microsoft_DeepSpeedExamples_Baseline_PyTorch_Training -- The principle this implementation realizes
- Implementation:Microsoft_DeepSpeedExamples_Net_DeepSpeed -- DeepSpeed-enhanced version of the same model
- Implementation:Microsoft_DeepSpeedExamples_Test_Function_CIFAR -- Evaluation function used after training
- Environment:Microsoft_DeepSpeedExamples_CIFAR10_Training_Environment