# Principle: Fastai Fastbook Training Loop
| Knowledge Sources | |
|---|---|
| Domains | Deep Learning, Optimization, Software Engineering |
| Last Updated | 2026-02-09 17:00 GMT |
## Overview
The training loop is the outer control structure that repeatedly applies the predict-loss-backward-update cycle over batches and epochs to train a neural network, paired with a separate validation loop that evaluates performance without modifying parameters.
## Description
While SGD defines the mathematical update rule, and backpropagation provides the gradients, the training loop is the software engineering pattern that orchestrates these components into a working system. It is responsible for:
- Iterating over epochs (full passes through the dataset).
- Iterating over mini-batches within each epoch.
- Calling the model, loss function, backward pass, optimizer step, and gradient zeroing in the correct sequence.
- Switching between training mode (gradients enabled, parameters updated) and validation mode (gradients disabled, no parameter updates).
- Reporting metrics (loss, accuracy) at the end of each epoch.
The training loop is the most important pattern in deep learning engineering because every model, regardless of architecture, is trained through some variant of this loop.
## Usage
Use the training loop pattern whenever:
- Training any model with gradient-based optimization.
- You need both training and validation phases to monitor generalization.
- Building up from manual loops toward library abstractions like fastai's `Learner.fit()`.
## Theoretical Basis

### The Canonical Training Loop
The minimal training loop has this structure:
```python
for epoch in range(n_epochs):
    # Training phase
    model.train()
    for batch_x, batch_y in train_dataloader:
        predictions = model(batch_x)                # forward pass
        loss = loss_function(predictions, batch_y)  # compute loss
        loss.backward()                             # compute gradients
        optimizer.step()                            # update parameters
        optimizer.zero_grad()                       # reset gradients

    # Validation phase
    model.eval()
    with torch.no_grad():                           # gradients disabled
        for batch_x, batch_y in valid_dataloader:
            predictions = model(batch_x)                # forward pass only
            loss = loss_function(predictions, batch_y)  # compute loss
            accumulate_metrics(predictions, batch_y)
    report_metrics()
```
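This pattern can be exercised end to end on a toy problem. The sketch below fits a line to data generated from y = 2x + 1 using plain Python with hand-derived mean-squared-error gradients; the dataset and all variable names are illustrative assumptions, not from the fastbook.

```python
import random

random.seed(0)

# Hypothetical toy dataset: learn w, b so that w*x + b fits y = 2*x + 1
data = [(x / 100, 2 * (x / 100) + 1) for x in range(100)]
random.shuffle(data)
train, valid = data[:80], data[80:]

w, b = 0.0, 0.0   # parameters
lr = 0.1          # learning rate
batch_size = 16
n_epochs = 200

for epoch in range(n_epochs):
    # Training phase: shuffle, then iterate over mini-batches
    random.shuffle(train)
    for i in range(0, len(train), batch_size):
        batch = train[i:i + batch_size]
        # Gradients of mean-squared error, derived by hand:
        # d/dw = 2*mean((pred - y) * x),  d/db = 2*mean(pred - y)
        grad_w = sum(2 * ((w * x + b) - y) * x for x, y in batch) / len(batch)
        grad_b = sum(2 * ((w * x + b) - y) for x, y in batch) / len(batch)
        w -= lr * grad_w   # optimizer step (plain SGD)
        b -= lr * grad_b   # gradients are recomputed fresh each batch

    # Validation phase: measure loss only, never update parameters
    valid_loss = sum(((w * x + b) - y) ** 2 for x, y in valid) / len(valid)

print(round(w, 2), round(b, 2))   # approaches w = 2.0, b = 1.0
```

Note the structural symmetry with the framework version: an outer epoch loop, an inner batch loop, gradient computation, a parameter update, and a separate validation pass that only reads the parameters.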
### Training vs. Validation
| Aspect | Training | Validation |
|---|---|---|
| Gradients | Computed via `backward()` | Not computed |
| Parameter updates | Yes (`optimizer.step()`) | No |
| Data | Training set (shuffled) | Validation set (not shuffled) |
| Purpose | Minimize loss / learn parameters | Estimate generalization performance |
| Dropout/BatchNorm | Active (stochastic) | Inactive (deterministic) |
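The Dropout/BatchNorm row is the reason `model.train()` / `model.eval()` matter. A minimal sketch of inverted dropout shows the behavioral difference; the `ToyDropout` class is a hypothetical illustration, not the PyTorch module.

```python
import random

class ToyDropout:
    """Illustrative inverted dropout with train/eval modes (not the PyTorch API)."""
    def __init__(self, p=0.5):
        self.p = p            # probability of zeroing an activation
        self.training = True  # mode flag, toggled by train()/eval()

    def train(self):
        self.training = True

    def eval(self):
        self.training = False

    def __call__(self, xs):
        if not self.training:
            return list(xs)   # eval mode: identity, fully deterministic
        # train mode: randomly zero activations, scale survivors by 1/(1-p)
        return [0.0 if random.random() < self.p else x / (1 - self.p) for x in xs]

drop = ToyDropout(p=0.5)
xs = [1.0, 2.0, 3.0, 4.0]
drop.eval()
print(drop(xs))   # identical to the input: [1.0, 2.0, 3.0, 4.0]
```

In eval mode the layer is a deterministic identity, so validation metrics are reproducible; in train mode each call produces a different random mask.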
### Epoch vs. Batch
- Batch (mini-batch): A subset of the training data processed in one forward/backward pass. Typical sizes: 32, 64, 128, 256.
- Epoch: One complete pass through all training batches. The model sees every training sample exactly once per epoch.
- Iteration/Step: Processing one batch. Steps per epoch = ceil(N_train / batch_size).
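The steps-per-epoch formula in concrete terms (a minimal sketch; the function name is illustrative):

```python
import math

def steps_per_epoch(n_train, batch_size):
    """Iterations (batches) in one epoch, counting a final partial batch."""
    return math.ceil(n_train / batch_size)

print(steps_per_epoch(50000, 64))   # 782: 781 full batches plus one of 16 samples
```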
### The Optimizer Abstraction
The optimizer encapsulates the parameter update rule and gradient zeroing:
```python
class Optimizer:
    def __init__(self, params, lr):
        self.params = list(params)   # parameters to update
        self.lr = lr                 # learning rate

    def step(self):
        for p in self.params:
            p.data -= self.lr * p.grad.data   # SGD update rule

    def zero_grad(self):
        for p in self.params:
            p.grad = None   # or p.grad.zero_()
```
This abstraction separates what to optimize (the model parameters) from how to optimize (the update rule), enabling easy swapping of optimization algorithms.
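That separation can be demonstrated without any framework. The sketch below uses a hypothetical `Param` holder with `.data` and `.grad` fields and hand-computed gradients to minimize f(w) = (w - 3)^2; every name here is an illustrative assumption.

```python
class Param:
    """Tiny stand-in for a tensor with .data and .grad (illustrative only)."""
    def __init__(self, data):
        self.data = data
        self.grad = None

class SGD:
    """The optimizer owns the update rule; the training loop never sees it."""
    def __init__(self, params, lr):
        self.params = list(params)
        self.lr = lr

    def step(self):
        for p in self.params:
            p.data -= self.lr * p.grad   # update rule lives here

    def zero_grad(self):
        for p in self.params:
            p.grad = None

w = Param(0.0)
opt = SGD([w], lr=0.1)
for _ in range(100):
    w.grad = 2 * (w.data - 3)   # gradient of f(w) = (w - 3)**2, by hand
    opt.step()
    opt.zero_grad()
print(round(w.data, 3))   # 3.0, the minimizer of f
```

Swapping in momentum or Adam would mean replacing only the `SGD` class; the loop body stays identical, which is exactly the point of the abstraction.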
### Metric Accumulation
Validation metrics are accumulated across all batches and then averaged:
```python
accuracies = []
for xb, yb in valid_dl:
    preds = model(xb)
    accuracies.append(batch_accuracy(preds, yb))
epoch_accuracy = sum(accuracies) / len(accuracies)
```
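One caveat: averaging per-batch accuracies is exact only when every batch has the same size; a smaller final batch gets overweighted. Accumulating raw counts avoids this. The sketch below uses hypothetical per-sample correctness flags to show the discrepancy.

```python
# Hypothetical toy validation set: 1 = correct, 0 = wrong, in unequal batches
batches = [[1, 1, 1, 0], [1, 0, 1, 0], [1]]   # last batch is a partial batch

# Mean of per-batch accuracies: biased when batch sizes differ
mean_of_means = sum(sum(b) / len(b) for b in batches) / len(batches)

# Accumulating counts: exact regardless of batch sizes
correct = sum(sum(b) for b in batches)
total = sum(len(b) for b in batches)
exact = correct / total

print(mean_of_means, exact)   # 0.75 vs 6/9: the one-sample batch skews the mean
```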