# Principle: Fastai Fastbook Training Loop
| Knowledge Sources | |
|---|---|
| Domains | Deep Learning, Optimization, Software Engineering |
| Last Updated | 2026-02-09 17:00 GMT |
## Overview
The training loop is the outer control structure that repeatedly applies the predict-loss-backward-update cycle over batches and epochs to train a neural network, paired with a separate validation loop that evaluates performance without modifying parameters.
## Description
While SGD defines the mathematical update rule, and backpropagation provides the gradients, the training loop is the software engineering pattern that orchestrates these components into a working system. It is responsible for:
- Iterating over epochs (full passes through the dataset).
- Iterating over mini-batches within each epoch.
- Calling the model, loss function, backward pass, optimizer step, and gradient zeroing in the correct sequence.
- Switching between training mode (gradients enabled, parameters updated) and validation mode (gradients disabled, no parameter updates).
- Reporting metrics (loss, accuracy) at the end of each epoch.
The training loop is the most important pattern in deep learning engineering because every model, regardless of architecture, is trained through some variant of this loop.
## Usage
Use the training loop pattern whenever:
- Training any model with gradient-based optimization.
- You need both training and validation phases to monitor generalization.
- Building up from manual loops toward library abstractions like fastai's `Learner.fit()`.
## Theoretical Basis

### The Canonical Training Loop
The minimal training loop has this structure:
```python
for epoch in range(n_epochs):
    # Training phase
    model.train()
    for batch_x, batch_y in train_dataloader:
        predictions = model(batch_x)                # forward pass
        loss = loss_function(predictions, batch_y)  # compute loss
        loss.backward()                             # compute gradients
        optimizer.step()                            # update parameters
        optimizer.zero_grad()                       # reset gradients

    # Validation phase
    model.eval()
    with torch.no_grad():                           # gradients disabled
        for batch_x, batch_y in valid_dataloader:
            predictions = model(batch_x)                # forward pass only
            loss = loss_function(predictions, batch_y)  # compute loss
            accumulate_metrics(predictions, batch_y)
    report_metrics()
```
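This pattern can be exercised end to end on a toy problem. The sketch below fits a line to data generated from y = 2x + 1 using plain Python with hand-derived mean-squared-error gradients; the dataset and all variable names are illustrative assumptions, not from the fastbook.

```python
import random

random.seed(0)

# Hypothetical toy dataset: learn w, b so that w*x + b fits y = 2*x + 1
data = [(x / 100, 2 * (x / 100) + 1) for x in range(100)]
random.shuffle(data)
train, valid = data[:80], data[80:]

w, b = 0.0, 0.0   # parameters
lr = 0.1          # learning rate
batch_size = 16
n_epochs = 200

for epoch in range(n_epochs):
    # Training phase: shuffle, then iterate over mini-batches
    random.shuffle(train)
    for i in range(0, len(train), batch_size):
        batch = train[i:i + batch_size]
        # Gradients of mean-squared error, derived by hand:
        # d/dw = 2*mean((pred - y) * x),  d/db = 2*mean(pred - y)
        grad_w = sum(2 * ((w * x + b) - y) * x for x, y in batch) / len(batch)
        grad_b = sum(2 * ((w * x + b) - y) for x, y in batch) / len(batch)
        w -= lr * grad_w   # optimizer step (plain SGD)
        b -= lr * grad_b   # gradients are recomputed fresh each batch

    # Validation phase: measure loss only, never update parameters
    valid_loss = sum(((w * x + b) - y) ** 2 for x, y in valid) / len(valid)

print(round(w, 2), round(b, 2))   # approaches w = 2.0, b = 1.0
```

Note the structural symmetry with the framework version: an outer epoch loop, an inner batch loop, gradient computation, a parameter update, and a separate validation pass that only reads the parameters.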
### Training vs. Validation
| Aspect | Training | Validation |
|---|---|---|
| Gradients | Computed via `backward()` | Not computed |
| Parameter updates | Yes (`optimizer.step()`) | No |
| Data | Training set (shuffled) | Validation set (not shuffled) |
| Purpose | Minimize loss / learn parameters | Estimate generalization performance |
| Dropout/BatchNorm | Active (stochastic) | Inactive (deterministic) |
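The Dropout/BatchNorm row is the reason `model.train()` / `model.eval()` matter. A minimal sketch of inverted dropout shows the behavioral difference; the `ToyDropout` class is a hypothetical illustration, not the PyTorch module.

```python
import random

class ToyDropout:
    """Illustrative inverted dropout with train/eval modes (not the PyTorch API)."""
    def __init__(self, p=0.5):
        self.p = p            # probability of zeroing an activation
        self.training = True  # mode flag, toggled by train()/eval()

    def train(self):
        self.training = True

    def eval(self):
        self.training = False

    def __call__(self, xs):
        if not self.training:
            return list(xs)   # eval mode: identity, fully deterministic
        # train mode: randomly zero activations, scale survivors by 1/(1-p)
        return [0.0 if random.random() < self.p else x / (1 - self.p) for x in xs]

drop = ToyDropout(p=0.5)
xs = [1.0, 2.0, 3.0, 4.0]
drop.eval()
print(drop(xs))   # identical to the input: [1.0, 2.0, 3.0, 4.0]
```

In eval mode the layer is a deterministic identity, so validation metrics are reproducible; in train mode each call produces a different random mask.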
### Epoch vs. Batch
- Batch (mini-batch): A subset of the training data processed in one forward/backward pass. Typical sizes: 32, 64, 128, 256.
- Epoch: One complete pass through all training batches. The model sees every training sample exactly once per epoch.
- Iteration/Step: Processing one batch. Steps per epoch = ceil(N_train / batch_size).
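The steps-per-epoch formula in concrete terms (a minimal sketch; the function name is illustrative):

```python
import math

def steps_per_epoch(n_train, batch_size):
    """Iterations (batches) in one epoch, counting a final partial batch."""
    return math.ceil(n_train / batch_size)

print(steps_per_epoch(50000, 64))   # 782: 781 full batches plus one of 16 samples
```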
### The Optimizer Abstraction
The optimizer encapsulates the parameter update rule and gradient zeroing:
```python
class Optimizer:
    def __init__(self, params, lr):
        self.params = list(params)   # parameters to update
        self.lr = lr                 # learning rate

    def step(self):
        for p in self.params:
            p.data -= self.lr * p.grad.data   # SGD update rule

    def zero_grad(self):
        for p in self.params:
            p.grad = None   # or p.grad.zero_()
```
This abstraction separates what to optimize (the model parameters) from how to optimize (the update rule), enabling easy swapping of optimization algorithms.
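That separation can be demonstrated without any framework. The sketch below uses a hypothetical `Param` holder with `.data` and `.grad` fields and hand-computed gradients to minimize f(w) = (w - 3)^2; every name here is an illustrative assumption.

```python
class Param:
    """Tiny stand-in for a tensor with .data and .grad (illustrative only)."""
    def __init__(self, data):
        self.data = data
        self.grad = None

class SGD:
    """The optimizer owns the update rule; the training loop never sees it."""
    def __init__(self, params, lr):
        self.params = list(params)
        self.lr = lr

    def step(self):
        for p in self.params:
            p.data -= self.lr * p.grad   # update rule lives here

    def zero_grad(self):
        for p in self.params:
            p.grad = None

w = Param(0.0)
opt = SGD([w], lr=0.1)
for _ in range(100):
    w.grad = 2 * (w.data - 3)   # gradient of f(w) = (w - 3)**2, by hand
    opt.step()
    opt.zero_grad()
print(round(w.data, 3))   # 3.0, the minimizer of f
```

Swapping in momentum or Adam would mean replacing only the `SGD` class; the loop body stays identical, which is exactly the point of the abstraction.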
### Metric Accumulation
Validation metrics are accumulated across all batches and then averaged:
```python
accuracies = []
for xb, yb in valid_dl:
    preds = model(xb)
    accuracies.append(batch_accuracy(preds, yb))
epoch_accuracy = sum(accuracies) / len(accuracies)
```
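One caveat: averaging per-batch accuracies is exact only when every batch has the same size; a smaller final batch gets overweighted. Accumulating raw counts avoids this. The sketch below uses hypothetical per-sample correctness flags to show the discrepancy.

```python
# Hypothetical toy validation set: 1 = correct, 0 = wrong, in unequal batches
batches = [[1, 1, 1, 0], [1, 0, 1, 0], [1]]   # last batch is a partial batch

# Mean of per-batch accuracies: biased when batch sizes differ
mean_of_means = sum(sum(b) / len(b) for b in batches) / len(batches)

# Accumulating counts: exact regardless of batch sizes
correct = sum(sum(b) for b in batches)
total = sum(len(b) for b in batches)
exact = correct / total

print(mean_of_means, exact)   # 0.75 vs 6/9: the one-sample batch skews the mean
```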