
Principle:Fastai Fastbook Training Loop

From Leeroopedia


Knowledge Sources
Domains: Deep Learning, Optimization, Software Engineering
Last Updated: 2026-02-09 17:00 GMT

Overview

The training loop is the outer control structure that repeatedly applies the predict-loss-backward-update cycle over batches and epochs to train a neural network, paired with a separate validation loop that evaluates performance without modifying parameters.

Description

While SGD defines the mathematical update rule, and backpropagation provides the gradients, the training loop is the software engineering pattern that orchestrates these components into a working system. It is responsible for:

  1. Iterating over epochs (full passes through the dataset).
  2. Iterating over mini-batches within each epoch.
  3. Calling the model, loss function, backward pass, optimizer step, and gradient zeroing in the correct sequence.
  4. Switching between training mode (gradients enabled, parameters updated) and validation mode (gradients disabled, no parameter updates).
  5. Reporting metrics (loss, accuracy) at the end of each epoch.

The training loop is the most important pattern in deep learning engineering because every model, regardless of architecture, is trained through some variant of this loop.

Usage

Use the training loop pattern whenever:

  • Training any model with gradient-based optimization.
  • You need both training and validation phases to monitor generalization.
  • Building up from manual loops toward library abstractions like fastai's Learner.fit().

Theoretical Basis

The Canonical Training Loop

The minimal training loop has this structure:

for epoch in range(n_epochs):
    # Training phase
    model.train()
    for batch_x, batch_y in train_dataloader:
        predictions = model(batch_x)                # forward pass
        loss = loss_function(predictions, batch_y)  # compute loss
        loss.backward()                             # compute gradients
        optimizer.step()                            # update parameters
        optimizer.zero_grad()                       # reset gradients

    # Validation phase
    model.eval()
    with torch.no_grad():                           # disable gradient tracking
        for batch_x, batch_y in valid_dataloader:
            predictions = model(batch_x)            # forward pass only
            loss = loss_function(predictions, batch_y)
            accumulate_metrics(predictions, batch_y)

    report_metrics()
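The loop above can be exercised end-to-end in plain Python. The sketch below trains a one-parameter linear model on toy data, computing the MSE gradient by hand in place of autograd; every name and constant here is illustrative, not part of any library API.

```python
import random

# Toy regression data: y = 3*x, so training should recover w close to 3.
random.seed(0)
data = [(i / 10, 3.0 * (i / 10)) for i in range(1, 101)]
train, valid = data[:80], data[80:]

w = 0.0            # the single model parameter
lr = 0.01          # learning rate
batch_size = 16
n_epochs = 20

def batches(samples, size):
    """Yield consecutive mini-batches of at most `size` samples."""
    for i in range(0, len(samples), size):
        yield samples[i:i + size]

for epoch in range(n_epochs):
    # Training phase: shuffle, then predict-loss-backward-update per batch.
    random.shuffle(train)
    for batch in batches(train, batch_size):
        grad = 0.0
        for x, y in batch:
            pred = w * x                    # forward pass
            grad += 2 * (pred - y) * x      # d(MSE)/dw, accumulated
        w -= lr * grad / len(batch)         # optimizer step (mean gradient)
        # zeroing gradients is implicit: grad is rebuilt for each batch

    # Validation phase: forward pass only, no parameter updates.
    val_loss = sum((w * x - y) ** 2 for x, y in valid) / len(valid)
```

The same skeleton scales up to real frameworks by replacing the hand-derived gradient with `loss.backward()` and the in-place update with `optimizer.step()`.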

Training vs. Validation

Aspect            | Training                         | Validation
Gradients         | Computed via backward()          | Not computed
Parameter updates | Yes (optimizer.step())           | No
Data              | Training set (shuffled)          | Validation set (not shuffled)
Purpose           | Minimize loss / learn parameters | Estimate generalization performance
Dropout/BatchNorm | Active (stochastic)              | Inactive (deterministic)
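The Dropout/BatchNorm row can be made concrete with a minimal dropout layer whose behavior depends on a training flag, mirroring what model.train() and model.eval() toggle in PyTorch. This is an illustrative sketch, not the library implementation.

```python
import random

class Dropout:
    """Minimal inverted-dropout layer with a train/eval flag."""
    def __init__(self, p=0.5):
        self.p = p
        self.training = True     # the flag model.train()/model.eval() toggles

    def train(self):
        self.training = True

    def eval(self):
        self.training = False

    def __call__(self, xs):
        if not self.training:
            return list(xs)      # eval mode: deterministic identity
        # train mode: randomly zero activations, rescale survivors by 1/(1-p)
        return [0.0 if random.random() < self.p else x / (1 - self.p)
                for x in xs]

random.seed(42)
layer = Dropout(p=0.5)
xs = [1.0, 2.0, 3.0, 4.0]

layer.train()
noisy = layer(xs)    # stochastic: some entries zeroed, others doubled

layer.eval()
clean = layer(xs)    # deterministic: input passes through unchanged
```

In eval mode the layer is the identity, which is why validation results are reproducible from run to run.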

Epoch vs. Batch

  • Batch (mini-batch): A subset of the training data processed in one forward/backward pass. Typical sizes: 32, 64, 128, 256.
  • Epoch: One complete pass through all training batches. The model sees every training sample exactly once per epoch.
  • Iteration/Step: Processing one batch. Steps per epoch = ceil(N_train / batch_size).
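The steps-per-epoch arithmetic above can be written directly; the `drop_last` flag below is an assumption mirroring the common dataloader option that discards the final, smaller batch.

```python
import math

def steps_per_epoch(n_train, batch_size, drop_last=False):
    """Number of optimizer steps in one epoch.

    With drop_last=True the trailing partial batch is discarded,
    as many dataloaders optionally do.
    """
    if drop_last:
        return n_train // batch_size
    return math.ceil(n_train / batch_size)

# 50,000 training samples with batch size 64: the last batch holds 16 samples.
steps = steps_per_epoch(50_000, 64)
```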

The Optimizer Abstraction

The optimizer encapsulates the parameter update rule and gradient zeroing:

class Optimizer:
    def __init__(self, params, lr):
        self.params = list(params)   # parameters to update
        self.lr = lr                 # learning rate

    def step(self):
        for p in self.params:
            p.data -= self.lr * p.grad.data   # vanilla SGD update

    def zero_grad(self):
        for p in self.params:
            p.grad = None            # or p.grad.zero_()

This abstraction separates what to optimize (the model parameters) from how to optimize (the update rule), enabling easy swapping of optimization algorithms.
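The swappability this enables can be sketched in pure Python: the `Param` container below stands in for a tensor with `.data` and `.grad` slots (an assumption for illustration), and a momentum variant reuses the same interface so the training loop code never changes.

```python
class Param:
    """Tiny stand-in for a tensor parameter with .data and .grad slots."""
    def __init__(self, value):
        self.data = value
        self.grad = 0.0

class SGD:
    def __init__(self, params, lr):
        self.params, self.lr = list(params), lr

    def step(self):
        for p in self.params:
            p.data -= self.lr * p.grad       # vanilla SGD update

    def zero_grad(self):
        for p in self.params:
            p.grad = 0.0

class SGDMomentum(SGD):
    """Same interface, different update rule: loop code is unchanged."""
    def __init__(self, params, lr, momentum=0.9):
        super().__init__(params, lr)
        self.momentum = momentum
        self.velocity = [0.0] * len(self.params)

    def step(self):
        for i, p in enumerate(self.params):
            self.velocity[i] = self.momentum * self.velocity[i] + p.grad
            p.data -= self.lr * self.velocity[i]

# Minimizing f(w) = w**2 (gradient 2*w) with either optimizer:
w = Param(5.0)
opt = SGD([w], lr=0.1)    # swap in SGDMomentum([w], lr=0.1) with no other change
for _ in range(50):
    w.grad = 2 * w.data   # hand-computed "backward pass" for f(w) = w**2
    opt.step()
    opt.zero_grad()
```

Because both classes expose only `step()` and `zero_grad()`, the caller is oblivious to which update rule is in use.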

Metric Accumulation

Validation metrics are accumulated across all batches and then averaged. Note that a plain mean of per-batch values is exact only when every batch has the same size; otherwise the final, smaller batch is over-weighted.

accuracies = []
for xb, yb in valid_dl:
    preds = model(xb)
    accuracies.append(batch_accuracy(preds, yb))
epoch_accuracy = sum(accuracies) / len(accuracies)
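To get an exact epoch metric regardless of batch sizes, accumulate counts instead of per-batch means. The sketch below assumes classification with per-sample predictions; all names are illustrative.

```python
def epoch_accuracy(batch_pairs):
    """Exact accuracy over an epoch: accumulate counts, not batch means.

    A plain mean of per-batch accuracies is biased when the last batch
    is smaller; counting correct predictions avoids that.
    """
    correct, total = 0, 0
    for preds, targets in batch_pairs:
        correct += sum(p == t for p, t in zip(preds, targets))
        total += len(targets)
    return correct / total

# Two batches of different sizes: 4 samples, then a trailing batch of 2.
batch_pairs = [
    ([1, 0, 1, 1], [1, 0, 0, 1]),   # 3/4 correct
    ([0, 1],       [1, 1]),         # 1/2 correct
]
acc = epoch_accuracy(batch_pairs)   # 4 correct out of 6 samples
```

Here the count-based accuracy is 4/6, whereas the unweighted mean of batch accuracies would report (0.75 + 0.5) / 2 = 0.625.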

Related Pages

Implemented By
