Workflow:Fastai Fastbook Neural Network From Scratch

Knowledge Sources	fastai/fastbook, fastai Documentation, PyTorch Documentation
Domains	Deep_Learning, Neural_Networks, Foundations
Last Updated	2026-02-09 17:00 GMT

Overview

End-to-end process for building a neural network from first principles using only basic tensor operations, covering pixel-level data representation through SGD-based training and manual backpropagation.

Description

This workflow teaches the foundational mechanics of neural networks by constructing one from scratch on top of basic tensor operations rather than high-level training APIs. Starting with raw image data as pixel tensors, it progresses through a simple pixel-similarity baseline, a manual implementation of gradient descent, a linear classifier, the addition of non-linearities that turn the classifier into a true neural network, and backpropagation written by hand. The workflow spans the book's foundational chapters (4, 13, 17, 19) and culminates in understanding how fastai's Learner class orchestrates these components. It uses MNIST digit classification as the running example.

Usage

Use this workflow to learn how neural networks work at the mathematical and implementation level. It is a pedagogical workflow designed for understanding, not production use. It answers three questions: How do pixels become predictions? How does a model learn from data? What is backpropagation really doing? Completing it provides the understanding needed to debug, customize, and extend neural networks.

Execution Steps

Step 1: Data as Tensors

Load image data and understand its numerical representation. Images are 2D arrays (matrices) of pixel intensity values. Convert images to PyTorch tensors and NumPy arrays. Understand tensor shapes, indexing, and basic operations. Compute simple statistics (mean pixel values) to establish a baseline.

Key considerations:

Grayscale images are 2D tensors (height x width) with values 0-255
Batches of images are 3D tensors (batch x height x width)
Understanding broadcasting rules is essential for efficient tensor computation
A simple pixel-average baseline provides a reference point before building more complex models
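
The points above can be sketched with a few tensor operations. This is a minimal illustration, not the book's code: the random `images` tensor is an assumed stand-in for real digit images, and all variable names are hypothetical.

```python
import numpy as np
import torch

torch.manual_seed(0)

# Assumption: a random stand-in for a stack of five 28x28 grayscale images.
# Real data would be loaded from image files instead.
images = torch.randint(0, 256, (5, 28, 28), dtype=torch.uint8)

single = images[0]          # one image: a 2D tensor (height x width)
as_numpy = single.numpy()   # the same pixels viewed as a NumPy array

# Scale to floats in [0, 1] before doing arithmetic on pixel values.
stacked = images.float() / 255

# Averaging over the batch dimension gives one "mean image".
mean_image = stacked.mean(dim=0)

# Broadcasting: the (28, 28) mean is subtracted from every image in the
# (5, 28, 28) batch without an explicit loop.
diffs = stacked - mean_image

print(single.shape, stacked.shape, mean_image.shape, diffs.shape)
```

Indexing the first dimension selects an image, while reducing over it (`mean(dim=0)`) collapses the batch into a single 2D tensor.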

Step 2: Distance-based Classification

Build the simplest possible classifier by computing the mean image for each class and classifying new images based on distance (L1 or L2) to these means. This baseline approach demonstrates the core classification concept: measuring how similar a new input is to known examples of each class.

Key considerations:

Mean absolute error (L1) and root mean squared error (L2) are two common distance metrics
In the book, this baseline already exceeds 90% accuracy at distinguishing 3s from 7s
It establishes a performance baseline that more complex models should exceed
Broadcasting enables computing distances across entire batches efficiently
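
A minimal sketch of this baseline follows. The `threes` and `sevens` tensors are synthetic placeholders (an assumption made so the example runs without downloading MNIST), and the function names are illustrative.

```python
import torch

torch.manual_seed(0)

# Assumption: synthetic stand-ins for stacks of 3s and 7s, shape (n, 28, 28),
# with pixel values in [0, 1]. The two classes are deliberately easy to separate.
threes = torch.rand(100, 28, 28) * 0.4
sevens = torch.rand(100, 28, 28) * 0.4 + 0.6

# The "ideal" image of each class is the per-pixel mean over its examples.
mean3 = threes.mean(0)
mean7 = sevens.mean(0)

def l1_distance(a, b):
    # Mean absolute difference over the last two (height, width) axes.
    return (a - b).abs().mean((-1, -2))

def l2_distance(a, b):
    # Root mean squared difference over the last two axes.
    return ((a - b) ** 2).mean((-1, -2)).sqrt()

def is_3(x, distance=l1_distance):
    # Broadcasting lets x be a single image or a whole batch.
    return distance(x, mean3) < distance(x, mean7)

accuracy_3s = is_3(threes).float().mean()
accuracy_7s = (~is_3(sevens)).float().mean()
print(accuracy_3s.item(), accuracy_7s.item())
```

Because the distance functions reduce only over the last two axes, the same call returns one distance per image when given a batch.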

Step 3: SGD from Scratch

Implement stochastic gradient descent manually. Start by defining a simple linear model (weights and bias), write a loss function, compute gradients using PyTorch's autograd, and update weights by subtracting the gradient scaled by a learning rate. Iterate this process over mini-batches to train the model.

Key considerations:

Initialize weights randomly and bias to zero
The learning rate controls step size; too large causes divergence, too small causes slow convergence
Mini-batches provide a balance between computation speed and gradient accuracy
PyTorch's requires_grad_ and .backward() handle gradient computation; manual weight updates follow
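
The loop described above might look like the following sketch. It assumes synthetic, linearly separable data in place of flattened images, and a sigmoid-based loss chosen for illustration. The names `linear`, `loss_fn`, `lr`, and `batch_size` are hypothetical.

```python
import torch

torch.manual_seed(0)

# Assumption: synthetic binary-classification data standing in for flattened images.
n, d = 256, 20
x = torch.randn(n, d)
true_w = torch.randn(d, 1)
y = (x @ true_w > 0).float()

# Random weights, zero bias; both track gradients.
weights = torch.randn(d, 1).requires_grad_()
bias = torch.zeros(1).requires_grad_()

def linear(xb):
    return xb @ weights + bias

def loss_fn(preds, targets):
    # Squash to (0, 1), then measure how far each prediction is from its target.
    preds = preds.sigmoid()
    return torch.where(targets == 1, 1 - preds, preds).mean()

with torch.no_grad():
    initial_loss = loss_fn(linear(x), y).item()

lr, batch_size = 0.5, 32
losses = []
for epoch in range(20):
    perm = torch.randperm(n)                 # reshuffle every epoch
    for i in range(0, n, batch_size):
        idx = perm[i:i + batch_size]
        loss = loss_fn(linear(x[idx]), y[idx])
        loss.backward()                      # autograd fills .grad
        with torch.no_grad():                # the update itself is not tracked
            for p in (weights, bias):
                p -= lr * p.grad
                p.grad.zero_()               # otherwise gradients accumulate
    with torch.no_grad():
        losses.append(loss_fn(linear(x), y).item())

print(initial_loss, losses[-1])
```

The update runs under `torch.no_grad()` so that the subtraction is not itself recorded in the computation graph.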

Step 4: Non-linear Activation Functions

Transform the linear model into a neural network by adding a non-linear activation function (ReLU) between layers. Without non-linearity, stacking linear layers is equivalent to a single linear layer. ReLU (max(0, x)) is the simplest activation that enables the network to learn complex, non-linear patterns.

Key considerations:

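By the universal approximation theorem, a network with at least one hidden layer and a non-linear activation can approximate any continuous function to arbitrary accuracy, given enough hidden units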
ReLU replaces negative values with zero, creating piecewise-linear functions
The hidden layer size controls model capacity (more neurons = more complex functions)
This step transforms a linear classifier into a true neural network
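
The claim that stacked linear layers collapse into one can be checked numerically. The sketch below uses random weights and illustrative shapes (assumptions, not values from the book) and then shows the same layers with ReLU in between.

```python
import torch

torch.manual_seed(0)

x = torch.randn(8, 4)
w1, b1 = torch.randn(4, 16), torch.randn(16)   # hidden layer with 16 units
w2, b2 = torch.randn(16, 1), torch.randn(1)

def relu(t):
    # max(0, t), applied elementwise
    return t.clamp_min(0.)

# Two linear layers with nothing in between...
stacked_linear = (x @ w1 + b1) @ w2 + b2

# ...are equivalent to one linear layer with combined parameters.
w_combined = w1 @ w2
b_combined = b1 @ w2 + b2
single_linear = x @ w_combined + b_combined

def simple_net(xb):
    # The same two layers, now separated by a non-linearity.
    hidden = relu(xb @ w1 + b1)
    return hidden @ w2 + b2

with_relu = simple_net(x)

print(torch.allclose(stacked_linear, single_linear, atol=1e-3))
print(torch.allclose(with_relu, stacked_linear, atol=1e-3))
```

With ReLU in place, no single pair of combined weights reproduces the output for all inputs, which is what gives the hidden layer its extra expressive power.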

Step 5: Backpropagation Implementation

Implement the backward pass (backpropagation) manually using the chain rule of calculus. For each layer in reverse order, compute the gradient of the loss with respect to that layer's inputs and parameters. This step demystifies what PyTorch's autograd does automatically and builds intuition for gradient flow through networks.

Key considerations:

The chain rule decomposes complex gradients into products of simpler local gradients
Each operation (matrix multiply, ReLU, loss function) has a known derivative
Gradients flow backward through the computation graph from loss to inputs
Understanding gradient flow helps diagnose training problems (vanishing/exploding gradients)
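
As a sketch of the chain rule in practice, the example below writes the backward pass by hand for a two-layer network and compares it with autograd. The network shape, the mean squared error loss, and the `g_` variable names are assumptions made for illustration.

```python
import torch

torch.manual_seed(0)

x = torch.randn(32, 10)
y = torch.randn(32, 1)
w1 = torch.randn(10, 16, requires_grad=True)
b1 = torch.zeros(16, requires_grad=True)
w2 = torch.randn(16, 1, requires_grad=True)
b2 = torch.zeros(1, requires_grad=True)

# Forward pass, keeping every intermediate needed by the backward pass.
l1 = x @ w1 + b1              # first linear layer
a1 = l1.clamp_min(0.)         # ReLU
out = a1 @ w2 + b2            # second linear layer
loss = ((out - y) ** 2).mean()

# Reference gradients from autograd.
loss.backward()

# Manual backward pass: apply the chain rule layer by layer, in reverse.
with torch.no_grad():
    g_out = 2. * (out - y) / out.numel()   # d loss / d out (MSE)
    g_w2 = a1.t() @ g_out                  # d loss / d w2
    g_b2 = g_out.sum(0)                    # d loss / d b2
    g_a1 = g_out @ w2.t()                  # gradient flowing into the ReLU output
    g_l1 = g_a1 * (l1 > 0).float()         # ReLU passes gradient only where input > 0
    g_w1 = x.t() @ g_l1                    # d loss / d w1
    g_b1 = g_l1.sum(0)                     # d loss / d b1

for name, manual, auto in [("w1", g_w1, w1.grad), ("b1", g_b1, b1.grad),
                           ("w2", g_w2, w2.grad), ("b2", g_b2, b2.grad)]:
    print(name, torch.allclose(manual, auto, atol=1e-4, rtol=1e-4))
```

Each manual gradient is the incoming gradient multiplied by the local derivative of one operation, which is the decomposition the chain rule describes.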

Step 6: Training Loop Construction

Assemble all components into a complete training loop: forward pass (compute predictions), loss calculation, backward pass (compute gradients), optimizer step (update weights), and gradient zeroing. Add batching, epoch tracking, and validation evaluation. This loop is the core of all neural network training.

Key considerations:

Zero gradients before each backward pass to prevent accumulation
Evaluate on validation data without computing gradients (torch.no_grad)
Track and print loss and metrics per epoch to monitor progress
This manual loop is, in essence, what fastai's Learner.fit automates
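
One possible shape for such a loop is sketched below. For brevity it assumes `torch.nn` layers, `torch.optim.SGD`, and `DataLoader` in place of the hand-written pieces, along with synthetic data. The `make_data` and `validate` helpers are hypothetical.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

def make_data(n):
    # Assumption: synthetic binary labels that depend on the first two features.
    x = torch.randn(n, 20)
    y = (x[:, :1] + x[:, 1:2] > 0).float()
    return TensorDataset(x, y)

train_dl = DataLoader(make_data(512), batch_size=64, shuffle=True)
valid_dl = DataLoader(make_data(128), batch_size=128)

model = nn.Sequential(nn.Linear(20, 32), nn.ReLU(), nn.Linear(32, 1))
opt = torch.optim.SGD(model.parameters(), lr=0.2)
loss_fn = nn.BCEWithLogitsLoss()

def validate():
    # Average loss and accuracy over the validation set, without tracking gradients.
    model.eval()
    total_loss, correct, count = 0., 0, 0
    with torch.no_grad():
        for xb, yb in valid_dl:
            preds = model(xb)
            total_loss += loss_fn(preds, yb).item() * len(xb)
            correct += ((preds > 0).float() == yb).sum().item()
            count += len(xb)
    return total_loss / count, correct / count

initial_loss, _ = validate()

history = []
for epoch in range(15):
    model.train()
    for xb, yb in train_dl:
        opt.zero_grad()                    # clear gradients from the previous batch
        loss = loss_fn(model(xb), yb)      # forward pass and loss
        loss.backward()                    # backward pass
        opt.step()                         # weight update
    val_loss, val_acc = validate()
    history.append((val_loss, val_acc))
    print(f"epoch {epoch}: valid loss {val_loss:.4f}, accuracy {val_acc:.3f}")
```

Measuring validation loss once before training gives a reference point for judging whether the loop is learning at all.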

Step 7: Building the Learner

Understand how fastai's Learner class wraps the training loop with a callback system for extensibility. The Learner manages the model, optimizer, data, and loss function, while callbacks inject behavior at specific points in the training loop (after batch, after epoch, etc.). This architecture enables features like learning rate scheduling, mixed precision, and gradient accumulation without modifying the core loop.

Key considerations:

The callback system follows the event-driven pattern: hooks fire at defined training events
Standard callbacks implement one-cycle training, metric recording, and early stopping
Custom callbacks can modify any aspect of training (learning rate, loss, gradients)
Understanding the Learner internals enables advanced customization and debugging
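
The event-driven pattern can be illustrated with a toy learner. This is a simplified sketch, not fastai's implementation: `MiniLearner`, `Callback`, `LossRecorder`, `EpochCounter`, and the hook names used here are hypothetical, and fastai's real Learner exposes many more events.

```python
import torch
from torch import nn

class Callback:
    """Base class: every hook is a no-op unless a subclass overrides it."""
    def before_fit(self, learn): pass
    def after_batch(self, learn): pass
    def after_epoch(self, learn): pass

class LossRecorder(Callback):
    # Records the training loss after every batch.
    def before_fit(self, learn): self.losses = []
    def after_batch(self, learn): self.losses.append(learn.loss.item())

class EpochCounter(Callback):
    # Counts completed epochs.
    def before_fit(self, learn): self.epochs = 0
    def after_epoch(self, learn): self.epochs += 1

class MiniLearner:
    def __init__(self, model, batches, loss_fn, lr, cbs=()):
        self.model, self.batches, self.loss_fn = model, batches, loss_fn
        self.opt = torch.optim.SGD(model.parameters(), lr=lr)
        self.cbs = list(cbs)

    def _fire(self, event):
        # Call the named hook on every callback, passing the learner itself.
        for cb in self.cbs:
            getattr(cb, event)(self)

    def fit(self, n_epochs):
        self._fire("before_fit")
        for self.epoch in range(n_epochs):
            for xb, yb in self.batches:
                self.loss = self.loss_fn(self.model(xb), yb)
                self.loss.backward()
                self.opt.step()
                self.opt.zero_grad()
                self._fire("after_batch")
            self._fire("after_epoch")

torch.manual_seed(0)
x = torch.randn(64, 5)
y = x.sum(dim=1, keepdim=True)               # a simple regression target
batches = [(x[i:i + 16], y[i:i + 16]) for i in range(0, 64, 16)]

recorder, counter = LossRecorder(), EpochCounter()
learn = MiniLearner(nn.Linear(5, 1), batches, nn.MSELoss(), lr=0.1,
                    cbs=[recorder, counter])
learn.fit(3)
print(counter.epochs, len(recorder.losses))
```

The core loop in `fit` never changes. New behavior is added by writing another `Callback` subclass and passing it in `cbs`.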
