
Implementation:Fastai Fastbook Backpropagation Manual

From Leeroopedia


Knowledge Sources
Domains: Deep Learning, Calculus, Automatic Differentiation
Last Updated: 2026-02-09 17:00 GMT

Overview

Concrete pattern for implementing backpropagation from scratch using manual chain-rule gradient functions and class-based layers, as demonstrated in fastbook Chapter 17.

Description

This implementation builds backpropagation without relying on PyTorch's autograd. It defines standalone gradient functions (mse_grad, relu_grad, lin_grad) and then refactors them into class-based layers (Relu, Lin, Mse) that each implement __call__ (forward) and backward methods. The gradients are stored directly on tensor attributes (.g), mirroring PyTorch's .grad attribute. The implementation is validated by comparing its computed gradients against PyTorch autograd.
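The .g convention at the heart of this pattern can be sketched in isolation. This is our own minimal example (toy shapes of our choosing, not from the chapter): the gradient is stored as a plain Python attribute on the tensor, mirroring PyTorch's .grad.

```python
import torch

# Minimal sketch of the .g convention (our own toy shapes):
inp = torch.randn(4, 1)
targ = torch.randn(4)

loss = (inp.squeeze() - targ).pow(2).mean()                       # forward: MSE
inp.g = 2. * (inp.squeeze() - targ).unsqueeze(-1) / inp.shape[0]  # backward: store grad on .g

print(inp.g.shape)  # torch.Size([4, 1]) — same shape as inp
```

Setting .g works because torch.Tensor instances accept arbitrary Python attributes; nothing in autograd is involved.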

Usage

Use this pattern when:

  • Learning how backpropagation works at a fundamental level.
  • Building custom autograd systems or understanding the PyTorch autograd internals.
  • Debugging gradient issues by comparing manual gradients with loss.backward() results.

Code Reference

Source Location

  • Repository: fastbook
  • File: 17_foundations.ipynb (Chapter 17), "Backpropagation" section

Signature

Functional approach:

def mse_grad(inp, targ):
    """Gradient of MSE loss with respect to its input."""
    inp.g = 2. * (inp.squeeze() - targ).unsqueeze(-1) / inp.shape[0]

def relu_grad(inp, out):
    """Gradient of ReLU with respect to its input."""
    inp.g = (inp > 0).float() * out.g

def lin_grad(inp, out, w, b):
    """Gradient of a linear layer with respect to input, weights, and bias."""
    inp.g = out.g @ w.t()
    w.g = inp.t() @ out.g
    b.g = out.g.sum(0)

def forward_and_backward(inp, targ):
    # Forward pass
    l1 = inp @ w1 + b1
    l2 = relu(l1)
    out = l2 @ w2 + b2
    loss = mse(out, targ)

    # Backward pass (reverse order)
    mse_grad(out, targ)
    lin_grad(l2, out, w2, b2)
    relu_grad(l1, l2)
    lin_grad(inp, l1, w1, b1)
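The three formulas inside lin_grad follow from the chain rule for out = inp @ w + b. A quick self-contained check against autograd (our own sketch, tiny random shapes) can confirm them:

```python
import torch

# Verify the lin_grad formulas against autograd for out = inp @ w + b,
# with an arbitrary upstream gradient standing in for out.g.
torch.manual_seed(0)
inp = torch.randn(5, 3, requires_grad=True)
w = torch.randn(3, 2, requires_grad=True)
b = torch.randn(2, requires_grad=True)

out = inp @ w + b
upstream = torch.randn_like(out)   # plays the role of out.g
out.backward(upstream)             # autograd backward with the same upstream

# The three lin_grad formulas, checked term by term
assert torch.allclose(inp.grad, upstream @ w.t())     # inp.g = out.g @ w.t()
assert torch.allclose(w.grad, inp.t() @ upstream)     # w.g   = inp.t() @ out.g
assert torch.allclose(b.grad, upstream.sum(0))        # b.g   = out.g.sum(0)
```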

Class-based approach:

class Relu:
    def __call__(self, inp):
        self.inp = inp
        self.out = inp.clamp_min(0.)
        return self.out

    def backward(self):
        self.inp.g = (self.inp > 0).float() * self.out.g

class Lin:
    def __init__(self, w, b):
        self.w, self.b = w, b

    def __call__(self, inp):
        self.inp = inp
        self.out = inp @ self.w + self.b
        return self.out

    def backward(self):
        self.inp.g = self.out.g @ self.w.t()
        self.w.g = self.inp.t() @ self.out.g
        self.b.g = self.out.g.sum(0)

class Mse:
    def __call__(self, inp, targ):
        self.inp = inp
        self.targ = targ
        self.out = (inp.squeeze() - targ).pow(2).mean()
        return self.out

    def backward(self):
        x = (self.inp.squeeze() - self.targ).unsqueeze(-1)
        self.inp.g = 2. * x / self.targ.shape[0]

class Model:
    def __init__(self, w1, b1, w2, b2):
        self.layers = [Lin(w1, b1), Relu(), Lin(w2, b2)]
        self.loss = Mse()

    def __call__(self, x, targ):
        for l in self.layers:
            x = l(x)
        return self.loss(x, targ)

    def backward(self):
        self.loss.backward()
        for l in reversed(self.layers):
            l.backward()

Import

import torch

I/O Contract

Inputs

Name   | Type                                    | Required | Description
inp    | Tensor, shape (batch_size, n_features)  | Yes      | Input batch tensor (e.g., flattened images)
targ   | Tensor, shape (batch_size,)             | Yes      | Target values for the batch
w1, b1 | Tensor                                  | Yes      | Weights and bias for the first linear layer
w2, b2 | Tensor                                  | Yes      | Weights and bias for the second linear layer

Outputs

Name  | Type                       | Description
w1.g  | Tensor, same shape as w1   | Gradient of loss with respect to first-layer weights
b1.g  | Tensor, same shape as b1   | Gradient of loss with respect to first-layer bias
w2.g  | Tensor, same shape as w2   | Gradient of loss with respect to second-layer weights
b2.g  | Tensor, same shape as b2   | Gradient of loss with respect to second-layer bias
inp.g | Tensor, same shape as inp  | Gradient of loss with respect to input (for further backprop if needed)

Usage Examples

Basic Usage: Functional Approach

import torch

# Initialize parameters (simplified Kaiming init: scale by 1/sqrt(fan_in))
n_inp = 784
n_hidden = 50
n_out = 1

w1 = torch.randn(n_inp, n_hidden) / n_inp**0.5
b1 = torch.zeros(n_hidden)
w2 = torch.randn(n_hidden, n_out) / n_hidden**0.5
b2 = torch.zeros(n_out)

def relu(x): return x.clamp_min(0.)
def mse(output, targ): return (output.squeeze() - targ).pow(2).mean()

# Run forward and backward
def forward_and_backward(inp, targ):
    l1 = inp @ w1 + b1
    l2 = relu(l1)
    out = l2 @ w2 + b2
    loss = mse(out, targ)

    mse_grad(out, targ)
    lin_grad(l2, out, w2, b2)
    relu_grad(l1, l2)
    lin_grad(inp, l1, w1, b1)

# Call it on training data (x_train: shape (N, 784), y_train: shape (N,))
forward_and_backward(x_train, y_train)

# Access gradients
print(w1.g.shape)  # torch.Size([784, 50])
print(w2.g.shape)  # torch.Size([50, 1])

Validating Against PyTorch Autograd

# Enable autograd tracking on copies of the parameters
# (run forward_and_backward first so the manual .g gradients exist)
w1_ag = w1.clone().requires_grad_(True)
b1_ag = b1.clone().requires_grad_(True)
w2_ag = w2.clone().requires_grad_(True)
b2_ag = b2.clone().requires_grad_(True)

# Forward pass with autograd
l1_ag = x_train @ w1_ag + b1_ag
l2_ag = l1_ag.clamp_min(0.)
out_ag = l2_ag @ w2_ag + b2_ag
loss_ag = (out_ag.squeeze() - y_train).pow(2).mean()
loss_ag.backward()

# Compare: manual gradients vs autograd
def test_near(a, b):
    assert torch.allclose(a, b, rtol=1e-3, atol=1e-5), "Gradients do not match!"

test_near(w1.g, w1_ag.grad)
test_near(b1.g, b1_ag.grad)
test_near(w2.g, w2_ag.grad)
test_near(b2.g, b2_ag.grad)
print("All gradients match PyTorch autograd!")

Class-Based Model

# Create model
model = Model(w1, b1, w2, b2)

# Forward pass (computes loss)
loss = model(x_train, y_train)

# Backward pass (computes all gradients)
model.backward()

# Gradients are now available
print(w1.g.shape)  # torch.Size([784, 50])
print(b2.g.shape)  # torch.Size([1])
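Once gradients live on .g, a plain SGD step is just p -= lr * p.g for each parameter. The following end-to-end sketch is our own (tiny shapes, a single linear layer, and a learning rate of our choosing, not from the chapter), showing the manual gradients actually driving a loss down:

```python
import torch

# Hedged sketch: manual .g gradients used for SGD on a one-layer model.
torch.manual_seed(0)
x = torch.randn(16, 8)
y = torch.randn(16)
w = torch.randn(8, 1) / 8**0.5
b = torch.zeros(1)

lr = 0.1
first_loss = None
for step in range(50):
    out = x @ w + b                                   # forward: linear layer
    loss = (out.squeeze() - y).pow(2).mean()          # MSE loss
    if first_loss is None:
        first_loss = loss.item()
    # Manual backward, storing gradients on .g as in the pattern above
    out.g = 2. * (out.squeeze() - y).unsqueeze(-1) / x.shape[0]
    w.g = x.t() @ out.g
    b.g = out.g.sum(0)
    # SGD update using the manual gradients
    w -= lr * w.g
    b -= lr * b.g

print(first_loss, loss.item())  # loss shrinks over the 50 steps
```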

Related Pages

Implements Principle

Requires Environment
