Principle:Fastai Fastbook Distance Classification
| Knowledge Sources | |
|---|---|
| Domains | Deep Learning, Classification, Computer Vision |
| Last Updated | 2026-02-09 17:00 GMT |
Overview
Distance-based classification assigns an input to the class whose prototype (template) it is nearest to, as measured by a distance metric such as the L1 distance (mean absolute error) or the squared L2 distance (mean squared error).
Description
The simplest approach to classification is to compute a representative "ideal" example for each class (typically the per-pixel mean across all training samples), then classify a new input by measuring its distance to each class template and assigning it to the nearest one. This technique is known as a nearest-centroid classifier or template matching.
While this method is too simple for production use, it establishes an important baseline and introduces two foundational concepts that carry forward into neural network training:
- Loss functions quantify how far a prediction is from the target. L1 loss and MSE loss are two of the most common distance measures.
- Baseline thinking ensures that every subsequent, more complex model is evaluated against a simple, understandable reference point.
Usage
Use distance-based classification when:
- You need a quick baseline to validate that your data pipeline produces reasonable results.
- You want to introduce the concept of a loss function before building a trainable model.
- You are comparing the discriminative power of different distance metrics on your data.
Theoretical Basis
Class Templates
Given N training images for class c, each of shape (H, W), the class template (centroid) is:
template_c[i, j] = (1/N) * sum(image_k[i, j] for k in 1..N)
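This per-pixel averaging can be sketched in PyTorch; the tensor shapes and the random data here are illustrative stand-ins, not the fastbook's actual MNIST pipeline:

```python
import torch

# Stack N training images of shape (H, W) into a single (N, H, W) tensor,
# then average over the stacking axis to obtain the per-pixel class template.
# torch.rand is a stand-in for real training digits (values in [0, 1)).
images = torch.rand(100, 28, 28)
template = images.mean(dim=0)  # shape (28, 28); template[i, j] = mean over k
```

With real data, one template would be computed per class from that class's training images only.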
L1 Distance (Mean Absolute Error)
The L1 distance between an image x and a template t is:
L1(x, t) = (1 / (H * W)) * sum(|x[i,j] - t[i,j]| for all i, j)
L1 is robust to outliers because it penalizes all errors linearly.
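A minimal PyTorch sketch of this formula (the function name `l1_distance` is ours, chosen for clarity):

```python
import torch

def l1_distance(x, t):
    """Mean absolute per-pixel difference between image x and template t."""
    return (x - t).abs().mean()
```

The `.mean()` call performs both the summation and the division by `H * W` from the formula above.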
L2 Distance (Mean Squared Error)
The MSE (L2 squared) distance is:
MSE(x, t) = (1 / (H * W)) * sum((x[i,j] - t[i,j])^2 for all i, j)
The root mean squared error (RMSE) takes the square root of MSE to return to the original units:
RMSE(x, t) = sqrt(MSE(x, t))
MSE penalizes large deviations more heavily than L1 due to the squaring operation, making it more sensitive to outlier pixels.
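The two formulas above can be sketched the same way; again, the function names are illustrative, not from the fastbook code:

```python
import torch

def mse_distance(x, t):
    """Mean squared per-pixel difference between image x and template t."""
    return ((x - t) ** 2).mean()

def rmse_distance(x, t):
    """Square root of MSE, returning to the original pixel units."""
    return mse_distance(x, t).sqrt()
```

Because the square root is monotonic, MSE and RMSE always rank a pair of templates in the same order, so either can serve as the distance in the classification rule below.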
Classification Rule
For a two-class problem with templates template_3 and template_7:
if distance(x, template_3) < distance(x, template_7):
    predict class 3
else:
    predict class 7
This rule applies identically regardless of whether L1 or L2 is used as the distance function.
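The full rule can be sketched as a small function that takes the distance metric as a parameter; the names `classify`, `template_3`, and `template_7` are assumptions for illustration:

```python
import torch

def classify(x, template_3, template_7, distance):
    """Nearest-centroid rule: predict the class of the closer template."""
    return 3 if distance(x, template_3) < distance(x, template_7) else 7

# Illustrative templates: an all-dark "3" and an all-bright "7".
template_3 = torch.zeros(2, 2)
template_7 = torch.ones(2, 2)
x = torch.full((2, 2), 0.2)  # closer to template_3

pred = classify(x, template_3, template_7, lambda a, b: (a - b).abs().mean())
```

Passing the metric in as an argument makes it easy to compare L1 against MSE on the same data, as suggested in the Usage section.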
Why Absolute Values or Squares Are Needed
Naively summing raw pixel differences (x[i,j] - t[i,j]) without taking absolute values or squares allows positive and negative errors to cancel out. An image that is too bright in one region and too dark in another could appear to have zero total distance from the template, which would be misleading.
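The cancellation effect is easy to demonstrate with a contrived two-pixel example (the values here are ours, chosen to cancel exactly):

```python
import torch

t = torch.zeros(1, 2)
x = torch.tensor([[0.5, -0.5]])  # too bright on the left, too dark on the right

raw_mean = (x - t).mean()        # signed errors cancel to zero
l1 = (x - t).abs().mean()        # absolute errors reveal the real discrepancy
```

The signed mean reports a perfect match even though every pixel is wrong, while the L1 distance correctly reports the average error magnitude.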