Principle:Fastai Fastbook Distance Classification
| Knowledge Sources | |
|---|---|
| Domains | Deep Learning, Classification, Computer Vision |
| Last Updated | 2026-02-09 17:00 GMT |
Overview
Distance-based classification assigns an input to the class whose prototype (template) it is nearest to, as measured by a distance metric such as the L1 distance (mean absolute error) or the squared L2 distance (mean squared error).
Description
The simplest approach to classification is to compute a representative "ideal" example for each class (typically the per-pixel mean across all training samples), then classify a new input by measuring its distance to each class template and assigning it to the nearest one. This technique is known as a nearest-centroid classifier or template matching.
While this method is too simple for production use, it establishes an important baseline and introduces two foundational concepts that carry forward into neural network training:
- Loss functions quantify how far a prediction is from the target. L1 loss and MSE loss are two of the most common distance measures.
- Baseline thinking ensures that every subsequent, more complex model is evaluated against a simple, understandable reference point.
Usage
Use distance-based classification when:
- You need a quick baseline to validate that your data pipeline produces reasonable results.
- You want to introduce the concept of a loss function before building a trainable model.
- You are comparing the discriminative power of different distance metrics on your data.
Theoretical Basis
Class Templates
Given N training images for class c, each of shape (H, W), the class template (centroid) is:
template_c[i, j] = (1/N) * sum(image_k[i, j] for k in 1..N)
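This per-pixel averaging can be sketched in PyTorch; the tensor shapes and the random data here are illustrative stand-ins, not the fastbook's actual MNIST pipeline:

```python
import torch

# Stack N training images of shape (H, W) into a single (N, H, W) tensor,
# then average over the stacking axis to obtain the per-pixel class template.
# torch.rand is a stand-in for real training digits (values in [0, 1)).
images = torch.rand(100, 28, 28)
template = images.mean(dim=0)  # shape (28, 28); template[i, j] = mean over k
```

With real data, one template would be computed per class from that class's training images only.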
L1 Distance (Mean Absolute Error)
The L1 distance between an image x and a template t is:
L1(x, t) = (1 / (H * W)) * sum(|x[i,j] - t[i,j]| for all i, j)
L1 is robust to outliers because it penalizes all errors linearly.
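A minimal PyTorch sketch of this formula (the function name `l1_distance` is ours, chosen for clarity):

```python
import torch

def l1_distance(x, t):
    """Mean absolute per-pixel difference between image x and template t."""
    return (x - t).abs().mean()
```

The `.mean()` call performs both the summation and the division by `H * W` from the formula above.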
L2 Distance (Mean Squared Error)
The MSE (L2 squared) distance is:
MSE(x, t) = (1 / (H * W)) * sum((x[i,j] - t[i,j])^2 for all i, j)
The root mean squared error (RMSE) takes the square root of MSE to return to the original units:
RMSE(x, t) = sqrt(MSE(x, t))
MSE penalizes large deviations more heavily than L1 due to the squaring operation, making it more sensitive to outlier pixels.
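The two formulas above can be sketched the same way; again, the function names are illustrative, not from the fastbook code:

```python
import torch

def mse_distance(x, t):
    """Mean squared per-pixel difference between image x and template t."""
    return ((x - t) ** 2).mean()

def rmse_distance(x, t):
    """Square root of MSE, returning to the original pixel units."""
    return mse_distance(x, t).sqrt()
```

Because the square root is monotonic, MSE and RMSE always rank a pair of templates in the same order, so either can serve as the distance in the classification rule below.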
Classification Rule
For a two-class problem with templates template_3 and template_7:
if distance(x, template_3) < distance(x, template_7):
    predict class 3
else:
    predict class 7
This rule applies identically regardless of whether L1 or L2 is used as the distance function.
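The full rule can be sketched as a small function that takes the distance metric as a parameter; the names `classify`, `template_3`, and `template_7` are assumptions for illustration:

```python
import torch

def classify(x, template_3, template_7, distance):
    """Nearest-centroid rule: predict the class of the closer template."""
    return 3 if distance(x, template_3) < distance(x, template_7) else 7

# Illustrative templates: an all-dark "3" and an all-bright "7".
template_3 = torch.zeros(2, 2)
template_7 = torch.ones(2, 2)
x = torch.full((2, 2), 0.2)  # closer to template_3

pred = classify(x, template_3, template_7, lambda a, b: (a - b).abs().mean())
```

Passing the metric in as an argument makes it easy to compare L1 against MSE on the same data, as suggested in the Usage section.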
Why Absolute Values or Squares Are Needed
Naively summing raw pixel differences (x[i,j] - t[i,j]) without taking absolute values or squares allows positive and negative errors to cancel out. An image that is too bright in one region and too dark in another could appear to have zero total distance from the template, which would be misleading.
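The cancellation effect is easy to demonstrate with a contrived two-pixel example (the values here are ours, chosen to cancel exactly):

```python
import torch

t = torch.zeros(1, 2)
x = torch.tensor([[0.5, -0.5]])  # too bright on the left, too dark on the right

raw_mean = (x - t).mean()        # signed errors cancel to zero
l1 = (x - t).abs().mean()        # absolute errors reveal the real discrepancy
```

The signed mean reports a perfect match even though every pixel is wrong, while the L1 distance correctly reports the average error magnitude.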