Principle: Fastai Fastbook Transfer Learning
| Knowledge Sources | |
|---|---|
| Domains | Computer_Vision, Deep_Learning, Transfer_Learning |
| Last Updated | 2026-02-09 17:00 GMT |
Overview
Transfer learning is the technique of reusing a model trained on a large general dataset as the starting point for a model on a different, typically smaller, task-specific dataset.
Description
Training a deep convolutional neural network from scratch requires millions of labeled images and days of compute time. Transfer learning circumvents this by starting with a model that has already learned to extract useful visual features from a large dataset (typically ImageNet, with 1.2 million images across 1,000 categories). The key insight is that early layers of a CNN learn general features (edges, textures, shapes) that are useful for virtually any visual task, while later layers learn increasingly task-specific features.
Transfer learning for image classification works in three steps:
- Load a pretrained model -- take a network (e.g., ResNet34) with weights trained on ImageNet.
- Replace the head -- remove the final classification layer (which outputs 1,000 ImageNet classes) and replace it with a new layer that outputs the number of classes in the target task.
- Train the head -- freeze the pretrained body and train only the new head on the target dataset, then optionally unfreeze and fine-tune the entire network.
This approach routinely achieves high accuracy with as few as a hundred images per class and trains in minutes rather than days.
Usage
Use transfer learning whenever you are training an image classifier and a pretrained model is available for your input modality. It is the default and recommended approach in fastai for all image classification tasks. The only scenario where training from scratch might be preferred is when the target domain is radically different from natural images (e.g., spectrograms, medical scans with no visual similarity to photographs), and even then transfer learning often helps.
Theoretical Basis
Feature Hierarchy in CNNs
Research by Zeiler and Fergus (2013) and Yosinski et al. (2014) demonstrated that CNN layers learn hierarchical features:
| Layer Depth | Features Learned | Transferability |
|---|---|---|
| Early layers (1-2) | Edges, gradients, colors | Highly general; transfer to almost any visual task |
| Middle layers (3-4) | Textures, patterns, parts | Moderately general; useful for most tasks |
| Late layers (5+) | Object parts, semantic concepts | Task-specific; may need retraining |
| Classification head | Category probabilities | Entirely task-specific; must be replaced |
Body-Head Architecture
In the transfer learning paradigm, the network is conceptually divided into:
- Body (backbone): All convolutional layers from the pretrained model. These encode the feature extraction pipeline.
- Head: One or more fully connected layers that map the body's feature vector to the target class probabilities.
The head for a new task is typically:
AdaptiveAvgPool2d -> Flatten -> BatchNorm1d -> Dropout -> Linear -> ReLU -> BatchNorm1d -> Dropout -> Linear(num_classes)
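That layer sequence can be written out as an `nn.Sequential`. A sketch with assumed, illustrative dimensions: 512 input channels (ResNet34's final feature-map depth), a 512-unit hidden layer, and dropout rates of 0.25 and 0.5 are common defaults but not mandated by the source:

```python
import torch
import torch.nn as nn

num_classes = 10       # illustrative target-class count
body_channels = 512    # ResNet34's final feature-map depth

# The head sequence described above; hidden size and dropout rates
# are illustrative assumptions.
head = nn.Sequential(
    nn.AdaptiveAvgPool2d(1),          # pool each HxW feature map to 1x1
    nn.Flatten(),                     # (N, 512, 1, 1) -> (N, 512)
    nn.BatchNorm1d(body_channels),
    nn.Dropout(0.25),
    nn.Linear(body_channels, 512),
    nn.ReLU(inplace=True),
    nn.BatchNorm1d(512),
    nn.Dropout(0.5),
    nn.Linear(512, num_classes),      # final per-class logits
)

# A stand-in for the body's output: batch of 2, 512 channels, 7x7 map.
features = torch.randn(2, body_channels, 7, 7)
logits = head(features)
```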
Why Freezing Works
When the body is frozen (gradients are not computed for body parameters), only the randomly initialized head is updated. Because the body already produces meaningful feature vectors, the head can learn a good decision boundary in just one or two epochs. Freezing also prevents the large gradients produced by the untrained head from corrupting the carefully learned body weights during this initial training phase.
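The freezing mechanics can be demonstrated with a toy model (a hypothetical miniature body and head, not a real pretrained network): after a backward pass, frozen body parameters accumulate no gradients while the head's do.

```python
import torch
import torch.nn as nn

# Toy stand-in for a pretrained body and a fresh classification head.
body = nn.Sequential(
    nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
)
head = nn.Linear(8, 4)  # randomly initialized head for a 4-class task
model = nn.Sequential(body, head)

# Freeze the body: autograd neither computes nor stores gradients for it.
for p in body.parameters():
    p.requires_grad = False

x = torch.randn(2, 3, 16, 16)
loss = model(x).sum()
loss.backward()

# Frozen parameters keep grad=None; only the head received gradients.
body_grads = [p.grad for p in body.parameters()]
head_grads = [p.grad for p in head.parameters()]
```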
Common Pretrained Architectures
| Architecture | Parameters | Top-1 Accuracy (ImageNet) | Recommended Use |
|---|---|---|---|
| ResNet18 | 11.7M | 69.8% | Quick experiments, small datasets |
| ResNet34 | 21.8M | 73.3% | Default starting point in fastai |
| ResNet50 | 25.6M | 76.1% | Better accuracy when GPU memory allows |
| ResNet101 | 44.5M | 77.4% | Large datasets, fine-grained tasks |