Principle: Fastai Fastbook Transfer Learning
| Knowledge Sources | |
|---|---|
| Domains | Computer_Vision, Deep_Learning, Transfer_Learning |
| Last Updated | 2026-02-09 17:00 GMT |
Overview
Transfer learning is the technique of reusing a model trained on a large general dataset as the starting point for a model on a different, typically smaller, task-specific dataset.
Description
Training a deep convolutional neural network from scratch requires millions of labeled images and days of compute time. Transfer learning circumvents this by starting with a model that has already learned to extract useful visual features from a large dataset (typically ImageNet, with 1.2 million images across 1,000 categories). The key insight is that early layers of a CNN learn general features (edges, textures, shapes) that are useful for virtually any visual task, while later layers learn increasingly task-specific features.
Transfer learning for image classification works in three steps:
- Load a pretrained model -- take a network (e.g., ResNet34) with weights trained on ImageNet.
- Replace the head -- remove the final classification layer (which outputs 1,000 ImageNet classes) and replace it with a new layer that outputs the number of classes in the target task.
- Train the head -- freeze the pretrained body and train only the new head on the target dataset, then optionally unfreeze and fine-tune the entire network.
This approach routinely achieves high accuracy with as few as a hundred images per class and trains in minutes rather than days.
Usage
Use transfer learning whenever you are training an image classifier and a pretrained model is available for your input modality. It is the default and recommended approach in fastai for all image classification tasks. The only scenario where training from scratch might be preferred is when the target domain is radically different from natural images (e.g., spectrograms, medical scans with no visual similarity to photographs), and even then transfer learning often helps.
Theoretical Basis
Feature Hierarchy in CNNs
Research by Zeiler and Fergus (2013) and Yosinski et al. (2014) demonstrated that CNN layers learn hierarchical features:
| Layer Depth | Features Learned | Transferability |
|---|---|---|
| Early layers (1-2) | Edges, gradients, colors | Highly general; transfer to almost any visual task |
| Middle layers (3-4) | Textures, patterns, parts | Moderately general; useful for most tasks |
| Late layers (5+) | Object parts, semantic concepts | Task-specific; may need retraining |
| Classification head | Category probabilities | Entirely task-specific; must be replaced |
Body-Head Architecture
In the transfer learning paradigm, the network is conceptually divided into:
- Body (backbone): All convolutional layers from the pretrained model. These encode the feature extraction pipeline.
- Head: One or more fully connected layers that map the body's feature vector to the target class probabilities.
The head for a new task is typically:
AdaptiveAvgPool2d -> Flatten -> BatchNorm1d -> Dropout -> Linear -> ReLU -> BatchNorm1d -> Dropout -> Linear(num_classes)
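That layer sequence can be written out as an `nn.Sequential`. A sketch with assumed, illustrative dimensions: 512 input channels (ResNet34's final feature-map depth), a 512-unit hidden layer, and dropout rates of 0.25 and 0.5 are common defaults but not mandated by the source:

```python
import torch
import torch.nn as nn

num_classes = 10       # illustrative target-class count
body_channels = 512    # ResNet34's final feature-map depth

# The head sequence described above; hidden size and dropout rates
# are illustrative assumptions.
head = nn.Sequential(
    nn.AdaptiveAvgPool2d(1),          # pool each HxW feature map to 1x1
    nn.Flatten(),                     # (N, 512, 1, 1) -> (N, 512)
    nn.BatchNorm1d(body_channels),
    nn.Dropout(0.25),
    nn.Linear(body_channels, 512),
    nn.ReLU(inplace=True),
    nn.BatchNorm1d(512),
    nn.Dropout(0.5),
    nn.Linear(512, num_classes),      # final per-class logits
)

# A stand-in for the body's output: batch of 2, 512 channels, 7x7 map.
features = torch.randn(2, body_channels, 7, 7)
logits = head(features)
```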
Why Freezing Works
When the body is frozen (gradients are not computed for body parameters), only the randomly initialized head is updated. Because the body already produces meaningful feature vectors, the head can learn a good decision boundary in just one or two epochs. Freezing also prevents the large gradients produced by the untrained head from corrupting the carefully learned body weights during this initial training phase.
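The freezing mechanics can be demonstrated with a toy model (a hypothetical miniature body and head, not a real pretrained network): after a backward pass, frozen body parameters accumulate no gradients while the head's do.

```python
import torch
import torch.nn as nn

# Toy stand-in for a pretrained body and a fresh classification head.
body = nn.Sequential(
    nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
)
head = nn.Linear(8, 4)  # randomly initialized head for a 4-class task
model = nn.Sequential(body, head)

# Freeze the body: autograd neither computes nor stores gradients for it.
for p in body.parameters():
    p.requires_grad = False

x = torch.randn(2, 3, 16, 16)
loss = model(x).sum()
loss.backward()

# Frozen parameters keep grad=None; only the head received gradients.
body_grads = [p.grad for p in body.parameters()]
head_grads = [p.grad for p in head.parameters()]
```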
Common Pretrained Architectures
| Architecture | Parameters | Top-1 Accuracy (ImageNet) | Recommended Use |
|---|---|---|---|
| ResNet18 | 11.7M | 69.8% | Quick experiments, small datasets |
| ResNet34 | 21.8M | 73.3% | Default starting point in fastai |
| ResNet50 | 25.6M | 76.1% | Better accuracy when GPU memory allows |
| ResNet101 | 44.5M | 77.4% | Large datasets, fine-grained tasks |