Principle: Fastai Fastbook Fine Tuning
| Knowledge Sources | |
|---|---|
| Domains | Deep_Learning, Transfer_Learning, Optimization, Computer_Vision |
| Last Updated | 2026-02-09 17:00 GMT |
Overview
Fine-tuning is a two-phase transfer learning training strategy that first trains only the newly added classification head with the pretrained body frozen, then unfreezes the body and trains all parameters together with discriminative learning rates.
Description
After creating a Learner with a pretrained backbone and a fresh classification head, the model must be trained on the target dataset. Naive approaches -- either training everything from the start or only ever training the head -- are suboptimal:
- Training everything immediately risks corrupting the pretrained body weights with the large, random gradients coming from the untrained head.
- Only training the head limits the model's ability to adapt its feature representations to the target domain.
Fine-tuning solves both problems with a two-phase approach:
- Phase 1 (Frozen body): Train only the head for one or more epochs. This allows the head to learn a reasonable mapping from the pretrained features to the target classes. The body weights are not updated.
- Phase 2 (Unfrozen body): Unfreeze the entire network and train all parameters for the remaining epochs. Crucially, earlier layers (which encode more general features) receive a lower learning rate than later layers (which encode more task-specific features). This is called discriminative learning rates.
Usage
Fine-tuning is the default training strategy in fastai and should be used for virtually all transfer learning tasks. The Learner.fine_tune method handles the freeze/unfreeze cycle and the discriminative learning rate schedule automatically.
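A minimal sketch of the automatic path, assuming `dls` is an already-built `DataLoaders`; the architecture and epoch count are illustrative, not prescribed:

```python
from fastai.vision.all import *

# Assumes `dls` is an existing DataLoaders (e.g., built via ImageDataLoaders).
learn = vision_learner(dls, resnet34, metrics=error_rate)

# One frozen epoch for the head (freeze_epochs=1 by default), then two
# unfrozen epochs with discriminative learning rates, all handled internally.
learn.fine_tune(2)
```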
For more control, practitioners can manually call fit_one_cycle (for the frozen phase), unfreeze, and fit_one_cycle again (for the unfrozen phase) with custom learning rate slices.
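The manual equivalent can be sketched as follows; the epoch counts and the learning-rate slice are illustrative values, not required settings:

```python
from fastai.vision.all import *

# Assumes `dls` is an existing DataLoaders.
learn = vision_learner(dls, resnet34, metrics=error_rate)

# Phase 1: the body is frozen by default; train only the head.
learn.fit_one_cycle(1, 3e-3)

# Phase 2: unfreeze everything and train with discriminative LRs --
# the earliest layer group at 1e-6, the head group at 1e-4.
learn.unfreeze()
learn.fit_one_cycle(4, lr_max=slice(1e-6, 1e-4))
```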
Theoretical Basis
Discriminative Learning Rates
When the full model is unfrozen, different layer groups receive different learning rates. The intuition is:
| Layer Group | Content | Learning Rate | Rationale |
|---|---|---|---|
| Early body layers | Edges, textures | Very low (e.g., 1e-6) | These features are universal; minimal adaptation needed |
| Middle body layers | Patterns, parts | Low (e.g., 1e-5) | Some domain adaptation may help |
| Late body layers | Semantic features | Moderate (e.g., 1e-4) | These features are more task-specific and benefit from adaptation |
| Head layers | Classification | High (e.g., 1e-3) | These are randomly initialized and need the most learning |
In fastai, discriminative learning rates are specified using a Python slice:
lr_max = slice(lower_bound, upper_bound)
The framework divides the model into parameter groups and distributes learning rates exponentially between the lower and upper bounds.
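The exponential spacing amounts to geometric interpolation between the two bounds. A plain-Python sketch of the idea (the function name is hypothetical; this illustrates the math rather than fastai's exact internals):

```python
def discriminative_lrs(lower, upper, n_groups):
    """Geometrically space n_groups learning rates from lower to upper."""
    if n_groups == 1:
        return [upper]
    ratio = (upper / lower) ** (1 / (n_groups - 1))
    return [lower * ratio**i for i in range(n_groups)]

# With slice(1e-6, 1e-3) and four parameter groups, each group's LR is
# 10x the previous one: earliest body layers get 1e-6, the head gets 1e-3.
lrs = discriminative_lrs(1e-6, 1e-3, 4)
# lrs ~= [1e-6, 1e-5, 1e-4, 1e-3]
```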
One-Cycle Training Policy
Both phases of fine-tuning use the 1cycle policy (Smith & Topin, 2018):
- Warmup: The learning rate increases from a low initial value (lr_max/25 under fastai's default div=25) up to the maximum over the first portion of training (pct_start=0.25 by default).
- Annealing: The learning rate then decreases from the maximum to a much lower final value (lr_max/1e5 under fastai's default div_final) over the remainder, following a cosine curve.
- Momentum mirror: Momentum decreases during warmup (to compensate for the increasing LR) and increases during annealing.
This schedule allows the optimizer to explore broadly during warmup (high LR, low momentum) and converge precisely during annealing (low LR, high momentum), often achieving better final accuracy in fewer epochs than constant-LR training.
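The LR half of the schedule can be sketched as two cosine-interpolated segments; the defaults below mirror fastai's fit_one_cycle arguments (pct_start, div, div_final), though the function itself is a simplified stand-in:

```python
import math

def one_cycle_lr(t, lr_max, pct_start=0.25, div=25.0, div_final=1e5):
    """Learning rate at training progress t in [0, 1] under a 1cycle schedule."""
    def cos_interp(start, end, frac):
        # Cosine interpolation from start to end as frac goes 0 -> 1.
        return start + (end - start) * (1 - math.cos(math.pi * frac)) / 2

    if t < pct_start:  # warmup: lr_max/div -> lr_max
        return cos_interp(lr_max / div, lr_max, t / pct_start)
    # annealing: lr_max -> lr_max/div_final
    frac = (t - pct_start) / (1 - pct_start)
    return cos_interp(lr_max, lr_max / div_final, frac)
```

The momentum schedule mirrors this shape in reverse (high at the endpoints, low at the LR peak), which is why warmup pairs a rising LR with falling momentum.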
The Freeze/Unfreeze Protocol
The standard fine-tuning protocol is:
1. Create learner (body frozen by default)
2. Train head for freeze_epochs (default: 1) at base_lr
3. Unfreeze all layers
4. Train full model for epochs at discriminative learning rates:
- Body: base_lr / factor
- Head: base_lr
The division factor is large by default (fastai's fine_tune uses lr_mult=100, and additionally halves base_lr before the unfrozen phase), ensuring the pretrained body layers change very slowly relative to the head.
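The protocol's learning-rate bookkeeping can be sketched as plain arithmetic; the defaults below match fastai's fine_tune signature as of recent versions (base_lr=2e-3, lr_mult=100, with base_lr halved before unfreezing), but treat the exact numbers as implementation details:

```python
def fine_tune_lrs(base_lr=2e-3, lr_mult=100):
    """Sketch of the LR bookkeeping in a fine_tune-style protocol.

    Phase 1 trains the head alone at base_lr; before phase 2 the base LR
    is halved, and the unfrozen model trains with a slice spanning
    (base_lr/lr_mult, base_lr) across the layer groups.
    """
    frozen_lr = base_lr                      # head-only phase
    base_lr = base_lr / 2                    # halved before unfreezing
    unfrozen = (base_lr / lr_mult, base_lr)  # (earliest body, head)
    return frozen_lr, unfrozen

frozen, (body_lr, head_lr) = fine_tune_lrs()
# frozen = 2e-3, body_lr = 1e-5, head_lr = 1e-3: the earliest body layers
# train 100x slower than the head, and the head itself trains at half the
# rate it used during the frozen phase.
```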