Principle:Fastai Fastbook Fine Tuning

From Leeroopedia


Knowledge Sources
Domains: Deep_Learning, Transfer_Learning, Optimization, Computer_Vision
Last Updated: 2026-02-09 17:00 GMT

Overview

Fine-tuning is a two-phase transfer learning training strategy that first trains only the newly added classification head with the pretrained body frozen, then unfreezes the body and trains all parameters together with discriminative learning rates.

Description

After creating a Learner with a pretrained backbone and a fresh classification head, the model must be trained on the target dataset. Naive approaches -- either training everything from the start or only ever training the head -- are suboptimal:

  • Training everything immediately risks corrupting the pretrained body weights with the large, random gradients coming from the untrained head.
  • Only training the head limits the model's ability to adapt its feature representations to the target domain.

Fine-tuning solves both problems with a two-phase approach:

  1. Phase 1 (Frozen body): Train only the head for one or more epochs. This allows the head to learn a reasonable mapping from the pretrained features to the target classes. The body weights are not updated.
  2. Phase 2 (Unfrozen body): Unfreeze the entire network and train all parameters for the remaining epochs. Crucially, earlier layers (which encode more general features) receive a lower learning rate than later layers (which encode more task-specific features). This is called discriminative learning rates.

Usage

Fine-tuning is the default training strategy in fastai, exposed as the Learner.fine_tune method, and should be used for virtually all transfer learning tasks. fine_tune handles the freeze/unfreeze cycle and discriminative learning rate scheduling automatically.

For more control, practitioners can manually call fit_one_cycle (for the frozen phase), unfreeze, and fit_one_cycle again (for the unfrozen phase) with custom learning rate slices.
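The manual protocol can be sketched as follows. This is a toy stand-in, not fastai itself: ToyLearner merely records what each call would do, but the call sequence, method names (fit_one_cycle, unfreeze), and the slice-based learning rates mirror fastai's real API.

```python
class ToyLearner:
    """Minimal stand-in for fastai's Learner that records training calls."""
    def __init__(self):
        self.frozen = True   # a fresh transfer-learning Learner starts frozen
        self.log = []

    def fit_one_cycle(self, epochs, lr_max=3e-3):
        state = "frozen" if self.frozen else "unfrozen"
        self.log.append((state, epochs, lr_max))

    def unfreeze(self):
        self.frozen = False

learn = ToyLearner()
learn.fit_one_cycle(1)                             # phase 1: train head only
learn.unfreeze()                                   # make the body trainable
learn.fit_one_cycle(6, lr_max=slice(1e-6, 1e-4))   # phase 2: discriminative LRs
print(learn.log)
```

With real fastai code the same three lines apply verbatim to a Learner created by, e.g., vision_learner; only the epoch counts and slice bounds change per task.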

Theoretical Basis

Discriminative Learning Rates

When the full model is unfrozen, different layer groups receive different learning rates. The intuition is:

| Layer Group | Content | Learning Rate | Rationale |
| --- | --- | --- | --- |
| Early body layers | Edges, textures | Very low (e.g., 1e-6) | These features are universal; minimal adaptation needed |
| Middle body layers | Patterns, parts | Low (e.g., 1e-5) | Some domain adaptation may help |
| Late body layers | Semantic features | Moderate (e.g., 1e-4) | These features are more task-specific and benefit from adaptation |
| Head layers | Classification | High (e.g., 1e-3) | These are randomly initialized and need the most learning |

In fastai, discriminative learning rates are specified using a Python slice:

lr_max = slice(lower_bound, upper_bound)

The framework divides the model into parameter groups and distributes learning rates exponentially between the lower and upper bounds.
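That exponential distribution can be sketched as below. The helper name spread_lrs is illustrative, not fastai's API (fastai's internal equivalent is even_mults, if memory serves); the point is the multiplicative spacing between the slice bounds.

```python
def spread_lrs(lower, upper, n_groups):
    """Return one learning rate per parameter group, spaced multiplicatively
    so the earliest group gets `lower` and the head group gets `upper`."""
    if n_groups == 1:
        return [upper]
    ratio = (upper / lower) ** (1 / (n_groups - 1))
    return [lower * ratio ** i for i in range(n_groups)]

# slice(1e-6, 1e-3) over 4 groups gives each group 10x the previous group's LR,
# matching the table above
print(spread_lrs(1e-6, 1e-3, 4))
```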

One-Cycle Training Policy

Both phases of fine-tuning use the 1cycle policy (Smith & Topin, 2018):

  1. Warmup: Learning rate increases from a low initial value (lr_max/25 with fastai defaults) to the maximum over roughly the first quarter of training (pct_start=0.25).
  2. Annealing: Learning rate decreases from the maximum back to a very small final value (lr_max/1e5 with fastai defaults) over the remainder.
  3. Momentum mirror: Momentum decreases during warmup (to compensate for the increasing LR) and increases during annealing.

This schedule allows the optimizer to explore broadly during warmup (high LR, low momentum) and converge precisely during annealing (low LR, high momentum), often achieving better final accuracy in fewer epochs than constant-LR training.
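The schedule can be sketched as below. The specific constants (pct_start=0.25, start at lr_max/25, end at lr_max/1e5, momentum swinging 0.95 → 0.85 → 0.95) are fastai's defaults as best recalled, so treat them as assumptions rather than a spec.

```python
import math

def cos_anneal(start, end, pct):
    """Cosine interpolation from start to end as pct goes 0 -> 1."""
    return start + (end - start) * (1 - math.cos(math.pi * pct)) / 2

def one_cycle_lr(step, total, lr_max, pct_start=0.25, div=25.0, div_final=1e5):
    pct = step / total
    if pct < pct_start:   # warmup: low -> lr_max
        return cos_anneal(lr_max / div, lr_max, pct / pct_start)
    # annealing: lr_max -> very low
    return cos_anneal(lr_max, lr_max / div_final, (pct - pct_start) / (1 - pct_start))

def one_cycle_mom(step, total, pct_start=0.25, moms=(0.95, 0.85, 0.95)):
    pct = step / total
    if pct < pct_start:   # momentum falls while the LR rises
        return cos_anneal(moms[0], moms[1], pct / pct_start)
    return cos_anneal(moms[1], moms[2], (pct - pct_start) / (1 - pct_start))
```

At the end of warmup the LR equals lr_max while momentum sits at its minimum; both trends then reverse during annealing, which is the "momentum mirror" described above.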

The Freeze/Unfreeze Protocol

The standard fine-tuning protocol is:

1. Create learner (body frozen by default)
2. Train head for freeze_epochs (default: 1) at base_lr
3. Unfreeze all layers
4. Train full model for epochs at discriminative learning rates:
     - Body: base_lr / factor
     - Head: base_lr

The division factor is large (fastai's fine_tune defaults to lr_mult=100), ensuring the pretrained body layers change very slowly relative to the head.
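Numerically, the unfrozen-phase bounds work out as in this sketch. The parameter names follow fastai's fine_tune signature, but the exact constants (lr_mult=100, and the halving of base_lr before the unfrozen phase) are details recalled from fastai's source and should be treated as assumptions.

```python
def phase2_slice(base_lr=2e-3, lr_mult=100):
    """Learning-rate bounds for the unfrozen phase: body at the low end,
    head at the high end, a factor of lr_mult apart."""
    base_lr = base_lr / 2          # assumed: fine_tune halves base_lr here
    return slice(base_lr / lr_mult, base_lr)

lrs = phase2_slice()
print(lrs.start, lrs.stop)   # body bound 1e-05, head bound 0.001
```

So with the defaults, the earliest body layers train 100x slower than the head during phase 2.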

Related Pages

Implemented By

Uses Heuristic
