

Heuristic:Fastai Fastbook Learning Rate Finder Rule

From Leeroopedia



Knowledge Sources
Domains Optimization, Deep_Learning
Last Updated 2026-02-09 17:00 GMT

Overview

Two heuristics for selecting a learning rate from `lr_find()`: the loss-minimum point divided by 10, or the point of steepest loss descent.

Description

The Learning Rate Finder (`lr_find`) sweeps learning rates from very small to very large while recording loss. The resulting plot shows loss vs learning rate. From this plot, two heuristic rules identify a good starting learning rate:
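The sweep is typically exponential: each step's learning rate is the previous one multiplied by a constant factor. A minimal sketch of such a schedule (the start/end values and step count here are illustrative defaults, not fastai's exact internals):

```python
def lr_sweep(lr_start=1e-7, lr_end=10.0, n_steps=100):
    """Exponentially spaced learning rates from lr_start to lr_end."""
    ratio = (lr_end / lr_start) ** (1 / (n_steps - 1))
    return [lr_start * ratio ** i for i in range(n_steps)]

lrs = lr_sweep()
print(f"first={lrs[0]:.1e}, last={lrs[-1]:.1e}")  # first=1.0e-07, last=1.0e+01
```

Training runs one mini-batch at each rate while recording the loss; the sweep stops early once the loss clearly diverges.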

  1. Minimum/10 rule: Take the learning rate where loss reaches its minimum, then divide by 10 (one order of magnitude less). This provides a safe learning rate that avoids the region where loss starts increasing again.
  2. Steepest point rule: Take the learning rate at the point of steepest loss decrease. This is where the model is learning most efficiently.

Both rules typically produce similar values. The fastai library computes both suggestions automatically via `suggest_funcs=(minimum, steep)`.
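As a sketch of how the two rules read a recorded sweep (pure Python on a synthetic curve; fastai's `minimum` and `steep` suggesters work on the smoothed loss, which this simplified version omits):

```python
import math

def suggest_lrs(lrs, losses):
    """Return (minimum/10, steepest-descent) suggestions from a recorded sweep."""
    # Rule 1: learning rate at the loss minimum, divided by 10.
    lr_min = lrs[losses.index(min(losses))] / 10
    # Rule 2: learning rate where loss falls fastest per decade of LR (log scale).
    slopes = [(losses[i + 1] - losses[i]) / (math.log10(lrs[i + 1]) - math.log10(lrs[i]))
              for i in range(len(lrs) - 1)]
    lr_steep = lrs[slopes.index(min(slopes))]
    return lr_min, lr_steep

# Synthetic sweep: loss falls until ~1e-1, then blows up.
lrs = [10 ** e for e in [-5, -4, -3, -2, -1, 0]]
losses = [2.0, 1.8, 1.2, 0.5, 0.4, 3.0]
print(suggest_lrs(lrs, losses))
```

On this curve the minimum/10 rule gives 1e-2 and the steepest-point rule gives 1e-3, illustrating that the two suggestions land close together on the log axis.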

Usage

Use this heuristic before every training run to select an appropriate learning rate. Apply it after:

  • Creating a new `Learner` with `cnn_learner`, `language_model_learner`, or `tabular_learner`
  • Unfreezing a pretrained model for further fine-tuning
  • Switching to a new dataset or architecture

The Insight (Rule of Thumb)

  • Action: Call `learn.lr_find()` before training, then select the learning rate.
  • Value: Use `lr_min / 10` (minimum loss point divided by 10) or `lr_steep` (steepest descent point). Typical values range from 1e-3 to 1e-2 for vision models.
  • Trade-off: Too high a learning rate causes divergence; too low wastes training time without significant improvement.
  • Default: fastai's default learning rate is `1e-3` if none is specified.
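The trade-off above can be seen on a toy problem: gradient descent on f(w) = w² diverges once the learning rate exceeds the stability threshold (here 1.0) and barely moves when the rate is tiny. The specific values are illustrative, not tied to any real model:

```python
def descend(lr, steps=50, w=1.0):
    """Gradient descent on f(w) = w**2; the gradient is 2*w."""
    for _ in range(steps):
        w -= lr * 2 * w
    return abs(w)

print(descend(0.4))    # converges quickly toward 0
print(descend(1.1))    # diverges: |w| grows every step
print(descend(0.001))  # barely moves after 50 steps
```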

Reasoning

The Learning Rate Finder was discovered by Leslie N. Smith in 2015, despite neural networks existing since the 1950s. Before this technique, finding a good learning rate was considered one of the most challenging and important problems practitioners faced. The insight is that by gradually increasing the learning rate during a short training run, you can directly observe the relationship between learning rate and loss, avoiding the need for expensive trial-and-error hyperparameter searches.

The minimum/10 rule works because the actual minimum of the LR-vs-loss curve is already on the edge of instability; dividing by 10 provides a safety margin. The steepest point rule selects where learning is most efficient, also providing a margin before divergence. The LR finder plot uses a logarithmic scale, so the midpoint between 1e-3 and 1e-2 is approximately 3e-3, not 5.5e-3.
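Because the LR axis is logarithmic, "halfway between 1e-3 and 1e-2" means the geometric mean rather than the arithmetic one:

```python
import math

arithmetic = (1e-3 + 1e-2) / 2      # 5.5e-3: midpoint on a linear axis
geometric = math.sqrt(1e-3 * 1e-2)  # ~3.16e-3: midpoint on a log axis
print(f"{arithmetic:.2e} vs {geometric:.2e}")  # 5.50e-03 vs 3.16e-03
```

This is why a value like 3e-3 is a natural pick when the two suggestions bracket that decade.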

Code Evidence

LR Finder usage from `05_pet_breeds.md:627-640`:

learn = cnn_learner(dls, resnet34, metrics=error_rate)
lr_min, lr_steep = learn.lr_find(suggest_funcs=(minimum, steep))

print(f"Minimum/10: {lr_min:.2e}, steepest point: {lr_steep:.2e}")
# Output: Minimum/10: 8.32e-03, steepest point: 6.31e-03

Applying the selected learning rate from `05_pet_breeds.md:647-648`:

learn = cnn_learner(dls, resnet34, metrics=error_rate)
learn.fine_tune(2, base_lr=3e-3)
