Principle: Fastai Fastbook Learning Rate Selection
| Knowledge Sources | Details |
|---|---|
| Domains | Deep_Learning, Optimization, Computer_Vision |
| Last Updated | 2026-02-09 17:00 GMT |
Overview
Learning rate selection is the process of empirically determining the optimal step size for gradient descent before committing to a full training run.
Description
The learning rate is the single most important hyperparameter in neural network training. It controls how much the model weights are adjusted in response to the computed gradient at each optimization step:
weight_new = weight_old - learning_rate * gradient
If the learning rate is too high, the optimizer overshoots the loss minimum and training diverges (loss explodes). If too low, training converges extremely slowly or gets stuck in a poor local minimum. Finding the right value is critical and was historically done by trial and error.
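The effect of the step size can be seen in a toy one-dimensional example (a minimal sketch: the quadratic loss and the specific rates are illustrative, not from the source):

```python
# Toy example: minimize loss(w) = w^2, whose gradient is 2*w,
# using the update rule  w_new = w_old - learning_rate * gradient.

def gradient_descent(lr, steps=20, w=1.0):
    """Run `steps` updates on loss(w) = w**2 and return the final weight."""
    for _ in range(steps):
        grad = 2 * w          # d/dw of w^2
        w = w - lr * grad     # the weight update rule
    return w

# A well-chosen rate moves w toward the minimum at w = 0,
# while an overly large rate makes |w| grow every step (divergence).
converged = gradient_descent(lr=0.1)   # |w| shrinks by 0.8 per step
diverged = gradient_descent(lr=1.5)    # |w| doubles per step
```

Here `lr=0.1` multiplies the weight by 0.8 each step, while `lr=1.5` multiplies it by -2, so the loss explodes exactly as described above.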
The learning rate finder technique, proposed by Leslie Smith (2015), automates this search. It runs a short mock training session where the learning rate increases exponentially from a very small value to a very large value over a fixed number of iterations. The loss is recorded at each step. The resulting loss-vs-learning-rate plot reveals the optimal range.
Usage
Run the learning rate finder once after creating a Learner and before calling any training method. Re-running it is especially important after:
- Creating a new Learner for the first time
- Unfreezing the model body (the optimal rate changes when more parameters are trainable)
- Changing the dataset significantly (different data distribution may shift the optimal rate)
Theoretical Basis
The LR Finder Algorithm
The learning rate finder follows this procedure:
1. Set the learning rate to a very small value (e.g., 1e-7).
2. Train for one mini-batch and record the loss.
3. Multiply the learning rate by a constant factor (e.g., 1.3).
4. Repeat steps 2-3 for a fixed number of iterations (e.g., 100).
5. Plot loss vs. log(learning_rate).
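The procedure above can be sketched in plain Python (a minimal sketch: a quadratic loss stands in for a real network and mini-batches, and the bounds and factor are the illustrative values from the steps):

```python
def lr_finder(start_lr=1e-7, factor=1.3, iters=100, w=1.0):
    """Mock training run: one 'mini-batch' step per iteration on
    loss(w) = w**2, multiplying the learning rate by `factor` each time."""
    lr = start_lr
    lrs, losses = [], []
    for _ in range(iters):
        lrs.append(lr)
        losses.append(w ** 2)  # record loss before the step
        w = w - lr * 2 * w     # one gradient step (gradient of w^2 is 2w)
        lr *= factor           # exponential LR schedule
    return lrs, losses

lrs, losses = lr_finder()
# In a real run you would plot losses against log10 of each lr;
# here we just locate the lr at the lowest recorded loss.
best_lr = lrs[min(range(len(losses)), key=losses.__getitem__)]
```

Even on this toy problem the curve shows the characteristic shape: flat at tiny rates, falling through the sweet spot, then exploding once the rate is too large.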
Interpreting the Plot
The loss-vs-learning-rate curve has a characteristic shape:
| Region | Learning Rate Range | Loss Behavior | Interpretation |
|---|---|---|---|
| Too low | < 1e-4 (typical) | Flat or very slowly decreasing | Learning is too slow; gradients barely move the weights |
| Sweet spot | ~1e-3 to ~1e-2 (typical) | Steeply decreasing | Optimal range; fast convergence without instability |
| Too high | > 1e-1 (typical) | Sharply increasing or diverging | Optimizer overshoots; weights oscillate wildly |
Selection Heuristics
Two common heuristics for selecting the learning rate from the plot:
- One order of magnitude before the minimum: Find the learning rate where the loss is lowest, then divide by 10. This provides a safety margin below the instability threshold.
- Steepest descent: Find the learning rate at the point of steepest negative slope. This maximizes the rate of loss decrease.
The fastai lr_find method can return both values, conventionally named lr_min (the learning rate at the lowest loss, divided by 10) and lr_steep (the learning rate at the point of steepest negative gradient); the exact names and default suggestions vary across fastai versions.
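Both heuristics can be applied directly to a recorded loss-vs-learning-rate curve. This is an illustrative pure-Python re-implementation, not fastai's actual code, and the synthetic curve below is invented for the demonstration:

```python
import math

def suggest_lrs(lrs, losses):
    """Apply the two selection heuristics to a recorded curve."""
    # Heuristic 1: lr at the lowest loss, divided by 10 for a safety margin.
    i_min = min(range(len(losses)), key=losses.__getitem__)
    lr_min = lrs[i_min] / 10
    # Heuristic 2: lr where the loss falls fastest per unit of log(lr).
    slopes = [
        (losses[i + 1] - losses[i]) / (math.log(lrs[i + 1]) - math.log(lrs[i]))
        for i in range(len(lrs) - 1)
    ]
    lr_steep = lrs[min(range(len(slopes)), key=slopes.__getitem__)]
    return lr_min, lr_steep

# Synthetic curve with the characteristic shape:
# flat region, then steeply falling, then exploding.
lrs = [1e-5 * 10 ** (i / 10) for i in range(50)]          # 1e-5 .. ~1e-0.1
losses = ([2.0] * 20
          + [2.0 - 0.12 * i for i in range(1, 16)]        # steep descent
          + [0.1 * 2 ** i for i in range(15)])            # divergence
lr_min, lr_steep = suggest_lrs(lrs, losses)
```

As expected, both suggestions land below the learning rate where the loss starts to diverge, with lr_steep sitting inside the steeply falling region.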
Mathematical Basis
The exponential schedule used by the finder can be expressed as:
lr_i = start_lr * (end_lr / start_lr) ^ (i / num_iterations)
where i is the current iteration. This ensures uniform spacing on a logarithmic scale, which is appropriate because the optimal learning rate often varies over several orders of magnitude.
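As a quick check, the schedule can be implemented directly (a minimal sketch; the start/end values and iteration count are illustrative):

```python
import math

def lr_schedule(i, start_lr=1e-7, end_lr=10.0, num_iterations=100):
    """lr_i = start_lr * (end_lr / start_lr) ** (i / num_iterations)"""
    return start_lr * (end_lr / start_lr) ** (i / num_iterations)

lrs = [lr_schedule(i) for i in range(101)]
# Consecutive rates differ by a constant ratio, so the schedule is
# evenly spaced on a log scale, spanning eight orders of magnitude here.
ratios = [lrs[i + 1] / lrs[i] for i in range(100)]
```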