Principle: Sktime / PyTorch Forecasting Learning Rate Finding
| Knowledge Sources | |
|---|---|
| Domains | Deep_Learning, Optimization, Hyperparameter_Tuning |
| Last Updated | 2026-02-08 07:00 GMT |
Overview
Technique for automatically determining an optimal initial learning rate by sweeping through a range of learning rates and analyzing the loss curve.
Description
Learning Rate Finding (LR Range Test) trains the model for a short period while exponentially increasing the learning rate from a small value to a large value. By plotting loss vs. learning rate, the optimal learning rate is identified as the point where the loss decreases most steeply — typically one order of magnitude before the loss starts diverging. This eliminates manual tuning of the learning rate, one of the most impactful hyperparameters for deep learning convergence. In pytorch-forecasting, the Tuner wraps Lightning's LR finder with a compatibility fix for checkpoint loading.
Usage
Use this principle after configuring the Trainer and instantiating the model, but before calling Trainer.fit(). The found learning rate should be set on the model before training. This is used in the TFT Demand Forecasting and TFT Hyperparameter Optimization workflows.
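The workflow above can be sketched as a helper that runs the finder and sets the suggested rate on the model before fitting. This is a template, not a definitive implementation: the exact Tuner import path differs across lightning / pytorch-forecasting versions, and `find_learning_rate` is an illustrative name, not a library function.

```python
def find_learning_rate(trainer, model, train_dataloader, val_dataloader):
    """Sketch: run an LR range test and apply the suggestion before fit().

    Assumes lightning is installed; pytorch-forecasting ships a Tuner
    wrapper with a checkpoint-loading fix, while the upstream class lives
    at lightning.pytorch.tuner.Tuner (path may vary by version).
    """
    from lightning.pytorch.tuner import Tuner  # version-dependent import path

    res = Tuner(trainer).lr_find(
        model,
        train_dataloaders=train_dataloader,
        val_dataloaders=val_dataloader,
        min_lr=1e-6,   # sweep bounds are illustrative defaults
        max_lr=10.0,
    )
    suggested = res.suggestion()  # point of steepest descent on the curve
    model.hparams.learning_rate = suggested  # must happen before trainer.fit()
    return suggested
```

The function is called between model instantiation and `trainer.fit()`, matching the ordering this principle requires.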
Theoretical Basis
The LR Range Test (Smith, 2015) proceeds as follows:
Algorithm:
# Abstract LR finding algorithm
lr = min_lr
for batch in training_data:
    loss = train_one_step(model, batch, lr)
    record(lr, loss)
    lr *= growth_factor  # exponential increase
    if lr > max_lr or loss > divergence_threshold:
        break
optimal_lr = lr_at_steepest_descent(recorded_losses)
The growth factor is computed as:

growth_factor = (max_lr / min_lr)^(1/N)

where N is the number of training steps.
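A quick numeric check of the growth-factor formula, with sweep bounds chosen purely for illustration: going from 1e-6 to 1e-1 over 100 steps requires multiplying by roughly 1.122 per step, and N such multiplications land exactly on the upper bound.

```python
import math

min_lr, max_lr, n_steps = 1e-6, 1e-1, 100  # illustrative sweep bounds

# growth_factor = (max_lr / min_lr)^(1/N)
growth_factor = (max_lr / min_lr) ** (1 / n_steps)  # ~1.122 per step here

# After N exponential increases the LR reaches max_lr (up to float error).
final_lr = min_lr * growth_factor ** n_steps
print(growth_factor, final_lr)
```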
Selection heuristic: Choose the learning rate approximately 10x smaller than the rate where loss is minimized, or at the steepest point of the loss-vs-lr curve.
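The steepest-descent half of the selection heuristic can be sketched as a small helper over the recorded (lr, loss) pairs. `suggest_lr` is an illustrative name, not a library function; real finders (e.g. Lightning's) smooth the losses first to reduce batch noise, which this minimal version skips.

```python
def suggest_lr(lrs, losses):
    """Return the LR at the start of the steepest drop in the loss curve.

    Minimal sketch of the selection heuristic: pick the interval with the
    most negative finite difference between consecutive recorded losses.
    """
    diffs = [losses[i + 1] - losses[i] for i in range(len(losses) - 1)]
    steepest = min(range(len(diffs)), key=lambda i: diffs[i])
    return lrs[steepest]

# Synthetic sweep: loss falls fastest between 1e-3 and 1e-2, then diverges.
lrs = [1e-5, 1e-4, 1e-3, 1e-2, 1e-1, 1.0]
losses = [1.00, 0.95, 0.70, 0.30, 0.40, 2.50]
print(suggest_lr(lrs, losses))  # → 0.001
```

Note that the suggestion (1e-3) sits well below the divergence point (1e-1), consistent with the "one order of magnitude before divergence" guidance above.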