Principle:Sktime Pytorch forecasting Learning Rate Finding


Knowledge Sources
Domains Deep_Learning, Optimization, Hyperparameter_Tuning
Last Updated 2026-02-08 07:00 GMT

Overview

A technique for automatically choosing a good initial learning rate by sweeping through a range of learning rates and analyzing the resulting loss curve.

Description

Learning Rate Finding (the LR Range Test) trains the model for a short period while exponentially increasing the learning rate from a small value to a large value. Plotting loss against learning rate reveals a suitable learning rate at the point where the loss decreases most steeply, which typically lies about one order of magnitude below the rate at which the loss starts to diverge. This reduces the need to tune the learning rate by hand; the learning rate is one of the most impactful hyperparameters for deep learning convergence. In pytorch-forecasting, the Tuner wraps Lightning's LR finder with a compatibility fix for checkpoint loading.

Usage

Use this principle after configuring the Trainer and instantiating the model, but before calling Trainer.fit(). The found learning rate should be set on the model before training. This is used in the TFT Demand Forecasting and TFT Hyperparameter Optimization workflows.
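As a rough sketch of where this step sits in a training script, the hypothetical helper below runs the finder and copies the suggestion onto the model before fitting. It assumes a tuner object that exposes Lightning's `lr_find(...)` method and a result that exposes `suggestion()`. The helper name `find_and_set_lr`, the default bounds, and the `hparams.learning_rate` attribute are assumptions made for illustration and are not confirmed by this page.

```python
def find_and_set_lr(tuner, model, train_dataloader, val_dataloader,
                    min_lr=1e-6, max_lr=10.0):
    """Run the LR finder and store its suggestion on the model (hypothetical helper).

    `tuner` is assumed to behave like lightning.pytorch.tuner.Tuner(trainer).
    The bounds are illustrative defaults, not values taken from this page.
    """
    result = tuner.lr_find(
        model,
        train_dataloaders=train_dataloader,
        val_dataloaders=val_dataloader,
        min_lr=min_lr,
        max_lr=max_lr,
    )
    # The finder may return nothing, or fail to produce a suggestion.
    suggestion = result.suggestion() if result is not None else None
    if suggestion is not None:
        # Assumption: the model reads its learning rate from hparams.learning_rate.
        model.hparams.learning_rate = suggestion
    return suggestion


# Typical call site (requires the `lightning` package, not imported here):
#   from lightning.pytorch.tuner import Tuner
#   lr = find_and_set_lr(Tuner(trainer), model, train_dataloader, val_dataloader)
#   trainer.fit(model, train_dataloaders=train_dataloader,
#               val_dataloaders=val_dataloader)
```

Passing the tuner in as an argument keeps the helper independent of any particular Lightning version; only the `lr_find` and `suggestion` calls are relied upon.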

Theoretical Basis

The LR Range Test (Smith, 2015) proceeds as follows:

Algorithm:

# Abstract LR finding algorithm
lr = min_lr
for batch in training_data:
    loss = train_one_step(model, batch, lr)
    record(lr, loss)
    lr *= growth_factor  # exponential increase
    if lr > max_lr or loss > divergence_threshold:
        break

optimal_lr = lr_at_steepest_descent(recorded_losses)
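
The abstract loop above can be made concrete on a toy problem. The following is a minimal sketch, not the Lightning or pytorch-forecasting implementation: it fits a one-parameter model `y_hat = w * x` with plain SGD on synthetic data while the learning rate grows exponentially. The function name `lr_range_test`, the toy data, and the stopping rule (loss exceeding `divergence_factor` times the best loss seen) are all assumptions chosen for illustration.

```python
import math
import random


def lr_range_test(min_lr=1e-5, max_lr=10.0, num_steps=100,
                  divergence_factor=4.0, seed=0):
    """Exponential LR sweep on a toy model y_hat = w * x; returns (lr, loss) pairs."""
    rng = random.Random(seed)
    growth_factor = (max_lr / min_lr) ** (1.0 / num_steps)
    w = 0.0                      # the single trainable weight
    lr = min_lr
    best_loss = math.inf
    history = []

    # Running at most num_steps iterations keeps lr below max_lr.
    for _ in range(num_steps):
        # Synthetic mini-batch drawn from y = 3x plus a little Gaussian noise.
        xs = [rng.uniform(-1.0, 1.0) for _ in range(64)]
        ys = [3.0 * x + rng.gauss(0.0, 0.1) for x in xs]

        # Mean squared error and its gradient with respect to w.
        errors = [w * x - y for x, y in zip(xs, ys)]
        loss = sum(e * e for e in errors) / len(errors)
        grad = 2.0 * sum(e * x for e, x in zip(errors, xs)) / len(errors)

        history.append((lr, loss))
        best_loss = min(best_loss, loss)
        if loss > divergence_factor * best_loss:
            break                # loss has blown up relative to the best value seen

        w -= lr * grad           # plain SGD step at the current learning rate
        lr *= growth_factor      # exponential increase for the next step

    return history


history = lr_range_test()
print(f"recorded {len(history)} steps, "
      f"lr from {history[0][0]:.1e} to {history[-1][0]:.1e}")
```

The recorded `(lr, loss)` pairs are the raw material for the selection step; in practice the loss is usually smoothed before a learning rate is picked, because single-batch losses are noisy.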

The growth factor is computed as: $\mathrm{growth\_factor} = \left(\frac{\mathrm{max\_lr}}{\mathrm{min\_lr}}\right)^{1/N}$

Where N is the number of training steps.
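
To make the growth factor concrete, here is a short worked computation. The bounds `min_lr = 1e-6`, `max_lr = 10.0` and `N = 100` are illustrative values rather than defaults stated on this page, and the helper names are hypothetical. The sketch also checks that multiplying by the growth factor N times carries the learning rate from `min_lr` to `max_lr`.

```python
import math


def growth_factor(min_lr, max_lr, num_steps):
    """Per-step multiplier so that num_steps multiplications span min_lr..max_lr."""
    return (max_lr / min_lr) ** (1.0 / num_steps)


def lr_schedule(min_lr, max_lr, num_steps):
    """Learning rates before each step, plus the final value after the last multiply."""
    g = growth_factor(min_lr, max_lr, num_steps)
    return [min_lr * g ** k for k in range(num_steps + 1)]


g = growth_factor(1e-6, 10.0, 100)
schedule = lr_schedule(1e-6, 10.0, 100)

# The ratio max_lr / min_lr is 1e7, so g = 10 ** (7 / 100).
print(f"growth factor: {g:.4f}")
print(f"first lr: {schedule[0]:.1e}, last lr: {schedule[-1]:.3f}")
print("reaches max_lr:", math.isclose(schedule[-1], 10.0, rel_tol=1e-9))
```

Because the increase is multiplicative, the sweep spends the same number of steps in every decade of learning rates, which is why the resulting curve is normally plotted on a logarithmic x-axis.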

Selection heuristic: Choose the learning rate approximately 10x smaller than the rate where loss is minimized, or at the steepest point of the loss-vs-lr curve.
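
Both variants of the heuristic can be sketched in a few lines. The example below is an illustration under stated assumptions, not the rule used by any particular library: the function names are hypothetical, the steepest point is measured as the most negative slope of loss against log10(lr) between adjacent records, and the suggestion for that segment is taken as the geometric mean of its two learning rates. The small `history` list is made-up data for demonstration.

```python
import math


def suggest_by_min_loss(history, divisor=10.0):
    """Learning rate at the minimum recorded loss, divided by `divisor`."""
    best_lr, _ = min(history, key=lambda pair: pair[1])
    return best_lr / divisor


def suggest_by_steepest_descent(history):
    """Geometric midpoint of the segment with the most negative loss slope."""
    steepest_slope = math.inf
    suggestion = None
    for (lr_a, loss_a), (lr_b, loss_b) in zip(history, history[1:]):
        # Slope of the loss with respect to log10(lr) on this segment.
        slope = (loss_b - loss_a) / (math.log10(lr_b) - math.log10(lr_a))
        if slope < steepest_slope:
            steepest_slope = slope
            suggestion = math.sqrt(lr_a * lr_b)
    return suggestion


# Made-up (lr, loss) records: flat, then falling, then diverging.
history = [(1e-5, 1.00), (1e-4, 0.98), (1e-3, 0.60),
           (1e-2, 0.30), (1e-1, 0.25), (1.0, 2.00)]

print("min-loss / 10:", suggest_by_min_loss(history))
print("steepest descent:", suggest_by_steepest_descent(history))
```

On this data the two rules disagree by more than an order of magnitude, which is common; the steepest-descent rule is the more conservative of the two here. On real, noisy curves the losses would normally be smoothed before either rule is applied.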
