Heuristic:Sktime Pytorch forecasting Learning Rate Scheduling
| Knowledge Sources | |
|---|---|
| Domains | Optimization, Deep_Learning |
| Last Updated | 2026-02-08 08:00 GMT |
Overview
Override the default `reduce_on_plateau_patience` from 1000 to 3-4, and always use `Tuner.lr_find()` with `early_stop_threshold=1000.0` to find the optimal starting learning rate.
Description
The base model default for `reduce_on_plateau_patience` is 1000, which effectively disables learning rate scheduling during typical training runs (fewer than 100 epochs). Every example and the README override this to 3 or 4. The library provides an LR range test wrapper via `Tuner.lr_find()`; run it before training and visually confirm its suggested learning rate. The FAQ notes three common pitfalls that prevent the finder from completing: `fast_dev_run=True` on the Trainer, a missing target normalizer, and an `early_stop_threshold` that is too low.
Usage
Apply this heuristic before every training run. First, run the LR finder to identify the optimal learning rate. Then, set `reduce_on_plateau_patience=3` or `4` in the model constructor. Without overriding the default patience, the LR scheduler will never trigger.
The Insight (Rule of Thumb)
- Action 1: Run `Tuner(trainer).lr_find(model, ...)` with `early_stop_threshold=1000.0` before training.
- Action 2: Visually confirm the suggested learning rate makes sense (it should be in the steepest descent region of the loss curve).
- Action 3: Set `reduce_on_plateau_patience=3` or `4` in the model constructor (NOT the default 1000).
- Value: Starting LR varies by model: 1e-3 (TFT default), 1e-2 (N-BEATS default), 0.1 (DeepAR example). Patience: 3-4.
- Trade-off: Patience too low (1-2) may reduce LR prematurely during noisy early epochs. Patience too high (>10) wastes training time on a suboptimal LR.
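To see why the default patience of 1000 acts as "disabled," the patience-counting logic behind a ReduceLROnPlateau-style scheduler can be simulated directly. This is a minimal illustrative re-implementation, not the library's scheduler, run on a synthetic loss curve:

```python
# Minimal sketch (illustrative re-implementation, not the library's
# scheduler) of ReduceLROnPlateau-style patience counting: the LR is cut
# once validation loss has failed to improve for more than `patience` epochs.
def epochs_until_lr_drop(val_losses, patience):
    """Return the epoch at which the LR would first be reduced, or None."""
    best = float("inf")
    bad_epochs = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best = loss
            bad_epochs = 0
        else:
            bad_epochs += 1
            if bad_epochs > patience:
                return epoch
    return None

# Synthetic curve: improves for 10 epochs, then plateaus for 90 more.
losses = [1.0 / (e + 1) for e in range(10)] + [0.1] * 90

print(epochs_until_lr_drop(losses, patience=3))     # → 13 (fires just after the plateau)
print(epochs_until_lr_drop(losses, patience=1000))  # → None (never fires in 100 epochs)
```

With `patience=3` the scheduler reacts a few epochs into the plateau; with the default 1000 it would need over a thousand stagnant epochs, which never happens in a typical run.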
Reasoning
The default `reduce_on_plateau_patience=1000` was set high intentionally to act as "disabled by default," letting users opt in to LR scheduling. In practice, every real training run needs it set to 3-4 to adapt the learning rate as the model converges. The LR range test prevents picking a learning rate that is too high (causes divergence) or too low (causes slow convergence).
Default patience from `models/base/_base_model.py:482`:
```python
reduce_on_plateau_patience: int = 1000,  # effectively disabled
```
Practical overrides:
- `examples/stallion.py:143` — `reduce_on_plateau_patience=3`
- `examples/stallion.py:185` — `reduce_on_plateau_patience=4`
- `README.md:152` — `reduce_on_plateau_patience=4`
- `docs/source/getting-started.rst:133` — `reduce_on_plateau_patience=4`
LR finder usage from `README.md:156-158`:
```python
res = Tuner(trainer).lr_find(
    tft,
    train_dataloaders=train_dataloader,
    val_dataloaders=val_dataloader,
    early_stop_threshold=1000.0,
    max_lr=10.0,
)
```
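The "steepest descent" rule used when visually confirming the suggestion can also be checked numerically: pick the learning rate where loss falls fastest with respect to log(lr). The sketch below uses synthetic data (in practice `lrs`/`losses` would come from the range-test results); the helper name `steepest_descent_lr` is hypothetical, not a library API:

```python
# Hedged sketch of sanity-checking an LR range test: choose the LR at the
# most negative loss slope w.r.t. log10(lr). Data here is synthetic.
import math

def steepest_descent_lr(lrs, losses):
    """Return the LR where the loss curve descends fastest (most negative slope)."""
    best_lr, best_slope = None, 0.0
    for i in range(1, len(lrs)):
        slope = (losses[i] - losses[i - 1]) / (
            math.log10(lrs[i]) - math.log10(lrs[i - 1])
        )
        if slope < best_slope:
            best_lr, best_slope = lrs[i], slope
    return best_lr

# Synthetic range-test curve: flat at tiny LRs, steep drop near 1e-3,
# divergence past 1e-1.
lrs = [10 ** e for e in range(-6, 1)]  # 1e-6 ... 1e0
losses = [1.0, 0.98, 0.9, 0.5, 0.4, 0.8, 3.0]

print(steepest_descent_lr(lrs, losses))  # → 0.001
```

This matches the visual rule in Action 2: the suggestion should sit on the steep part of the curve, well before the loss turns upward (divergence).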
FAQ troubleshooting (docs/source/faq.rst:62-67) lists three common reasons the LR finder does not finish:
- `fast_dev_run=True` is set on the Trainer
- No target normalizer in the training dataset
- `early_stop_threshold` is too low (increase to 1000.0)
Related Pages
- Implementation:Sktime_Pytorch_forecasting_Tuner_Lr_Find
- Implementation:Sktime_Pytorch_forecasting_TemporalFusionTransformer_From_Dataset
- Implementation:Sktime_Pytorch_forecasting_DeepAR_From_Dataset
- Implementation:Sktime_Pytorch_forecasting_NBeats_From_Dataset
- Principle:Sktime_Pytorch_forecasting_Learning_Rate_Finding