Heuristic:Sktime Pytorch forecasting Learning Rate Scheduling
| Knowledge Sources | |
|---|---|
| Domains | Optimization, Deep_Learning |
| Last Updated | 2026-02-08 08:00 GMT |
Overview
Override the default `reduce_on_plateau_patience` from 1000 to 3-4, and always use `Tuner.lr_find()` with `early_stop_threshold=1000.0` to find the optimal starting learning rate.
Description
The base model default for `reduce_on_plateau_patience` is 1000, which effectively disables learning rate scheduling during typical training runs (fewer than 100 epochs). Every example and the README override this to 3 or 4. The library provides an LR range test wrapper via `Tuner.lr_find()`; run it before training and visually confirm its suggested learning rate. The FAQ notes three common pitfalls that prevent the finder from completing: `fast_dev_run=True` on the Trainer, a missing target normalizer, and an `early_stop_threshold` that is too low.
Usage
Apply this heuristic before every training run. First, run the LR finder to identify the optimal learning rate. Then, set `reduce_on_plateau_patience=3` or `4` in the model constructor. Without overriding the default patience, the LR scheduler will never trigger.
The Insight (Rule of Thumb)
- Action 1: Run `Tuner(trainer).lr_find(model, ...)` with `early_stop_threshold=1000.0` before training.
- Action 2: Visually confirm the suggested learning rate makes sense (it should be in the steepest descent region of the loss curve).
- Action 3: Set `reduce_on_plateau_patience=3` or `4` in the model constructor (NOT the default 1000).
- Value: Starting LR varies by model: 1e-3 (TFT default), 1e-2 (N-BEATS default), 0.1 (DeepAR example). Patience: 3-4.
- Trade-off: Patience too low (1-2) may reduce LR prematurely during noisy early epochs. Patience too high (>10) wastes training time on a suboptimal LR.
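To see why the default patience of 1000 acts as "disabled," the patience-counting logic behind a ReduceLROnPlateau-style scheduler can be simulated directly. This is a minimal illustrative re-implementation, not the library's scheduler, run on a synthetic loss curve:

```python
# Minimal sketch (illustrative re-implementation, not the library's
# scheduler) of ReduceLROnPlateau-style patience counting: the LR is cut
# once validation loss has failed to improve for more than `patience` epochs.
def epochs_until_lr_drop(val_losses, patience):
    """Return the epoch at which the LR would first be reduced, or None."""
    best = float("inf")
    bad_epochs = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best = loss
            bad_epochs = 0
        else:
            bad_epochs += 1
            if bad_epochs > patience:
                return epoch
    return None

# Synthetic curve: improves for 10 epochs, then plateaus for 90 more.
losses = [1.0 / (e + 1) for e in range(10)] + [0.1] * 90

print(epochs_until_lr_drop(losses, patience=3))     # → 13 (fires just after the plateau)
print(epochs_until_lr_drop(losses, patience=1000))  # → None (never fires in 100 epochs)
```

With `patience=3` the scheduler reacts a few epochs into the plateau; with the default 1000 it would need over a thousand stagnant epochs, which never happens in a typical run.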
Reasoning
The default `reduce_on_plateau_patience=1000` was set high intentionally to act as "disabled by default," letting users opt in to LR scheduling. In practice, every real training run needs it set to 3-4 to adapt the learning rate as the model converges. The LR range test prevents picking a learning rate that is too high (causes divergence) or too low (causes slow convergence).
Default patience from `models/base/_base_model.py:482`:
```python
reduce_on_plateau_patience: int = 1000,  # effectively disabled
```
Practical overrides:
- `examples/stallion.py:143` — `reduce_on_plateau_patience=3`
- `examples/stallion.py:185` — `reduce_on_plateau_patience=4`
- `README.md:152` — `reduce_on_plateau_patience=4`
- `docs/source/getting-started.rst:133` — `reduce_on_plateau_patience=4`
LR finder usage from `README.md:156-158`:
```python
res = Tuner(trainer).lr_find(
    tft,
    train_dataloaders=train_dataloader,
    val_dataloaders=val_dataloader,
    early_stop_threshold=1000.0,
    max_lr=10.0,
)
```
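The "steepest descent" rule used when visually confirming the suggestion can also be checked numerically: pick the learning rate where loss falls fastest with respect to log(lr). The sketch below uses synthetic data (in practice `lrs`/`losses` would come from the range-test results); the helper name `steepest_descent_lr` is hypothetical, not a library API:

```python
# Hedged sketch of sanity-checking an LR range test: choose the LR at the
# most negative loss slope w.r.t. log10(lr). Data here is synthetic.
import math

def steepest_descent_lr(lrs, losses):
    """Return the LR where the loss curve descends fastest (most negative slope)."""
    best_lr, best_slope = None, 0.0
    for i in range(1, len(lrs)):
        slope = (losses[i] - losses[i - 1]) / (
            math.log10(lrs[i]) - math.log10(lrs[i - 1])
        )
        if slope < best_slope:
            best_lr, best_slope = lrs[i], slope
    return best_lr

# Synthetic range-test curve: flat at tiny LRs, steep drop near 1e-3,
# divergence past 1e-1.
lrs = [10 ** e for e in range(-6, 1)]  # 1e-6 ... 1e0
losses = [1.0, 0.98, 0.9, 0.5, 0.4, 0.8, 3.0]

print(steepest_descent_lr(lrs, losses))  # → 0.001
```

This matches the visual rule in Action 2: the suggestion should sit on the steep part of the curve, well before the loss turns upward (divergence).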
FAQ troubleshooting (docs/source/faq.rst:62-67) lists three common reasons the LR finder does not finish:
- `fast_dev_run=True` is set on the Trainer
- No target normalizer in the training dataset
- `early_stop_threshold` is too low (increase to 1000.0)
Related Pages
- Implementation:Sktime_Pytorch_forecasting_Tuner_Lr_Find
- Implementation:Sktime_Pytorch_forecasting_TemporalFusionTransformer_From_Dataset
- Implementation:Sktime_Pytorch_forecasting_DeepAR_From_Dataset
- Implementation:Sktime_Pytorch_forecasting_NBeats_From_Dataset
- Principle:Sktime_Pytorch_forecasting_Learning_Rate_Finding