
Heuristic:Sktime Pytorch forecasting Early Stopping Patience

From Leeroopedia




Knowledge Sources
Domains Optimization, Deep_Learning, Time_Series
Last Updated 2026-02-08 08:00 GMT

Overview

Use an EarlyStopping patience of 10 for full training runs (5 for DeepAR), with min_delta=1e-4 and mode="min" on validation loss.

Description

EarlyStopping is a Lightning callback that stops training when the monitored metric has not improved for a specified number of epochs (patience). Every production example in pytorch-forecasting uses patience=10 (or 5 for DeepAR, which converges faster). The callback monitors validation loss with `min_delta=1e-4`, meaning improvements smaller than 0.0001 are not counted. Test files use patience=1 for fast iteration.

Usage

Apply this heuristic when configuring the Lightning Trainer callbacks. Use patience=10 for TFT and N-BEATS, patience=5 for DeepAR. Always set `mode="min"` and `min_delta=1e-4`. If training stops too early, increase patience. If training runs too long, decrease patience or add ReduceLROnPlateau with patience=3-4.

The Insight (Rule of Thumb)

  • Action: Add `EarlyStopping(monitor="val_loss", min_delta=1e-4, patience=10, mode="min")` to Trainer callbacks.
  • Value: patience=10 (standard), patience=5 (DeepAR), patience=1 (tests only).
  • Trade-off: Patience too low (< 5) risks stopping before the model converges through a loss plateau. Patience too high (> 20) wastes compute on a converged model. The combination of EarlyStopping(patience=10) + ReduceLROnPlateau(patience=3-4) is the recommended dual strategy.
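The decision rule these values plug into can be sketched in plain Python. This is a simplified model of the callback's logic for illustration only; real runs should use Lightning's EarlyStopping callback:

```python
def should_stop(val_losses, patience=10, min_delta=1e-4):
    """Return the epoch index at which training would stop, or None.

    Mirrors mode="min" semantics: a loss counts as an improvement only
    if it beats the best seen so far by more than min_delta.
    """
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if best - loss > min_delta:
            # Genuine improvement: record it and reset the counter.
            best = loss
            wait = 0
        else:
            # Plateau or noise-level change: burn one epoch of patience.
            wait += 1
            if wait >= patience:
                return epoch
    return None
```

For example, `should_stop([1.0, 0.5, 0.49999, 0.49998, 0.49997], patience=3)` stops at epoch 4: the last three losses are technically decreasing, but each step is below min_delta, so none of them resets the patience counter.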

Reasoning

Deep learning loss landscapes have plateaus where the loss appears flat before a sudden improvement. A patience of 10 epochs allows the model to pass through these plateaus. DeepAR's autoregressive nature means it converges faster than encoder-decoder attention models like TFT, justifying lower patience. The min_delta of 1e-4 filters out noise-level improvements that do not represent genuine convergence progress.

Production examples:

  • `examples/stallion.py:113` — `EarlyStopping(..., patience=10, min_delta=1e-4)`
  • `examples/nbeats.py:58` — `EarlyStopping(..., patience=10, min_delta=1e-4)`
  • `examples/ar.py:66` — `EarlyStopping(..., patience=5)`

Code from `examples/stallion.py:111-116`:

early_stop_callback = EarlyStopping(
    monitor="val_loss",
    min_delta=1e-4,
    patience=10,
    verbose=False,
    mode="min",
)

Non-finite loss safety from `metrics/base_metrics/_base_metrics.py:906`:

if not torch.isfinite(losses):
    losses = torch.tensor(1e9, device=losses.device)
    warnings.warn("Loss is not finite. Resetting it to 1e9")

This safety valve replaces NaN/Inf losses with 1e9, allowing EarlyStopping to detect the problem rather than crashing. If you see this warning, the model has numerical instability that needs to be addressed.
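A standalone sketch of that safety valve (the `guard_loss` helper name is hypothetical, introduced here for illustration; only the isfinite check and the 1e9 sentinel come from the library source):

```python
import warnings

import torch


def guard_loss(losses: torch.Tensor) -> torch.Tensor:
    """Replace a NaN/Inf loss with a large sentinel value.

    EarlyStopping then sees a huge, non-improving val_loss and halts
    the run after `patience` epochs instead of the training crashing.
    """
    if not torch.isfinite(losses):
        warnings.warn("Loss is not finite. Resetting it to 1e9")
        return torch.tensor(1e9, device=losses.device)
    return losses
```

A NaN loss passed through `guard_loss` comes back as 1e9, while finite losses pass through unchanged.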
