Heuristic: sktime / pytorch-forecasting Early Stopping Patience
| Knowledge Sources | |
|---|---|
| Domains | Optimization, Deep_Learning, Time_Series |
| Last Updated | 2026-02-08 08:00 GMT |
Overview
Use an EarlyStopping patience of 10 for full training runs (5 for DeepAR), with `min_delta=1e-4` and `mode="min"` on validation loss.
Description
EarlyStopping is a Lightning callback that stops training when the monitored metric has not improved for a specified number of epochs (patience). Every production example in pytorch-forecasting uses patience=10 (or 5 for DeepAR, which converges faster). The callback monitors validation loss with `min_delta=1e-4`, meaning improvements smaller than 0.0001 are not counted. Test files use patience=1 for fast iteration.
Usage
Apply this heuristic when configuring the Lightning Trainer callbacks. Use patience=10 for TFT and N-BEATS, patience=5 for DeepAR. Always set `mode="min"` and `min_delta=1e-4`. If training stops too early, increase patience. If training runs too long, decrease patience or add ReduceLROnPlateau with patience=3-4.
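The interplay between the two patience values can be sketched without any framework. The helper below is a hypothetical illustration: it only counts epochs since the last real improvement, which is the bookkeeping both `ReduceLROnPlateau` and `EarlyStopping` perform internally.

```python
def epochs_without_improvement(losses, min_delta=1e-4):
    """Yield, per epoch, how many epochs have passed since the last
    improvement larger than min_delta (lower loss is better)."""
    best, wait = float("inf"), 0
    for loss in losses:
        if loss < best - min_delta:
            best, wait = loss, 0
        else:
            wait += 1
        yield wait

# Loss improves once, then stalls within the min_delta threshold.
losses = [1.0] + [0.99995] * 12
waits = list(epochs_without_improvement(losses))

# With the recommended settings, the LR scheduler (patience=3) reacts
# at epoch 3 of the stall, seven epochs before the stopper (patience=10)
# would end the run -- so the LR drop gets a chance to revive training.
lr_drop_epoch = next(i for i, w in enumerate(waits) if w >= 3)
stop_epoch = next(i for i, w in enumerate(waits) if w >= 10)
```

This ordering is why the dual strategy works: the scheduler always fires first when its patience is strictly smaller than the stopper's.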
The Insight (Rule of Thumb)
- Action: Add `EarlyStopping(monitor="val_loss", min_delta=1e-4, patience=10, mode="min")` to Trainer callbacks.
- Value: patience=10 (standard), patience=5 (DeepAR), patience=1 (tests only).
- Trade-off: Patience too low (< 5) risks stopping before the model converges through a loss plateau. Patience too high (> 20) wastes compute on a converged model. The combination of EarlyStopping(patience=10) + ReduceLROnPlateau(patience=3-4) is the recommended dual strategy.
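The patience + min_delta behaviour can be illustrated with a minimal, framework-free counter. The real logic lives in `pytorch_lightning.callbacks.EarlyStopping`; `SimpleEarlyStopping` below is a hypothetical name for a simplified sketch, not the Lightning implementation.

```python
class SimpleEarlyStopping:
    """Minimal sketch of patience-based early stopping with min_delta.
    Illustration only -- not the Lightning implementation."""

    def __init__(self, patience: int = 10, min_delta: float = 1e-4):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")   # mode="min": lower is better
        self.wait = 0              # epochs since last real improvement

    def step(self, val_loss: float) -> bool:
        """Record one epoch's validation loss; return True to stop."""
        if val_loss < self.best - self.min_delta:
            # Genuine improvement: must beat the best by more than min_delta.
            self.best = val_loss
            self.wait = 0
        else:
            # Plateau, regression, or noise-level change below min_delta.
            self.wait += 1
        return self.wait >= self.patience
```

With `patience=2`, a run whose loss improves from 1.0 to 0.9 and then only wobbles by less than `min_delta` stops after two non-improving epochs; with `patience=10` it gets ten epochs to cross the plateau.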
Reasoning
Deep learning loss landscapes have plateaus where the loss appears flat before a sudden improvement. A patience of 10 epochs allows the model to pass through these plateaus. DeepAR's autoregressive nature means it converges faster than encoder-decoder attention models like TFT, justifying lower patience. The min_delta of 1e-4 filters out noise-level improvements that do not represent genuine convergence progress.
Production examples:
- `examples/stallion.py:113` — `EarlyStopping(..., patience=10, min_delta=1e-4)`
- `examples/nbeats.py:58` — `EarlyStopping(..., patience=10, min_delta=1e-4)`
- `examples/ar.py:66` — `EarlyStopping(..., patience=5)`
Code from `examples/stallion.py:111-116`:

```python
early_stop_callback = EarlyStopping(
    monitor="val_loss",
    min_delta=1e-4,
    patience=10,
    verbose=False,
    mode="min",
)
```
Non-finite loss safety from `metrics/base_metrics/_base_metrics.py:906`:

```python
if not torch.isfinite(losses):
    losses = torch.tensor(1e9, device=losses.device)
    warnings.warn("Loss is not finite. Resetting it to 1e9")
```
This safety valve replaces NaN/Inf losses with 1e9, allowing EarlyStopping to detect the problem rather than crashing. If you see this warning, the model has numerical instability that needs to be addressed.
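The same guard can be sketched framework-free. `sanitize_loss` is a hypothetical helper name, and `math.isfinite` stands in for `torch.isfinite`; this is a simplified stand-in for the torch-based check, not the library's code.

```python
import math
import warnings

def sanitize_loss(loss: float) -> float:
    """Replace a NaN/Inf loss with a large finite sentinel (1e9) so that
    monitoring callbacks register 'no improvement' instead of crashing.
    Simplified, framework-free sketch of the safety valve above."""
    if not math.isfinite(loss):
        warnings.warn("Loss is not finite. Resetting it to 1e9")
        return 1e9
    return loss
```

Because 1e9 is far worse than any real validation loss, an unstable run simply stops improving and EarlyStopping ends it after `patience` epochs.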