Heuristic: Sktime PyTorch Forecasting Encoder/Decoder Length Limits
| Knowledge Sources | |
|---|---|
| Domains | Time_Series, Optimization, Deep_Learning |
| Last Updated | 2026-02-08 08:00 GMT |
Overview
Limit the encoder length to at most 500 and the decoder (prediction) length to at most 200 to prevent excessive training time and memory usage. For N-BEATS, set `context_length` between 1x and 10x `prediction_length`.
Description
The FAQ documentation gives practical upper-bound recommendations for encoder and decoder lengths. Exceeding them makes training very slow and can trigger memory errors. Additionally, series that are too short for the configured minimum encoder/prediction lengths are silently dropped from the dataset, accompanied only by a warning. The `allow_missing_timesteps=True` option makes dataset creation significantly slower because it must identify all gaps in the data.
Usage
Apply this heuristic when constructing a TimeSeriesDataSet and choosing `max_encoder_length`, `max_prediction_length`, and `min_encoder_length`. If some series have fewer time points than the minimum lengths, they will be silently excluded. If the model appears frozen during training, the encoder/decoder lengths may be too large for the available memory.
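As a sketch, the heuristic translates into keyword arguments like the following. The keys mirror real `TimeSeriesDataSet` parameters from pytorch-forecasting, but the specific values (two weeks of hourly history, two days of forecast) are illustrative assumptions, not recommendations from the FAQ:

```python
# Hypothetical keyword arguments for a pytorch-forecasting TimeSeriesDataSet.
# The limits follow the FAQ guidance: encoder <= 500, decoder <= 200.
dataset_kwargs = {
    "max_encoder_length": 336,    # e.g. two weeks of hourly data; well under 500
    "max_prediction_length": 48,  # two days ahead; well under 200
    "min_encoder_length": 24,     # keep low so short series are not dropped
    # Leave this False and pre-fill gaps yourself when possible:
    # dataset creation is much slower with allow_missing_timesteps=True.
    "allow_missing_timesteps": False,
}

# Sanity checks mirroring the heuristic:
assert dataset_kwargs["max_encoder_length"] <= 500
assert dataset_kwargs["max_prediction_length"] <= 200
```

The dict would be splatted into the constructor (`TimeSeriesDataSet(data, **dataset_kwargs, ...)`) alongside the required `time_idx`, `target`, and `group_ids` arguments.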
The Insight (Rule of Thumb)
- Action 1: Set `max_encoder_length` <= 500.
- Action 2: Set `max_prediction_length` <= 200.
- Action 3: For N-BEATS, set `context_length` between 1x and 10x the `prediction_length`.
- Action 4: If `allow_missing_timesteps=True`, expect slower dataset creation.
- Action 5: Check warnings for dropped series; if many groups are missing, reduce the minimum lengths.
- Trade-off: Longer encoders capture more history but increase memory usage quadratically (for attention-based models) or linearly (for RNN-based models). Shorter encoders train faster but may miss long-term patterns.
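The actions above can be bundled into a small pre-flight check. This is a sketch, not part of any library API; the thresholds encode the FAQ's 500/200 limits and the N-BEATS 1-10x ratio:

```python
def check_lengths(max_encoder_length, max_prediction_length, context_length=None):
    """Return human-readable violations of the length heuristic (empty = OK).

    Thresholds: encoder <= 500, prediction <= 200 (pytorch-forecasting FAQ),
    and, when context_length is given (N-BEATS), a 1-10x
    context/prediction ratio (N-BEATS docstring).
    """
    problems = []
    if max_encoder_length > 500:
        problems.append(f"encoder length {max_encoder_length} > 500")
    if max_prediction_length > 200:
        problems.append(f"prediction length {max_prediction_length} > 200")
    if context_length is not None:
        ratio = context_length / max_prediction_length
        if not (1 <= ratio <= 10):
            problems.append(
                f"N-BEATS context/prediction ratio {ratio:.1f} outside 1-10x"
            )
    return problems
```

For example, `check_lengths(336, 48, context_length=96)` returns `[]`, while `check_lengths(600, 300, context_length=24)` flags all three rules.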
Reasoning
Transformer models (TFT, TimeXer) have O(n^2) attention complexity in encoder length. RNN models (DeepAR) have O(n) but suffer from vanishing gradients with very long sequences. The 500/200 limits represent practical bounds where training time becomes unacceptable on typical hardware. For N-BEATS, the context_length/prediction_length ratio controls the lookback-to-forecast ratio; values outside 1-10x waste computation (too much lookback) or lose important patterns (too little lookback).
FAQ recommendation from `docs/source/faq.rst:26-28`:
Choose something reasonably long, but not much longer than 500 for the encoder length and 200 for the decoder length. Consider that longer lengths increase the time it takes for your model to train.
Missing groups warning from `data/timeseries/_timeseries.py:1859-1868`:
    warnings.warn(
        "Min encoder length and/or min_prediction_idx and/or min "
        "prediction length and/or lags are too large for "
        f"{len(missing_groups)} series/groups which therefore are not present"
        " in the dataset index. "
        "This means no predictions can be made for those series. ...",
        UserWarning,
    )
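Because the library raises a `UserWarning` rather than an error, dropped series are easy to miss in a long log. The pattern below surfaces them programmatically; the `fake_dataset` function is a stand-in assumption that simulates the warning instead of building a real `TimeSeriesDataSet`:

```python
import warnings

def count_dropped_group_warnings(fn):
    """Run fn() and collect UserWarnings about series missing from the index.

    In real use, fn would construct the TimeSeriesDataSet; here the capture
    pattern is demonstrated with a simulated warning.
    """
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        fn()
    return [w for w in caught
            if issubclass(w.category, UserWarning)
            and "not present" in str(w.message)]

# Hypothetical stand-in for dataset construction that drops 3 groups:
def fake_dataset():
    warnings.warn(
        "Min encoder length ... too large for 3 series/groups which "
        "therefore are not present in the dataset index.",
        UserWarning,
    )

dropped = count_dropped_group_warnings(fake_dataset)
```

If `dropped` is non-empty after dataset creation, Action 5 applies: reduce `min_encoder_length` or `min_prediction_length` until the affected groups fit.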
N-BEATS context_length guidance from `models/nbeats/_nbeats.py:59`:
context_length (int): Number of time units that condition the predictions. Also known as 'lookback period'. Should be between 1 and 10 times the prediction_length.
FAQ on missing timesteps from `docs/source/faq.rst:33-38`:
`allow_missing_timesteps=True` makes dataset creation much slower because it must identify all gaps. If possible, pre-fill missing timesteps yourself.
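A minimal stdlib sketch of the pre-filling the FAQ recommends, assuming a regular grid with occasional gaps and forward-fill as the (illustrative) imputation choice. With pandas, `set_index` plus `asfreq` and `ffill` does the same job:

```python
from datetime import datetime, timedelta

def forward_fill_gaps(rows, freq=timedelta(hours=1)):
    """Pre-fill missing timesteps so allow_missing_timesteps can stay False.

    rows: sorted list of (timestamp, value) pairs on a regular grid with
    possible gaps. Missing steps are forward-filled from the last value.
    """
    if not rows:
        return []
    filled = [rows[0]]
    for ts, value in rows[1:]:
        # Insert forward-filled entries for every missing step before ts.
        while filled[-1][0] + freq < ts:
            filled.append((filled[-1][0] + freq, filled[-1][1]))
        filled.append((ts, value))
    return filled
```

For example, an hourly series with readings at 00:00 and 03:00 gains forward-filled entries at 01:00 and 02:00, so dataset creation never has to scan for gaps.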