Principle:Sktime Pytorch forecasting Time Series Dataset Construction
| Knowledge Sources | |
|---|---|
| Domains | Time_Series, Data_Engineering, Deep_Learning |
| Last Updated | 2026-02-08 07:00 GMT |
Overview
Technique for converting tabular time series DataFrames into structured encoder-decoder tensor datasets suitable for neural forecasting models.
Description
Time Series Dataset Construction bridges raw tabular data and neural network consumption. It takes a pandas DataFrame and converts it into a PyTorch Dataset that yields encoder (history) and decoder (future) tensor pairs. The construction process involves: (1) validating the DataFrame schema against the declared variable types, (2) fitting or applying normalizers to continuous variables, (3) encoding categorical variables as integers, (4) computing valid sample indices from the encoder/decoder window lengths, and (5) storing metadata about variable types, which models use to configure their architectures. This is the central abstraction in pytorch-forecasting's v1 data pipeline. Models are typically instantiated from the dataset (via their `from_dataset` constructor), so mistakes made here propagate into both the model architecture and the training data.
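Steps (3) and (4) above can be sketched in plain pandas. This is a simplified illustration, not pytorch-forecasting's implementation. The function names `encode_categoricals` and `valid_sample_starts` are hypothetical. The sketch assumes contiguous integer `time_idx` values with no gaps inside each group, and it only accepts full-length windows, whereas the library also supports shorter encoders via a minimum encoder length.

```python
import pandas as pd


def encode_categoricals(df: pd.DataFrame, columns: list[str]) -> tuple[pd.DataFrame, dict]:
    """Replace each categorical column with integer codes; return the fitted vocabularies."""
    out = df.copy()
    vocab = {}
    for col in columns:
        # Sorted vocabulary so the encoding is deterministic across runs.
        categories = sorted(out[col].astype(str).unique())
        mapping = {c: i for i, c in enumerate(categories)}
        out[col] = out[col].astype(str).map(mapping).astype("int64")
        vocab[col] = mapping
    return out, vocab


def valid_sample_starts(df: pd.DataFrame, group_ids: list[str], time_idx: str,
                        encoder_length: int, decoder_length: int) -> list[tuple]:
    """List (group key, start time_idx) pairs whose full window fits inside the group."""
    window = encoder_length + decoder_length
    starts = []
    for key, g in df.groupby(group_ids, sort=True):
        t_min, t_max = g[time_idx].min(), g[time_idx].max()
        # A window starting at s covers s .. s + window - 1, which must not exceed t_max.
        for s in range(t_min, t_max - window + 2):
            starts.append((key, s))
    return starts
```

Storing the vocabulary alongside the codes matters because the same mapping has to be reused for validation and inference data; refitting it would silently change the meaning of each integer.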
Usage
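Use this principle whenever preparing time series data for a pytorch-forecasting model. Every model in the library's v1 API is trained on batches from a DataLoader created from a TimeSeriesDataSet (for example with its `to_dataloader()` method). Constructing the dataset requires specifying which columns are targets, which are covariates and their temporal nature (static, known in the future, or unknown), the encoder and decoder lengths, and the normalization strategy. Getting this configuration right is essential: declaring a column as known in the future when it is only observed in the past, for instance, leaks information into the decoder that will not be available at prediction time.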
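Specification mistakes are cheaper to catch before the dataset is built. The helper below is hypothetical (`check_roles` is not a pytorch-forecasting function); it assumes the column roles are given as a dict keyed by names mirroring the variable taxonomy on this page. It checks that every declared column exists, that no column has two roles, that the target is not declared as known in advance, and that each group is long enough for one full encoder plus decoder window.

```python
import pandas as pd


def check_roles(df: pd.DataFrame, roles: dict[str, list[str]], target: str,
                group_ids: list[str], encoder_length: int, decoder_length: int) -> list[str]:
    """Return human-readable problems; an empty list means the spec looks consistent."""
    problems = []
    seen = {}
    for role, cols in roles.items():
        for col in cols:
            if col not in df.columns:
                problems.append(f"{col!r} ({role}) is not a DataFrame column")
            if col in seen:
                problems.append(f"{col!r} is declared as both {seen[col]} and {role}")
            seen.setdefault(col, role)
    # A target is by definition unknown in the future.
    known = roles.get("time_varying_known_reals", []) + roles.get("time_varying_known_categoricals", [])
    if target in known:
        problems.append(f"target {target!r} cannot be known in the future")
    # Groups shorter than one full window produce no training samples.
    sizes = df.groupby(group_ids).size()
    too_short = sizes[sizes < encoder_length + decoder_length]
    if len(too_short) > 0:
        problems.append(f"{len(too_short)} group(s) shorter than encoder + decoder length")
    return problems
```

Returning a list instead of raising on the first problem lets all configuration errors be reported at once.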
Theoretical Basis
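The encoder-decoder paradigm for time series forecasting divides the input around a forecast origin $t$ into two windows:

$$
\underbrace{y_{t-L+1}, \dots, y_t}_{\text{encoder}} \qquad \underbrace{y_{t+1}, \dots, y_{t+H}}_{\text{decoder}}
$$

where $L$ is the encoder (lookback) length and $H$ is the decoder (prediction horizon) length.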
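For a single series, the split around a forecast origin `t` with encoder length `L` and decoder length `H` is plain NumPy slicing. This is an illustrative sketch: `split_windows` is a hypothetical helper, and it assumes a 1-D array already ordered by time, with `t` being the index of the last encoder step.

```python
import numpy as np


def split_windows(y: np.ndarray, t: int, L: int, H: int) -> tuple[np.ndarray, np.ndarray]:
    """Return the encoder window y[t-L+1 .. t] and the decoder window y[t+1 .. t+H]."""
    if t - L + 1 < 0 or t + H >= len(y):
        raise ValueError("window does not fit inside the series")
    encoder = y[t - L + 1 : t + 1]  # L past observations, ending at the origin
    decoder = y[t + 1 : t + H + 1]  # H future steps to be predicted
    return encoder, decoder


y = np.arange(10.0)
enc, dec = split_windows(y, t=5, L=4, H=2)
# enc holds 2.0, 3.0, 4.0, 5.0 and dec holds 6.0, 7.0
```

Sliding `t` across every position where both windows fit yields one training sample per origin.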
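Variable taxonomy (with the corresponding TimeSeriesDataSet arguments):
- Static categoricals (`static_categoricals`): constant per series (e.g., product category)
- Static reals (`static_reals`): constant continuous features per series
- Time-varying known categoricals (`time_varying_known_categoricals`): categorical features known in advance (e.g., day of week)
- Time-varying known reals (`time_varying_known_reals`): continuous features known in advance (e.g., planned promotions)
- Time-varying unknown categoricals (`time_varying_unknown_categoricals`): categorical features observed only in the past
- Time-varying unknown reals (`time_varying_unknown_reals`): continuous features observed only in the past (e.g., the target itself)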
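Whether a column can be static is mechanically checkable: it must take a single value within every series. The pandas sketch below uses a hypothetical helper name, `static_columns`, and ignores missing values (`nunique` drops NaN by default). It can only separate static from time-varying columns; whether a time-varying column is known in the future is domain knowledge that the data alone cannot reveal.

```python
import pandas as pd


def static_columns(df: pd.DataFrame, group_ids: list[str]) -> list[str]:
    """Columns (other than the group ids) that are constant within every group."""
    candidates = [c for c in df.columns if c not in group_ids]
    # Number of distinct non-null values per column inside each group.
    n_unique = df.groupby(group_ids)[candidates].nunique()
    return [c for c in candidates if (n_unique[c] <= 1).all()]
```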
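Normalization: each target variable is normalized per group (with GroupNormalizer), per sample from its encoder window (with EncoderNormalizer), or globally. The normalizer parameters (center, scale) are returned with every sample as `target_scale` so that models can denormalize their predictions; with `add_target_scales=True` they are also added to the static real features.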
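The per-group idea can be sketched with pandas `groupby().transform()`. This is in the spirit of GroupNormalizer but is not its implementation: it uses plain mean and standard deviation with no transformation, and the names `group_normalize` and `denormalize` are hypothetical. For simplicity it stores a (center, scale) pair per row rather than per sample, and it fits and applies in one step; in practice the statistics would be fitted on training data and reused for validation.

```python
import numpy as np
import pandas as pd


def group_normalize(df: pd.DataFrame, group_ids: list[str], target: str):
    """Standardize the target within each group; return values and (center, scale) pairs."""
    grouped = df.groupby(group_ids)[target]
    center = grouped.transform("mean")
    # Single-row groups give NaN std and constant groups give 0; fall back to 1.
    scale = grouped.transform("std").fillna(1.0).replace(0.0, 1.0)
    normalized = ((df[target] - center) / scale).to_numpy()
    target_scale = np.column_stack([center.to_numpy(), scale.to_numpy()])
    return normalized, target_scale


def denormalize(prediction: np.ndarray, target_scale: np.ndarray) -> np.ndarray:
    """Invert the normalization using the stored (center, scale) pairs."""
    return prediction * target_scale[:, 1] + target_scale[:, 0]
```

Keeping the scale next to each sample is what allows a model to emit predictions in normalized space while the loss or the final output is computed on the original scale.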
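Example (pytorch-forecasting v1 API):
```python
from pytorch_forecasting import TimeSeriesDataSet
from pytorch_forecasting.data import GroupNormalizer

# `dataframe` is assumed to be a pandas DataFrame with an integer "time_idx"
# column, a float "volume" target, and string-typed "agency"/"sku" columns.
dataset = TimeSeriesDataSet(
    dataframe,
    time_idx="time_idx",
    target="volume",
    group_ids=["agency", "sku"],
    max_encoder_length=30,  # encoder (lookback) length
    max_prediction_length=6,  # decoder (prediction horizon) length
    static_categoricals=["agency", "sku"],
    time_varying_known_reals=["time_idx"],
    time_varying_unknown_reals=["volume"],
    target_normalizer=GroupNormalizer(groups=["agency", "sku"]),
)
# Each batch is (x, (target, weight)); x holds encoder_cont, decoder_cont,
# encoder_cat, decoder_cat, target_scale and related tensors.
x, (y, weight) = next(iter(dataset.to_dataloader(train=True, batch_size=64)))
```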