Principle:Sktime Pytorch forecasting Time Series Dataset Construction

Knowledge Sources	pytorch-forecasting, PyTorch Forecasting Docs, Temporal Fusion Transformers
Domains	Time_Series, Data_Engineering, Deep_Learning
Last Updated	2026-02-08 07:00 GMT

Overview

Technique for converting tabular time series DataFrames into structured encoder-decoder tensor datasets suitable for neural forecasting models.

Description

Time Series Dataset Construction bridges raw tabular data and neural network consumption. It takes a pandas DataFrame and converts it into a PyTorch Dataset that yields encoder (history) and decoder (future) tensor pairs. The construction process involves: (1) validating the DataFrame schema against declared variable types, (2) fitting or applying normalizers to continuous variables, (3) encoding categorical variables as integers, (4) computing valid sample indices based on encoder/decoder window lengths, and (5) storing metadata about variable types that models use to configure their architectures. This is the central abstraction in pytorch-forecasting's v1 data pipeline and the most critical step between data loading and model training.
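Step (3) can be illustrated with a small, dependency-free sketch. The SimpleLabelEncoder class below is a hypothetical stand-in, not pytorch-forecasting's own NaNLabelEncoder: it is fitted once on training data, assigns codes 1..K to the known categories, and reserves code 0 for missing or unseen values.

```python
from typing import Dict, Iterable, List, Optional


class SimpleLabelEncoder:
    """Hypothetical label encoder (not the library's NaNLabelEncoder).

    Known categories get codes 1..K; code 0 is reserved for missing (None)
    or unseen values, so an embedding table over these codes needs K + 1 rows.
    """

    def __init__(self) -> None:
        self.classes: Dict[str, int] = {}

    def fit(self, values: Iterable[Optional[str]]) -> "SimpleLabelEncoder":
        self.classes = {}
        # Sort so that the code assignment does not depend on row order.
        for value in sorted({v for v in values if v is not None}):
            self.classes[value] = len(self.classes) + 1
        return self

    def transform(self, values: Iterable[Optional[str]]) -> List[int]:
        # Missing or unseen categories fall back to the reserved code 0.
        return [self.classes.get(v, 0) for v in values]


encoder = SimpleLabelEncoder().fit(["Agency_01", "Agency_02", "Agency_01"])
print(encoder.transform(["Agency_02", "Agency_01", "Agency_99", None]))
# -> [2, 1, 0, 0]
```

Reserving a code for unknown values lets an encoder fitted on training data be applied unchanged to validation data that contains categories it has never seen.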

Usage

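Use this principle whenever preparing time series data for any pytorch-forecasting model. Every model in the library is trained on DataLoaders created from a TimeSeriesDataSet (via its to_dataloader method) and is usually instantiated with the model's from_dataset constructor, so that its input layers match the declared variables. Constructing the dataset requires careful specification of: which columns are targets; which are covariates and their temporal nature (static, known in the future, or unknown); the encoder and decoder lengths; and the normalization strategy. Mistakes here, such as declaring a variable as known in the future when it is only observed in the past, leak information into training and make validation results over-optimistic.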
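Such a specification can be sanity-checked before any tensors are built. The sketch below uses a hypothetical DatasetSpec container whose field names mirror TimeSeriesDataSet's keyword arguments; the class, the validate_spec function, the "price" column, and the particular checks are illustrative assumptions rather than the library's own validation. It flags undeclared columns, columns declared both known and unknown (or both static and time-varying), a target declared as known in the future, and non-positive window lengths.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class DatasetSpec:
    """Hypothetical container whose fields mirror TimeSeriesDataSet arguments."""

    time_idx: str
    target: str
    group_ids: List[str]
    max_encoder_length: int
    max_prediction_length: int
    static_categoricals: List[str] = field(default_factory=list)
    static_reals: List[str] = field(default_factory=list)
    time_varying_known_categoricals: List[str] = field(default_factory=list)
    time_varying_known_reals: List[str] = field(default_factory=list)
    time_varying_unknown_categoricals: List[str] = field(default_factory=list)
    time_varying_unknown_reals: List[str] = field(default_factory=list)


def validate_spec(spec: DatasetSpec, columns: Set[str]) -> List[str]:
    """Return human-readable problems; an empty list means no issue was found."""
    static = set(spec.static_categoricals) | set(spec.static_reals)
    known = set(spec.time_varying_known_categoricals) | set(spec.time_varying_known_reals)
    unknown = set(spec.time_varying_unknown_categoricals) | set(spec.time_varying_unknown_reals)
    declared = {spec.time_idx, spec.target, *spec.group_ids} | static | known | unknown

    problems = [f"column {c!r} not found in data" for c in sorted(declared - columns)]
    problems += [f"column {c!r} is declared both known and unknown" for c in sorted(known & unknown)]
    problems += [f"column {c!r} is declared both static and time-varying"
                 for c in sorted(static & (known | unknown))]
    if spec.target in known:
        # A future-known target would hand the model the values it must predict.
        problems.append("target is declared known in the future (label leakage)")
    if spec.max_encoder_length < 1 or spec.max_prediction_length < 1:
        problems.append("encoder and prediction lengths must be positive")
    return problems


columns = {"time_idx", "agency", "sku", "volume", "price"}
spec = DatasetSpec(
    time_idx="time_idx", target="volume", group_ids=["agency", "sku"],
    max_encoder_length=30, max_prediction_length=6,
    static_categoricals=["agency", "sku"],
    time_varying_known_reals=["time_idx", "price", "volume"],  # mistake: volume
    time_varying_unknown_reals=["volume"],
)
for problem in validate_spec(spec, columns):
    print(problem)
```

Group identifiers are deliberately allowed to double as static categoricals, since a series identifier is by definition constant within that series.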

Theoretical Basis

The encoder-decoder paradigm for time series forecasting divides input into two windows:

$\text{Encoder}: [t - L_{e}, \dots, t - 1], \qquad \text{Decoder}: [t, \dots, t + L_{d} - 1]$

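Where $t$ is the first time step to be forecast, $L_{e}$ is the encoder (lookback) length, and $L_{d}$ is the decoder (prediction horizon) length. In pytorch-forecasting these lengths are set by max_encoder_length and max_prediction_length.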
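Concretely, a series of length $n$ with a gap-free integer time index yields one sample for every forecast start $t$ (0-based) with $L_{e} \le t \le n - L_{d}$, that is $n - L_{e} - L_{d} + 1$ samples. The sketch below enumerates these windows under that fixed-length assumption; TimeSeriesDataSet additionally allows shorter windows through min_encoder_length and min_prediction_length, which this sketch does not model. The function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def window_indices(n: int, enc_len: int, dec_len: int) -> List[Tuple[range, range]]:
    """Return (encoder, decoder) index ranges for every valid forecast start t.

    Encoder covers [t - enc_len, t - 1]; decoder covers [t, t + dec_len - 1].
    """
    if enc_len < 1 or dec_len < 1:
        raise ValueError("window lengths must be positive")
    return [
        (range(t - enc_len, t), range(t, t + dec_len))
        for t in range(enc_len, n - dec_len + 1)
    ]


def slice_windows(
    series: Sequence[float], enc_len: int, dec_len: int
) -> List[Tuple[List[float], List[float]]]:
    """Materialize the (history, future) value pairs for one series."""
    return [
        ([series[i] for i in enc], [series[i] for i in dec])
        for enc, dec in window_indices(len(series), enc_len, dec_len)
    ]


values = [10.0, 11.0, 12.0, 13.0, 14.0, 15.0]
for history, future in slice_windows(values, enc_len=3, dec_len=2):
    print(history, "->", future)
# [10.0, 11.0, 12.0] -> [13.0, 14.0]
# [11.0, 12.0, 13.0] -> [14.0, 15.0]
```

A series shorter than $L_{e} + L_{d}$ produces no samples at all, which is why very short series effectively drop out of a fixed-length dataset.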

Variable taxonomy:

Static categoricals: constant per series (e.g., product category)
Static reals: constant continuous features per series
Time-varying known categoricals: categorical features known in advance for the future (e.g., day of week)
Time-varying known reals: continuous features known in advance for the future (e.g., planned promotions)
Time-varying unknown categoricals: categorical features only observed in the past
Time-varying unknown reals: continuous features only observed in the past (e.g., the target itself)
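The practical consequence of this taxonomy is what a model may see at each position of a sample. The conceptual sketch below (the build_sample function and the store_size and price columns are hypothetical) cuts one sample from a single series: static values once, all time-varying values over the encoder window, only known values over the decoder window, and the decoder-window target as the label. It illustrates the information boundary, not how pytorch-forecasting lays out its tensors.

```python
from typing import Dict, List, Sequence


def build_sample(
    rows: Sequence[Dict[str, float]],
    t: int,
    enc_len: int,
    dec_len: int,
    static: List[str],
    known: List[str],
    unknown: List[str],
    target: str,
) -> Dict[str, object]:
    """Hypothetical helper: cut one model-visible sample out of a single series.

    rows[i] holds every column's value at time step i; t is the first
    forecast step, as in the encoder/decoder windows above.
    """
    enc_rows = rows[t - enc_len : t]
    dec_rows = rows[t : t + dec_len]
    return {
        # Static values are constant per series, so read them from any row.
        "static": {c: rows[0][c] for c in static},
        # History: every time-varying variable, known or unknown.
        "encoder": {c: [r[c] for r in enc_rows] for c in known + unknown},
        # Future: only variables whose values are known in advance.
        "decoder": {c: [r[c] for r in dec_rows] for c in known},
        # Label: the decoder-window target, never fed to the model as input.
        "target": [r[target] for r in dec_rows],
    }


rows = [{"store_size": 3.0, "price": float(i), "volume": 100.0 + i} for i in range(5)]
sample = build_sample(rows, t=3, enc_len=3, dec_len=2, static=["store_size"],
                      known=["price"], unknown=["volume"], target="volume")
print(sample["decoder"])  # -> {'price': [3.0, 4.0]}
```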

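Normalization: Each target is normalized according to the chosen target_normalizer: per group (GroupNormalizer), per sample using the encoder window (EncoderNormalizer), or globally (e.g., TorchNormalizer). The normalizer parameters (center, scale) are returned with each sample as target_scale so that models can denormalize their predictions; with add_target_scales=True they are also added to the inputs as static real features.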
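Per-group normalization can be sketched without the library: fit a (center, scale) pair per series on training data, normalize the target with it, and keep the pair so predictions can be mapped back to original units. The sketch assumes standard scaling (mean and population standard deviation); GroupNormalizer's options such as its transformations are not reproduced, and all function names are hypothetical. In practice the parameters are fitted once on the training data and reused, not refitted, for validation data.

```python
import statistics
from typing import Dict, List, Tuple

Scales = Dict[str, Tuple[float, float]]


def fit_group_scales(groups: List[str], values: List[float]) -> Scales:
    """Fit (center, scale) per group with mean and population std."""
    by_group: Dict[str, List[float]] = {}
    for g, v in zip(groups, values):
        by_group.setdefault(g, []).append(v)
    scales: Scales = {}
    for g, vs in by_group.items():
        center = statistics.mean(vs)
        scale = statistics.pstdev(vs)
        # A constant series has zero spread; fall back to 1.0 to avoid dividing by zero.
        scales[g] = (center, scale if scale > 0 else 1.0)
    return scales


def normalize(groups: List[str], values: List[float], scales: Scales) -> List[float]:
    return [(v - scales[g][0]) / scales[g][1] for g, v in zip(groups, values)]


def denormalize(groups: List[str], normalized: List[float], scales: Scales) -> List[float]:
    # The stored (center, scale) pair plays the role of target_scale here.
    return [z * scales[g][1] + scales[g][0] for g, z in zip(groups, normalized)]


groups = ["A", "A", "A", "B", "B", "B"]
volume = [10.0, 20.0, 30.0, 1000.0, 2000.0, 3000.0]
scales = fit_group_scales(groups, volume)
z = normalize(groups, volume, scales)
# Both series map to the same normalized shape despite a 100x size difference.
print([round(x, 3) for x in z])
```

Normalizing per group lets one network share weights across series whose magnitudes differ by orders of magnitude, while the stored parameters keep the mapping back to each series' own scale.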

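Example (pytorch-forecasting v1 API; a sketch assuming dataframe is a pandas DataFrame with an integer time_idx column, a float volume target, and categorical agency and sku columns):

from pytorch_forecasting import TimeSeriesDataSet
from pytorch_forecasting.data import GroupNormalizer

# Dataset construction with explicit variable roles
dataset = TimeSeriesDataSet(
    dataframe,
    time_idx="time_idx",
    target="volume",
    group_ids=["agency", "sku"],
    max_encoder_length=30,      # L_e: lookback window
    max_prediction_length=6,    # L_d: forecast horizon
    static_categoricals=["agency", "sku"],
    time_varying_known_reals=["time_idx"],
    time_varying_unknown_reals=["volume"],
    target_normalizer=GroupNormalizer(groups=["agency", "sku"]),  # per-series scaling
)
loader = dataset.to_dataloader(train=True, batch_size=64)
# Each batch is (x, (target, weight)); x is a dict of tensors such as
# "encoder_cont", "encoder_cat", "decoder_cont", "decoder_cat" and "target_scale"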

Related Pages

Implemented By

Implementation:Sktime_Pytorch_forecasting_TimeSeriesDataSet_Init

Uses Heuristic
