Principle:Sktime Pytorch forecasting Time Series Dataset Construction

From Leeroopedia


Knowledge Sources
Domains Time_Series, Data_Engineering, Deep_Learning
Last Updated 2026-02-08 07:00 GMT

Overview

Technique for converting tabular time series DataFrames into structured encoder-decoder tensor datasets suitable for neural forecasting models.

Description

Time Series Dataset Construction bridges raw tabular data and neural network consumption. It takes a pandas DataFrame and converts it into a PyTorch Dataset that yields encoder (history) and decoder (future) tensor pairs. The construction process involves: (1) validating the DataFrame schema against declared variable types, (2) fitting or applying normalizers to continuous variables, (3) encoding categorical variables as integers, (4) computing valid sample indices based on encoder/decoder window lengths, and (5) storing metadata about variable types that models use to configure their architectures. This is the central abstraction in pytorch-forecasting's v1 data pipeline and the most critical step between data loading and model training.
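The five steps above can be sketched in plain pandas. This is an illustrative sketch, not the library's internals: `construct_dataset` and all column names here are invented for the example.

```python
import pandas as pd

def construct_dataset(df, time_idx, target, group_ids, enc_len, dec_len):
    """Sketch of the five construction steps on a tabular time series."""
    # (1) validate schema: required columns exist, time index is integer
    for col in [time_idx, target, *group_ids]:
        assert col in df.columns, f"missing column: {col}"
    assert pd.api.types.is_integer_dtype(df[time_idx])

    # (2) fit a per-group normalizer for the continuous target,
    #     keeping (center, scale) so predictions can be denormalized
    stats = df.groupby(group_ids)[target].agg(center="mean", scale="std")
    df = df.join(stats, on=group_ids)
    df["target_normed"] = (df[target] - df["center"]) / df["scale"]

    # (3) encode categorical group identifiers as integers
    encoders = {g: {v: i for i, v in enumerate(df[g].unique())} for g in group_ids}
    for g in group_ids:
        df[g + "_code"] = df[g].map(encoders[g])

    # (4) compute valid sample indices: a decoder starting at t needs
    #     enc_len steps of history and dec_len steps of future
    index = []
    for _, grp in df.groupby(group_ids):
        times = grp[time_idx].to_numpy()
        for t in times:
            if t - enc_len >= times.min() and t + dec_len - 1 <= times.max():
                index.append((tuple(grp[g].iloc[0] for g in group_ids), t))

    # (5) store metadata models use to size embeddings etc.
    metadata = {"n_groups": {g: len(encoders[g]) for g in group_ids}}
    return df, index, metadata
```

For a single contiguous series of length 10 with `enc_len=3, dec_len=2`, this yields six valid decoder start positions (t = 3 through 8).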

Usage

Use this principle whenever preparing time series data for any pytorch-forecasting model. Every model in the library expects a TimeSeriesDataSet-derived DataLoader. The dataset construction step requires careful specification of: which columns are targets, which are covariates (and their temporal nature — static, known future, unknown), encoder/decoder lengths, and normalization strategy. Getting this configuration right is essential for model performance.

Theoretical Basis

The encoder-decoder paradigm for time series forecasting divides input into two windows:

Encoder: [t - L_e, ..., t - 1]    Decoder: [t, ..., t + L_d - 1]

where L_e is the encoder (lookback) length and L_d is the decoder (prediction horizon) length.
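The window arithmetic can be checked with a two-line sketch (the `windows` helper is mine, introduced only for illustration):

```python
def windows(t, enc_len, dec_len):
    """Return the encoder and decoder time indices for a decoder starting at t."""
    encoder = list(range(t - enc_len, t))   # [t - L_e, ..., t - 1]
    decoder = list(range(t, t + dec_len))   # [t, ..., t + L_d - 1]
    return encoder, decoder

enc, dec = windows(10, enc_len=4, dec_len=2)
# enc -> [6, 7, 8, 9], dec -> [10, 11]
```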

Variable taxonomy:

  • Static categoricals — constant per series (e.g., product category)
  • Static reals — constant continuous features per series
  • Time-varying known categoricals — future-known categorical features (e.g., day of week)
  • Time-varying known reals — future-known continuous features (e.g., planned promotions)
  • Time-varying unknown reals — only observed in the past (e.g., the target itself)

Normalization: Each target variable is normalized per group (if using GroupNormalizer) or globally. The normalizer parameters (center, scale) are stored as additional features (target_scale) to allow models to denormalize predictions.
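A GroupNormalizer-style fit and the denormalization roundtrip can be sketched in plain pandas. The DataFrame here is a toy; in the library, the normalizer itself stores these statistics and exposes them per sample as target_scale.

```python
import pandas as pd

df = pd.DataFrame({
    "group": ["a"] * 4 + ["b"] * 4,
    "volume": [10.0, 12.0, 11.0, 13.0, 100.0, 120.0, 110.0, 130.0],
})

# Fit per-group (center, scale), analogous to a per-group normalizer
stats = df.groupby("group")["volume"].agg(center="mean", scale="std")
df = df.join(stats, on="group")
df["volume_normed"] = (df["volume"] - df["center"]) / df["scale"]

# Because (center, scale) travel with each sample, predictions made in
# normalized space can be mapped back to the original scale exactly
df["volume_back"] = df["volume_normed"] * df["scale"] + df["center"]
```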

Pseudo-code (sketched here against the actual pytorch-forecasting v1 API, whose class is named TimeSeriesDataSet; column names follow a demand-forecasting example):

# Dataset construction with the v1 API
from pytorch_forecasting import TimeSeriesDataSet
from pytorch_forecasting.data import GroupNormalizer

dataset = TimeSeriesDataSet(
    dataframe,
    time_idx="time_idx",
    target="volume",
    group_ids=["agency", "sku"],
    max_encoder_length=30,
    max_prediction_length=6,
    time_varying_unknown_reals=["volume"],
    target_normalizer=GroupNormalizer(groups=["agency", "sku"]),
)
# Each sample yields (x, y): x holds the encoder and decoder tensors,
# y is a (target, weight) tuple
