Implementation:Sktime Pytorch forecasting EncoderDecoderTimeSeriesDataModule

Knowledge Sources	Sktime_Pytorch_forecasting
Domains	Time_Series, Forecasting, Deep_Learning
Last Updated	2026-02-08 08:00 GMT

Overview

EncoderDecoderTimeSeriesDataModule is a Lightning DataModule for processing time series data in an encoder-decoder format with support for variable-length sequences.

Description

EncoderDecoderTimeSeriesDataModule extends LightningDataModule and handles preprocessing, splitting, and batching of time series data for encoder-decoder deep learning models. It takes a TimeSeries dataset (D1 layer) and produces DataLoaders with sliding-window batches that separate encoder (historical) and decoder (future) components. The module splits features into categorical and continuous subsets, computes known-feature masks for the decoder, handles static features, and computes metadata for model initialization including encoder/decoder feature counts, target counts, and sequence lengths. An inner _ProcessedEncoderDecoderDataset class produces (x, y) tuples where x contains encoder_cat, encoder_cont, decoder_cat, decoder_cont, masks, time indices, and target scales.
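The sliding-window split described above can be illustrated with a minimal sketch in plain Python. It is not the module's implementation: the helper name `sliding_windows` is hypothetical, and the sketch assumes fixed-length windows only, ignoring the minimum-length options, features, and masks.

```python
def sliding_windows(series, max_encoder_length, max_prediction_length):
    """Yield (encoder, decoder) slices for every full window in one series.

    Hypothetical helper for illustration; fixed-length windows only.
    """
    window = max_encoder_length + max_prediction_length
    for start in range(len(series) - window + 1):
        split = start + max_encoder_length
        # Encoder part is the history, decoder part is the horizon to predict.
        yield series[start:split], series[split:start + window]


values = list(range(10))
windows = list(
    sliding_windows(values, max_encoder_length=4, max_prediction_length=2)
)
first_encoder, first_decoder = windows[0]
```

Each window advances by one time step, so a series of 10 points with an encoder length of 4 and a prediction length of 2 yields 5 windows.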

Usage

Use EncoderDecoderTimeSeriesDataModule for models that consume separate encoder and decoder inputs (e.g., seq2seq architectures, Temporal Fusion Transformer). It is part of the experimental v2 data pipeline and supports configurable encoder/decoder lengths, target normalization, and train/val/test splitting.
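To give a feel for target normalization, the sketch below mimics robust scaling (centering on the median and dividing by the interquartile range, which is what scikit-learn's RobustScaler does by default) using only the standard library. The function `robust_scale` is a hypothetical stand-in, not part of pytorch_forecasting, and the module's exact normalization path may differ.

```python
import statistics


def robust_scale(values):
    """Center on the median and scale by the interquartile range (IQR).

    Hypothetical helper; returns the scaled values and (center, scale).
    """
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    if iqr == 0:
        iqr = 1.0  # avoid division by zero for constant series
    scaled = [(v - median) / iqr for v in values]
    return scaled, (median, iqr)


scaled, (center, scale) = robust_scale([1, 2, 3, 4, 5])
```

Because the median and IQR are insensitive to extreme values, a single outlier in the target shifts the scale far less than it would with mean and standard deviation.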

Code Reference

Source Location

Repository: Sktime_Pytorch_forecasting
File: pytorch_forecasting/data/data_module.py
Lines: 1-758

Signature

class EncoderDecoderTimeSeriesDataModule(LightningDataModule):
    def __init__(
        self,
        time_series_dataset: TimeSeries,
        max_encoder_length: int = 30,
        min_encoder_length: int | None = None,
        max_prediction_length: int = 1,
        min_prediction_length: int | None = None,
        min_prediction_idx: int | None = None,
        allow_missing_timesteps: bool = False,
        add_relative_time_idx: bool = False,
        add_target_scales: bool = False,
        add_encoder_length: bool | str = "auto",
        target_normalizer: NORMALIZER
        | str
        | list[NORMALIZER]
        | tuple[NORMALIZER]
        | None = "auto",
        categorical_encoders: dict[str, NaNLabelEncoder] | None = None,
        scalers: dict[
            str, StandardScaler | RobustScaler | TorchNormalizer | EncoderNormalizer
        ]
        | None = None,
        randomize_length: None | tuple[float, float] | bool = False,
        batch_size: int = 32,
        num_workers: int = 0,
        train_val_test_split: tuple = (0.7, 0.15, 0.15),
    ):

setup

def setup(self, stage: str | None = None):

Import

from pytorch_forecasting.data.data_module import EncoderDecoderTimeSeriesDataModule

I/O Contract

Constructor Inputs

Name	Type	Required	Description
time_series_dataset	TimeSeries	Yes	The time series dataset (D1 layer)
max_encoder_length	int	No	Maximum encoder input sequence length (default 30)
min_encoder_length	int or None	No	Minimum encoder length; defaults to max_encoder_length
max_prediction_length	int	No	Maximum decoder output sequence length (default 1)
min_prediction_length	int or None	No	Minimum prediction length; defaults to max_prediction_length
min_prediction_idx	int or None	No	Minimum index from which predictions start
allow_missing_timesteps	bool	No	Whether to allow missing timesteps (default False)
add_relative_time_idx	bool	No	Whether to add relative time index feature (default False)
add_target_scales	bool	No	Whether to add target scaling information (default False)
add_encoder_length	bool or str	No	Whether to include encoder length info (default 'auto')
target_normalizer	NORMALIZER, str, list or tuple of NORMALIZER, or None	No	Target normalizer; 'auto' uses RobustScaler (default 'auto')
categorical_encoders	dict or None	No	Dictionary of categorical encoders
scalers	dict or None	No	Dictionary of feature scalers
randomize_length	None or tuple or bool	No	Whether to randomize input sequence length (default False)
batch_size	int	No	Batch size (default 32)
num_workers	int	No	Number of dataloader workers (default 0)
train_val_test_split	tuple	No	Train/val/test proportions (default (0.7, 0.15, 0.15))
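The effect of train_val_test_split can be sketched as a shuffled index split. This is an illustration under assumptions, not the module's code: it assumes the items are shuffled once, the train and validation sizes are truncated to integers, and the test set receives the remainder. The helper `split_indices` and its `seed` parameter are hypothetical.

```python
import random


def split_indices(n_items, proportions=(0.7, 0.15, 0.15), seed=0):
    """Shuffle indices and cut them into train/val/test lists.

    Hypothetical helper; the test split takes whatever is left over.
    """
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    train_size = int(proportions[0] * n_items)
    val_size = int(proportions[1] * n_items)
    train = indices[:train_size]
    val = indices[train_size:train_size + val_size]
    test = indices[train_size + val_size:]
    return train, val, test


train, val, test = split_indices(20)
```

Assigning the remainder to the test split guarantees that every index lands in exactly one subset even when the proportions do not divide the item count evenly.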

Batch Output (x dict)

Name	Type	Description
encoder_cat	torch.Tensor	Categorical features for encoder, shape (batch, enc_length, n_cat)
encoder_cont	torch.Tensor	Continuous features for encoder, shape (batch, enc_length, n_cont)
decoder_cat	torch.Tensor	Known categorical features for decoder, shape (batch, pred_length, n_known_cat)
decoder_cont	torch.Tensor	Known continuous features for decoder, shape (batch, pred_length, n_known_cont)
encoder_lengths	torch.Tensor	Encoder sequence lengths
decoder_lengths	torch.Tensor	Decoder sequence lengths
decoder_target_lengths	torch.Tensor	Decoder target sequence lengths
groups	torch.Tensor	Group identifiers
target_past	torch.Tensor	Historical target values for encoder
encoder_time_idx	torch.Tensor	Time indices for encoder
decoder_time_idx	torch.Tensor	Time indices for decoder
target_scale	torch.Tensor	Scaling factor for target values
encoder_mask	torch.Tensor	Boolean mask for valid encoder time points
decoder_mask	torch.Tensor	Boolean mask for valid decoder time points
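The relationship between the length fields and the boolean masks can be sketched as follows. The helpers `pad_sequences` and `length_mask` are hypothetical, and right-padding is an assumption made here; the source does not state which side the module pads.

```python
def pad_sequences(sequences, max_length, pad_value=0.0):
    """Right-pad each sequence to max_length (assumed padding side)."""
    return [
        list(seq) + [pad_value] * (max_length - len(seq)) for seq in sequences
    ]


def length_mask(lengths, max_length):
    """Return one boolean row per sequence: True where a time step is valid."""
    return [[t < n for t in range(max_length)] for n in lengths]


sequences = [[1.0, 2.0, 3.0], [4.0]]
lengths = [len(seq) for seq in sequences]
padded = pad_sequences(sequences, max_length=4)
mask = length_mask(lengths, max_length=4)
```

A model can then ignore padded positions by consulting the mask rather than the padded values themselves.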

metadata Property Output

Name	Type	Description
encoder_cat	int	Number of categorical variables in the encoder
encoder_cont	int	Number of continuous variables in the encoder
decoder_cat	int	Number of known categorical variables in the decoder
decoder_cont	int	Number of known continuous variables in the decoder
target	int	Number of target variables
static_categorical_features	int	Number of static categorical features
static_continuous_features	int	Number of static continuous features
max_encoder_length	int	Maximum encoder length
max_prediction_length	int	Maximum prediction length
min_encoder_length	int	Minimum encoder length
min_prediction_length	int	Minimum prediction length
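As a rough sketch of how these counts might feed model initialization, the snippet below derives layer widths from a metadata dictionary. The values in `example_metadata` are invented for illustration, and `layer_sizes` is a hypothetical helper. Summing categorical and continuous counts is a simplification, since real models usually embed categorical features first.

```python
# Hypothetical metadata values, using the keys listed in the table above.
example_metadata = {
    "encoder_cat": 2,
    "encoder_cont": 5,
    "decoder_cat": 1,
    "decoder_cont": 3,
    "target": 1,
    "static_categorical_features": 1,
    "static_continuous_features": 0,
    "max_encoder_length": 96,
    "max_prediction_length": 24,
    "min_encoder_length": 96,
    "min_prediction_length": 24,
}


def layer_sizes(metadata):
    """Derive simple input/output widths from the feature counts."""
    return {
        "encoder_input_size": metadata["encoder_cat"] + metadata["encoder_cont"],
        "decoder_input_size": metadata["decoder_cat"] + metadata["decoder_cont"],
        "output_size": metadata["target"],
    }


sizes = layer_sizes(example_metadata)
```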

Usage Examples

from pytorch_forecasting.data.data_module import EncoderDecoderTimeSeriesDataModule
from pytorch_forecasting.data.timeseries import TimeSeries

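# Create TimeSeries dataset.
# `df` is assumed to be a long-format pandas DataFrame with the columns
# "time_idx", "value", and "series_id".
ts = TimeSeries(data=df, time="time_idx", target="value", group=["series_id"])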

# Create DataModule
dm = EncoderDecoderTimeSeriesDataModule(
    time_series_dataset=ts,
    max_encoder_length=96,
    max_prediction_length=24,
    batch_size=64,
    num_workers=4,
)

dm.setup(stage="fit")
train_loader = dm.train_dataloader()
val_loader = dm.val_dataloader()

# Access metadata for model initialization
metadata = dm.metadata
print(metadata["encoder_cont"])  # Number of continuous encoder features

Related Pages

Principle:Sktime_Pytorch_forecasting_V2_Data_Pipeline
