Implementation:Sktime Pytorch forecasting EncoderDecoderTimeSeriesDataModule

From Leeroopedia


Knowledge Sources
Domains Time_Series, Forecasting, Deep_Learning
Last Updated 2026-02-08 08:00 GMT

Overview

EncoderDecoderTimeSeriesDataModule is a Lightning DataModule for processing time series data in an encoder-decoder format with support for variable-length sequences.

Description

EncoderDecoderTimeSeriesDataModule extends LightningDataModule and handles preprocessing, splitting, and batching of time series data for encoder-decoder deep learning models. It takes a TimeSeries dataset (D1 layer) and produces DataLoaders with sliding-window batches that separate encoder (historical) and decoder (future) components. The module splits features into categorical and continuous subsets, computes known-feature masks for the decoder, handles static features, and computes metadata for model initialization including encoder/decoder feature counts, target counts, and sequence lengths. An inner _ProcessedEncoderDecoderDataset class produces (x, y) tuples where x contains encoder_cat, encoder_cont, decoder_cat, decoder_cont, masks, time indices, and target scales.
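
The sliding-window separation described above can be illustrated with a minimal sketch in plain Python. It is not the module's code: the helper name `sliding_windows` and the stride of one step are assumptions made for illustration, and it only shows how each window divides into an encoder (historical) range followed by a decoder (future) range.

```python
def sliding_windows(series_length, max_encoder_length, max_prediction_length):
    """Return (encoder_range, decoder_range) index pairs for one series.

    Hypothetical helper: assumes a stride of 1 and fixed-length windows.
    """
    window_length = max_encoder_length + max_prediction_length
    windows = []
    # A series shorter than one full window yields no windows at all.
    for start in range(series_length - window_length + 1):
        encoder_end = start + max_encoder_length
        encoder_range = range(start, encoder_end)
        decoder_range = range(encoder_end, encoder_end + max_prediction_length)
        windows.append((encoder_range, decoder_range))
    return windows


windows = sliding_windows(series_length=10, max_encoder_length=4, max_prediction_length=2)
first_encoder, first_decoder = windows[0]
print(len(windows), list(first_encoder), list(first_decoder))
```

With a series of 10 steps, an encoder length of 4, and a prediction length of 2, every window covers 6 consecutive steps, and the decoder range always starts where the encoder range ends.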

Usage

Use EncoderDecoderTimeSeriesDataModule for models that consume separate encoder and decoder inputs (e.g., seq2seq architectures, Temporal Fusion Transformer). It is part of the experimental v2 data pipeline and supports configurable encoder/decoder lengths, target normalization, and train/val/test splitting.
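
As a rough illustration of what the `train_val_test_split` proportions mean, the sketch below partitions a list of sample indices into three contiguous parts. The helper `split_indices` is hypothetical. Whether the module shuffles before splitting, or splits by series rather than by window, is not stated on this page, so the contiguous, unshuffled split here is an assumption.

```python
def split_indices(n_samples, proportions=(0.7, 0.15, 0.15)):
    """Partition range(n_samples) into train/val/test index lists.

    Hypothetical helper: contiguous, unshuffled split. The test part takes
    whatever remains so that no index is dropped by rounding.
    """
    train_frac, val_frac, _ = proportions
    train_end = round(n_samples * train_frac)
    val_end = train_end + round(n_samples * val_frac)
    indices = list(range(n_samples))
    return indices[:train_end], indices[train_end:val_end], indices[val_end:]


train_idx, val_idx, test_idx = split_indices(100)
print(len(train_idx), len(val_idx), len(test_idx))
```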

Code Reference

Source Location

Signature

class EncoderDecoderTimeSeriesDataModule(LightningDataModule):
    def __init__(
        self,
        time_series_dataset: TimeSeries,
        max_encoder_length: int = 30,
        min_encoder_length: int | None = None,
        max_prediction_length: int = 1,
        min_prediction_length: int | None = None,
        min_prediction_idx: int | None = None,
        allow_missing_timesteps: bool = False,
        add_relative_time_idx: bool = False,
        add_target_scales: bool = False,
        add_encoder_length: bool | str = "auto",
        target_normalizer: NORMALIZER
        | str
        | list[NORMALIZER]
        | tuple[NORMALIZER]
        | None = "auto",
        categorical_encoders: dict[str, NaNLabelEncoder] | None = None,
        scalers: dict[
            str, StandardScaler | RobustScaler | TorchNormalizer | EncoderNormalizer
        ]
        | None = None,
        randomize_length: None | tuple[float, float] | bool = False,
        batch_size: int = 32,
        num_workers: int = 0,
        train_val_test_split: tuple = (0.7, 0.15, 0.15),
    ):

setup

def setup(self, stage: str | None = None):

Import

from pytorch_forecasting.data.data_module import EncoderDecoderTimeSeriesDataModule

I/O Contract

Constructor Inputs

Name | Type | Required | Description
time_series_dataset | TimeSeries | Yes | The time series dataset (D1 layer)
max_encoder_length | int | No | Maximum encoder input sequence length (default 30)
min_encoder_length | int or None | No | Minimum encoder length; defaults to max_encoder_length
max_prediction_length | int | No | Maximum decoder output sequence length (default 1)
min_prediction_length | int or None | No | Minimum prediction length; defaults to max_prediction_length
min_prediction_idx | int or None | No | Minimum index from which predictions start
allow_missing_timesteps | bool | No | Whether to allow missing timesteps (default False)
add_relative_time_idx | bool | No | Whether to add a relative time index feature (default False)
add_target_scales | bool | No | Whether to add target scaling information (default False)
add_encoder_length | bool or str | No | Whether to include encoder length information (default 'auto')
target_normalizer | NORMALIZER, str, list[NORMALIZER], tuple[NORMALIZER], or None | No | Target normalizer; 'auto' uses RobustScaler (default 'auto')
categorical_encoders | dict[str, NaNLabelEncoder] or None | No | Dictionary of categorical encoders
scalers | dict or None | No | Dictionary of feature scalers (StandardScaler, RobustScaler, TorchNormalizer, or EncoderNormalizer)
randomize_length | None, tuple[float, float], or bool | No | Whether to randomize input sequence length (default False)
batch_size | int | No | Batch size (default 32)
num_workers | int | No | Number of dataloader workers (default 0)
train_val_test_split | tuple | No | Train/val/test proportions (default (0.7, 0.15, 0.15))

Batch Output (x dict)

Name | Type | Description
encoder_cat | torch.Tensor | Categorical features for encoder, shape (batch, enc_length, n_cat)
encoder_cont | torch.Tensor | Continuous features for encoder, shape (batch, enc_length, n_cont)
decoder_cat | torch.Tensor | Known categorical features for decoder, shape (batch, pred_length, n_known_cat)
decoder_cont | torch.Tensor | Known continuous features for decoder, shape (batch, pred_length, n_known_cont)
encoder_lengths | torch.Tensor | Encoder sequence lengths
decoder_lengths | torch.Tensor | Decoder sequence lengths
decoder_target_lengths | torch.Tensor | Decoder target sequence lengths
groups | torch.Tensor | Group identifiers
target_past | torch.Tensor | Historical target values for encoder
encoder_time_idx | torch.Tensor | Time indices for encoder
decoder_time_idx | torch.Tensor | Time indices for decoder
target_scale | torch.Tensor | Scaling factor for target values
encoder_mask | torch.Tensor | Boolean mask for valid encoder time points
decoder_mask | torch.Tensor | Boolean mask for valid decoder time points
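
To make the relationship between the length fields and the boolean masks concrete, here is a small sketch using plain Python lists instead of tensors. The helper `pad_with_mask`, the padding value, and the choice to pad at the end are all assumptions for illustration; this page does not specify how the module pads variable-length sequences.

```python
def pad_with_mask(values, max_length, pad_value=0.0):
    """Pad a sequence to max_length and mark which positions are real data.

    Hypothetical helper: pads at the end; True in the mask means "valid".
    """
    values = list(values)
    n_valid = len(values)
    if n_valid > max_length:
        raise ValueError("sequence is longer than max_length")
    n_pad = max_length - n_valid
    padded = values + [pad_value] * n_pad
    mask = [True] * n_valid + [False] * n_pad
    return padded, mask, n_valid


padded, mask, n_valid = pad_with_mask([1.0, 2.0, 3.0], max_length=5)
print(padded, mask, n_valid)
```

In this sketch the returned length plays the role of an `encoder_lengths` entry and the mask plays the role of one row of `encoder_mask`: a model can use either to ignore padded positions.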

metadata Property Output

Name | Type | Description
encoder_cat | int | Number of categorical variables in the encoder
encoder_cont | int | Number of continuous variables in the encoder
decoder_cat | int | Number of known categorical variables in the decoder
decoder_cont | int | Number of known continuous variables in the decoder
target | int | Number of target variables
static_categorical_features | int | Number of static categorical features
static_continuous_features | int | Number of static continuous features
max_encoder_length | int | Maximum encoder length
max_prediction_length | int | Maximum prediction length
min_encoder_length | int | Minimum encoder length
min_prediction_length | int | Minimum prediction length
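
One way these counts might feed model initialization is sketched below. The dictionary values are made-up example numbers, and `model_input_sizes` is a hypothetical helper. Summing the categorical and continuous counts is a simplification that treats each categorical variable as a single input column; a model that embeds categoricals would size its layers differently.

```python
# Example values only; a real dictionary would come from the data module's
# metadata property after setup.
example_metadata = {
    "encoder_cat": 2,
    "encoder_cont": 5,
    "decoder_cat": 1,
    "decoder_cont": 3,
    "target": 1,
    "static_categorical_features": 0,
    "static_continuous_features": 0,
    "max_encoder_length": 96,
    "max_prediction_length": 24,
    "min_encoder_length": 96,
    "min_prediction_length": 24,
}


def model_input_sizes(metadata):
    """Derive layer sizes from feature counts (hypothetical helper)."""
    return {
        "encoder_input_size": metadata["encoder_cat"] + metadata["encoder_cont"],
        "decoder_input_size": metadata["decoder_cat"] + metadata["decoder_cont"],
        "output_size": metadata["target"],
        "total_length": metadata["max_encoder_length"] + metadata["max_prediction_length"],
    }


sizes = model_input_sizes(example_metadata)
print(sizes)
```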

Usage Examples

from pytorch_forecasting.data.data_module import EncoderDecoderTimeSeriesDataModule
from pytorch_forecasting.data.timeseries import TimeSeries

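# Create TimeSeries dataset. `df` is assumed to be an existing pandas DataFrame
# with "time_idx", "value", and "series_id" columns.
ts = TimeSeries(data=df, time="time_idx", target="value", group=["series_id"])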

# Create DataModule
dm = EncoderDecoderTimeSeriesDataModule(
    time_series_dataset=ts,
    max_encoder_length=96,
    max_prediction_length=24,
    batch_size=64,
    num_workers=4,
)

dm.setup(stage="fit")
train_loader = dm.train_dataloader()
val_loader = dm.val_dataloader()

# Access metadata for model initialization
metadata = dm.metadata
print(metadata["encoder_cont"])  # Number of continuous encoder features
