Implementation:Sktime Pytorch forecasting TslibDataModule
| Knowledge Sources | |
|---|---|
| Domains | Time_Series, Forecasting, Deep_Learning |
| Last Updated | 2026-02-08 08:00 GMT |
Overview
TslibDataModule is an experimental Lightning DataModule that bridges the v2 TimeSeries dataset to tslib-style deep learning models through sliding-window batching.
Description
TslibDataModule extends LightningDataModule and serves as the D2 (data processing) layer for tslib-derived models such as Informer, AutoFormer, and TimeXer. It takes a TimeSeries dataset (D1 layer) and produces train, validation, test, and prediction DataLoaders with sliding-window batches. The module computes metadata describing feature names, indices, types (categorical/continuous), known/unknown status, and forecast horizons. An internal _TslibDataset class handles individual window retrieval, splitting features into history/future and continuous/categorical components, applying known-feature masking for future windows, and producing (x, y) tuples for the model. The module supports configurable context/prediction lengths, window stride, target normalization, and train/val/test splitting.
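The sliding-window idea can be illustrated with a minimal, library-independent sketch. The helper `sliding_windows` and its `(start, context_end, prediction_end)` tuple layout are hypothetical names chosen for this illustration. They need not match the four-integer tuples that _TslibDataset receives, and the real module may compute its windows differently.

```python
def sliding_windows(series_length, context_length, prediction_length, stride=1):
    """Return (start, context_end, prediction_end) index triples for one series.

    Hypothetical helper for illustration only; not part of pytorch-forecasting.
    """
    window_length = context_length + prediction_length
    windows = []
    # The last valid start must leave room for a full context plus prediction span.
    for start in range(0, series_length - window_length + 1, stride):
        context_end = start + context_length
        windows.append((start, context_end, context_end + prediction_length))
    return windows


# A series of 10 steps, 4 context steps, 2 prediction steps, stride 2.
windows = sliding_windows(series_length=10, context_length=4, prediction_length=2, stride=2)
print(windows)
```

A series shorter than `context_length + prediction_length` yields no windows in this sketch, and a larger stride reduces the overlap between consecutive windows.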
Usage
Use TslibDataModule when training v2 pytorch-forecasting models (those extending TslibBaseModel) with the new TimeSeries data pipeline. Pass a TimeSeries dataset instance along with context_length and prediction_length to create a fully configured data module.
Code Reference
Source Location
- Repository: Sktime_Pytorch_forecasting
- File: pytorch_forecasting/data/_tslib_data_module.py
- Lines: 1-892
Signature: _TslibDataset
class _TslibDataset(Dataset):
    def __init__(
        self,
        dataset: TimeSeries,
        data_module: "TslibDataModule",
        windows: list[tuple[int, int, int, int]],
        add_relative_time_idx: bool = False,
    ):
Signature: TslibDataModule
class TslibDataModule(LightningDataModule):
    def __init__(
        self,
        time_series_dataset: TimeSeries,
        context_length: int,
        prediction_length: int,
        freq: str = "h",
        add_relative_time_idx: bool = False,
        add_target_scales: bool = False,
        target_normalizer: NORMALIZER
        | str
        | list[NORMALIZER]
        | tuple[NORMALIZER]
        | None = "auto",
        scalers: dict[
            str, StandardScaler | RobustScaler | TorchNormalizer | EncoderNormalizer
        ]
        | None = None,
        shuffle: bool = True,
        window_stride: int = 1,
        batch_size: int = 32,
        num_workers: int = 0,
        train_val_test_split: tuple[float, float, float] = (0.7, 0.15, 0.15),
        collate_fn: Callable | None = None,
        **kwargs,
    ) -> None:
setup
def setup(self, stage: str | None = None) -> None:
Import
from pytorch_forecasting.data._tslib_data_module import TslibDataModule
I/O Contract
Constructor Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| time_series_dataset | TimeSeries | Yes | The v2 TimeSeries dataset (D1 layer) |
| context_length | int | Yes | Number of historical time steps used as model input |
| prediction_length | int | Yes | Number of future time steps to predict |
| freq | str | No | Frequency of the time series data (default 'h') |
| add_relative_time_idx | bool | No | Whether to add relative time indices (default False) |
| add_target_scales | bool | No | Whether to add target scaling info (default False) |
| target_normalizer | NORMALIZER, str, list[NORMALIZER], tuple[NORMALIZER], or None | No | Target normalizer; 'auto' uses RobustScaler (default 'auto') |
| scalers | dict or None | No | Dictionary of feature scalers (default None) |
| shuffle | bool | No | Whether to shuffle training data (default True) |
| window_stride | int | No | Stride for the sliding window (default 1) |
| batch_size | int | No | Batch size for dataloaders (default 32) |
| num_workers | int | No | Number of dataloader workers (default 0) |
| train_val_test_split | tuple[float, float, float] | No | Proportions for train/val/test splits (default (0.7, 0.15, 0.15)) |
| collate_fn | Callable or None | No | Custom collate function for the dataloader (default None) |
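To make the `train_val_test_split` proportions concrete, here is a minimal sketch of one way such a split could be applied to a list of item indices. The function `split_indices`, its `seed` argument, and the choice to give any rounding remainder to the test split are assumptions for illustration. This page does not state whether the module splits by series or by window, nor how it handles remainders.

```python
import random


def split_indices(n_items, proportions=(0.7, 0.15, 0.15), seed=0):
    """Shuffle item indices and cut them into train/val/test lists.

    Hypothetical helper for illustration only; not part of pytorch-forecasting.
    """
    train_frac, val_frac, _ = proportions
    indices = list(range(n_items))
    # A seeded shuffle keeps the split reproducible across runs.
    random.Random(seed).shuffle(indices)
    n_train = int(train_frac * n_items)
    n_val = int(val_frac * n_items)
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    # Whatever is left after train and val goes to test in this sketch.
    test = indices[n_train + n_val:]
    return train, val, test


train, val, test = split_indices(20)
print(len(train), len(val), len(test))
```

Every index lands in exactly one of the three lists, so the splits never overlap.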
Batch Output (x dict)
| Name | Type | Description |
|---|---|---|
| history_cont | torch.Tensor | Continuous features for the encoder, shape (batch, context_length, n_cont) |
| history_cat | torch.Tensor | Categorical features for the encoder, shape (batch, context_length, n_cat) |
| future_cont | torch.Tensor | Known continuous features for decoder, shape (batch, prediction_length, n_known_cont) |
| future_cat | torch.Tensor | Known categorical features for decoder, shape (batch, prediction_length, n_known_cat) |
| history_target | torch.Tensor | Historical target values, shape (batch, context_length, n_targets) |
| future_target | torch.Tensor | Future target values, shape (batch, prediction_length, n_targets) |
| history_mask | torch.Tensor | Boolean mask for valid encoder time points |
| future_mask | torch.Tensor | Boolean mask for valid decoder time points |
| groups | torch.Tensor | Group identifiers |
| history_time_idx | torch.Tensor | Time indices for encoder |
| future_time_idx | torch.Tensor | Time indices for decoder |
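The difference between the history and future entries can be sketched without tensors. In the simplified example below, the history part keeps every feature, while the future part keeps only features that are known in advance, which mirrors the `n_known_cont` and `n_known_cat` dimensions in the table. The function `split_window` and the column names are hypothetical; the real _TslibDataset operates on tensors and may mask or index features differently.

```python
def split_window(rows, context_length, prediction_length, known_columns):
    """Split one window of per-step records into history and future parts.

    Hypothetical helper for illustration only; not part of pytorch-forecasting.
    """
    # History keeps every feature for the first `context_length` steps.
    history = rows[:context_length]
    future_rows = rows[context_length:context_length + prediction_length]
    # Future keeps only the features whose values are known ahead of time.
    future = [{name: row[name] for name in known_columns} for row in future_rows]
    return history, future


# Six time steps: "hour" is known in advance, "temperature" is not.
rows = [{"value": float(t), "hour": t % 24, "temperature": 20.0 + t} for t in range(6)]
history, future = split_window(rows, context_length=4, prediction_length=2, known_columns=["hour"])
print(len(history), future)
```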
metadata Property Output
| Name | Type | Description |
|---|---|---|
| feature_names | dict[str, list[str]] | Feature names grouped by type (categorical, continuous, static, known, unknown, target, all) |
| feature_indices | dict[str, list[int]] | Feature indices grouped by type |
| n_features | dict[str, int] | Feature counts by type |
| context_length | int | Context window length |
| prediction_length | int | Prediction horizon length |
| freq | str | Time series frequency |
| features | str | Feature mode (S, MS, or M) |
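The `features` value follows the convention used by tslib-style repositories: S means univariate input predicting a univariate target, MS means multivariate input predicting a univariate target, and M means multivariate input predicting multivariate targets. The sketch below shows how such a mode could be derived from feature counts. The function `feature_mode` and its arguments are hypothetical, and the exact rule the module applies is not stated on this page.

```python
def feature_mode(n_input_features, n_targets):
    """Map feature counts to a tslib-style mode string.

    Hypothetical helper for illustration only; not part of pytorch-forecasting.
    """
    if n_targets > 1:
        return "M"   # multivariate input, multivariate targets
    if n_input_features > 1:
        return "MS"  # multivariate input, single target
    return "S"       # single input feature, single target


for n_inputs, n_targets in [(1, 1), (5, 1), (5, 3)]:
    print(n_inputs, n_targets, feature_mode(n_inputs, n_targets))
```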
Usage Examples
from pytorch_forecasting.data._tslib_data_module import TslibDataModule
from pytorch_forecasting.data.timeseries import TimeSeries

# `df` is assumed to be a pandas DataFrame with the columns
# "time_idx", "value", and "series_id".

# Create TimeSeries dataset (D1 layer)
ts = TimeSeries(data=df, time="time_idx", target="value", group=["series_id"])

# Create TslibDataModule (D2 layer)
dm = TslibDataModule(
    time_series_dataset=ts,
    context_length=96,
    prediction_length=24,
    batch_size=64,
    num_workers=4,
    train_val_test_split=(0.7, 0.15, 0.15),
)

# Prepare the training and validation datasets, then request their DataLoaders
dm.setup(stage="fit")
train_loader = dm.train_dataloader()
val_loader = dm.val_dataloader()

# Access metadata for model initialization
metadata = dm.metadata