Implementation: Sktime Pytorch Forecasting TimeSeriesDataSet Init
| Knowledge Sources | |
|---|---|
| Domains | Time_Series, Data_Engineering, Deep_Learning |
| Last Updated | 2026-02-08 07:00 GMT |
Overview
Concrete tool from the pytorch-forecasting library for constructing encoder-decoder time series datasets from pandas DataFrames.
Description
The TimeSeriesDataSet class is the central data abstraction in pytorch-forecasting (v1). Its constructor takes a pandas DataFrame together with a comprehensive set of parameters specifying variable roles (target, group IDs, static and time-varying categoricals and reals), encoder/decoder window sizes, and normalization and encoding configuration. It validates the data, fits normalizers and encoders (unless pre-fitted ones are supplied), computes the valid sample indices, and stores all metadata needed for model architecture inference via from_dataset factory methods.
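The sample-index precomputation can be illustrated with a simplified sketch (the helper `valid_sample_index` below is hypothetical, not the library's actual implementation): every position in each series that has a full encoder window behind it and a full prediction window ahead of it becomes one candidate sample.

```python
import pandas as pd

def valid_sample_index(df, group_ids, time_idx,
                       max_encoder_length, max_prediction_length):
    # Simplified stand-in for the index TimeSeriesDataSet precomputes:
    # one row per valid decoder start position within each series.
    samples = []
    for keys, g in df.groupby(group_ids):
        t = g[time_idx].sort_values().to_numpy()
        # a decoder may start wherever a full encoder window precedes it
        # and a full prediction window follows it (contiguous time assumed)
        for i in range(max_encoder_length, len(t) - max_prediction_length + 1):
            samples.append({"group": keys, "decoder_start": int(t[i])})
    return pd.DataFrame(samples)

df = pd.DataFrame({
    "series": ["a"] * 10,
    "time_idx": range(10),
    "value": range(10),
})
idx = valid_sample_index(df, ["series"], "time_idx",
                         max_encoder_length=3, max_prediction_length=2)
# 10 timesteps, encoder 3, horizon 2 -> decoder starts at t = 3..8 (6 samples)
```

The real constructor additionally handles variable encoder/decoder lengths, missing timesteps, and `predict_mode`, but the core idea is this per-group window enumeration.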
Usage
Import and instantiate this class whenever you need to prepare data for any pytorch-forecasting model. This is used in every workflow: TFT Demand Forecasting, DeepAR Probabilistic Forecasting, N-BEATS Univariate Forecasting, and TFT Hyperparameter Optimization. The training dataset instance is also used as a template for creating validation datasets via from_dataset.
Code Reference
Source Location
- Repository: pytorch-forecasting
- File: pytorch_forecasting/data/timeseries/_timeseries.py
- Lines: L439-476 (signature), class body extends to ~L2686
Signature
```python
class TimeSeriesDataSet(Dataset):
    def __init__(
        self,
        data: pd.DataFrame,
        time_idx: str,
        target: str | list[str],
        group_ids: list[str],
        weight: str | None = None,
        max_encoder_length: int = 30,
        min_encoder_length: int | None = None,
        min_prediction_idx: int | None = None,
        min_prediction_length: int | None = None,
        max_prediction_length: int = 1,
        static_categoricals: list[str] | None = None,
        static_reals: list[str] | None = None,
        time_varying_known_categoricals: list[str] | None = None,
        time_varying_known_reals: list[str] | None = None,
        time_varying_unknown_categoricals: list[str] | None = None,
        time_varying_unknown_reals: list[str] | None = None,
        variable_groups: dict[str, list[int]] | None = None,
        constant_fill_strategy: dict[str, str | float | int | bool] | None = None,
        allow_missing_timesteps: bool = False,
        lags: dict[str, list[int]] | None = None,
        add_relative_time_idx: bool = False,
        add_target_scales: bool = False,
        add_encoder_length: bool | str = "auto",
        target_normalizer: NORMALIZER | str | list | tuple | None = "auto",
        categorical_encoders: dict[str, NaNLabelEncoder] | None = None,
        scalers: dict[str, StandardScaler | RobustScaler | TorchNormalizer | EncoderNormalizer] | None = None,
        randomize_length: None | tuple[float, float] | bool = False,
        predict_mode: bool = False,
    ):
```
Import
```python
from pytorch_forecasting import TimeSeriesDataSet
```
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| data | pd.DataFrame | Yes | Input time series dataframe with all variables |
| time_idx | str | Yes | Column name for integer time index |
| target | str or list[str] | Yes | Forecasting target column(s) |
| group_ids | list[str] | Yes | Columns identifying individual time series |
| max_encoder_length | int | No | Maximum lookback window (default: 30) |
| max_prediction_length | int | No | Maximum forecast horizon (default: 1) |
| static_categoricals | list[str] | No | Static categorical feature columns |
| static_reals | list[str] | No | Static continuous feature columns |
| time_varying_known_categoricals | list[str] | No | Known future categorical features |
| time_varying_known_reals | list[str] | No | Known future continuous features |
| time_varying_unknown_reals | list[str] | No | Unknown future continuous features |
| target_normalizer | NORMALIZER or str | No | Normalizer for targets (default: "auto") |
| categorical_encoders | dict | No | Pre-fitted categorical encoders |
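The time_idx column must be an integer index that increments by one per step within each series; calendar gaps either need filling or allow_missing_timesteps=True. A minimal standalone sketch of deriving it from monthly dates:

```python
import pandas as pd

df = pd.DataFrame(
    {"date": pd.to_datetime(["2020-01-01", "2020-02-01", "2020-04-01"])}
)
# Month-resolution integer index: gaps in the calendar become gaps in time_idx
df["time_idx"] = df["date"].dt.year * 12 + df["date"].dt.month
df["time_idx"] -= df["time_idx"].min()
# -> [0, 1, 3]; the missing March would require allow_missing_timesteps=True
```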
Outputs
| Name | Type | Description |
|---|---|---|
| TimeSeriesDataSet | Dataset | PyTorch Dataset whose samples are (x, y) pairs, where x is a dict of input tensors and y is a (target, weight) tuple |
Usage Examples
TFT Demand Forecasting Dataset
```python
from pytorch_forecasting import TimeSeriesDataSet, GroupNormalizer
from pytorch_forecasting.data.examples import get_stallion_data

data = get_stallion_data()

# Feature engineering: build a contiguous integer time index from the date
data["time_idx"] = data["date"].dt.year * 12 + data["date"].dt.month
data["time_idx"] -= data["time_idx"].min()

max_prediction_length = 6
max_encoder_length = 24
training_cutoff = data["time_idx"].max() - max_prediction_length

training = TimeSeriesDataSet(
    data[lambda x: x.time_idx <= training_cutoff],
    time_idx="time_idx",
    target="volume",
    group_ids=["agency", "sku"],
    max_encoder_length=max_encoder_length,
    max_prediction_length=max_prediction_length,
    static_categoricals=["agency", "sku"],
    time_varying_known_reals=["time_idx", "price_regular", "discount_in_percent"],
    time_varying_unknown_reals=["volume", "log_volume", "industry_volume", "soda_volume"],
    target_normalizer=GroupNormalizer(groups=["agency", "sku"], transformation="softplus"),
    add_relative_time_idx=True,
    add_target_scales=True,
    add_encoder_length=True,
)
```
DeepAR Univariate Dataset
```python
from pytorch_forecasting import TimeSeriesDataSet, GroupNormalizer
from pytorch_forecasting.data.encoders import NaNLabelEncoder
from pytorch_forecasting.data.examples import generate_ar_data

# Synthetic autoregressive data: 100 series, 400 timesteps each
data = generate_ar_data(seasonality=10.0, timesteps=400, n_series=100)
training_cutoff = data["time_idx"].max() - 20

training = TimeSeriesDataSet(
    data[lambda x: x.time_idx <= training_cutoff],
    time_idx="time_idx",
    target="value",
    group_ids=["series"],
    # pre-fit the encoder on all series so validation data encodes consistently
    categorical_encoders={"series": NaNLabelEncoder().fit(data.series)},
    max_encoder_length=60,
    max_prediction_length=20,
    time_varying_unknown_reals=["value"],
    time_varying_known_reals=["time_idx"],
    target_normalizer=GroupNormalizer(groups=["series"]),
    add_target_scales=True,
)
```
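GroupNormalizer, used in both examples, fits one scale per group instead of one global scale, so series of very different magnitudes become comparable to the model. A simplified pure-pandas illustration of the idea (not the library's implementation, which additionally supports transformations such as softplus):

```python
import pandas as pd

df = pd.DataFrame({
    "series": ["a"] * 4 + ["b"] * 4,
    "value": [1.0, 2.0, 3.0, 4.0, 100.0, 200.0, 300.0, 400.0],
})
# Fit one (mean, std) per series, then normalize each series by its own stats
stats = df.groupby("series")["value"].agg(["mean", "std"])
df = df.join(stats, on="series")
df["value_norm"] = (df["value"] - df["mean"]) / df["std"]
# Each series is centered on its own mean, so 'a' and 'b' land on the same scale
```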