Principle: sktime / PyTorch Forecasting Time Series Data Loading
| Knowledge Sources | |
|---|---|
| Domains | Time_Series, Data_Engineering |
| Last Updated | 2026-02-08 07:00 GMT |
Overview
Technique for loading and preparing real-world tabular time series data with covariates for consumption by forecasting models.
Description
Time Series Data Loading is the first step in any forecasting pipeline. It involves reading raw data from storage (CSV, Parquet, databases), ensuring the data has the correct schema (time index, group identifiers, target variables, covariates), and performing initial feature engineering such as creating time-based features, log transforms, and rolling aggregates. In the context of demand forecasting, this typically means loading historical sales or volume data alongside static metadata (product IDs, store locations) and dynamic covariates (promotions, holidays, price changes). The quality and structure of loaded data directly determines the success of downstream model training.
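The feature engineering steps named above (time-based features, log transforms, rolling aggregates) can be sketched with pandas. This is a minimal illustration, not the pipeline from any specific library; the column names ("date", "sku", "volume") are assumptions chosen to match the demand-forecasting example.

```python
import numpy as np
import pandas as pd

# Toy demand data: two SKUs, three months each (illustrative values)
df = pd.DataFrame({
    "date": pd.to_datetime(["2024-01-01", "2024-02-01", "2024-03-01"] * 2),
    "sku": ["A"] * 3 + ["B"] * 3,
    "volume": [10.0, 12.0, 9.0, 100.0, 90.0, 110.0],
})

# Time-based feature derived from the date column
df["month"] = df["date"].dt.month

# Log transform stabilizes variance across series of very different scales
df["log_volume"] = np.log1p(df["volume"])

# Rolling aggregate computed per series; grouping by "sku" prevents
# the window from leaking values across different series
df["volume_roll2"] = (
    df.groupby("sku")["volume"]
      .transform(lambda s: s.rolling(2, min_periods=1).mean())
)
```

Note that the rolling statistic is computed inside `groupby(...).transform(...)`: a plain `df["volume"].rolling(...)` would mix the tail of one SKU's history with the head of the next.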
Usage
Use this principle at the beginning of any forecasting workflow when working with real-world tabular datasets that contain multiple time series identified by group columns (e.g., agency + SKU combinations). This is the appropriate starting point when the data comes with mixed covariates: static categoricals, time-varying known reals, and time-varying unknown targets. It is not needed when generating synthetic data for experimentation.
Theoretical Basis
Data loading for time series forecasting follows a specific schema requirement:
Required columns:
- Time index — monotonically increasing integer identifying time position
- Group identifiers — one or more columns that uniquely identify each individual series
- Target variable — the value to forecast (e.g., sales volume)
Optional covariates:
- Static categoricals — time-invariant categorical features (e.g., product type)
- Static reals — time-invariant continuous features (e.g., store size)
- Time-varying known — future-known features (e.g., holidays, promotions)
- Time-varying unknown — features known only in the past (e.g., lagged target)
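The required columns above can be assembled from raw dated data as follows. This is a hedged sketch: the group columns ("agency", "sku") and target ("volume") are assumptions following the demand-forecasting example, and the monotonically increasing integer time index is derived here from a monthly date column.

```python
import pandas as pd

# Raw data: two series (agency x sku), two months each (illustrative values)
df = pd.DataFrame({
    "date": pd.to_datetime(["2024-01-01", "2024-02-01", "2024-01-01", "2024-02-01"]),
    "agency": ["A1", "A1", "A2", "A2"],   # group identifiers
    "sku": ["S1", "S1", "S1", "S1"],
    "volume": [5.0, 6.0, 7.0, 8.0],       # target variable
})

# Time index: consecutive integer months, shared across all series,
# normalized so the earliest observation is 0
months = df["date"].dt.year * 12 + df["date"].dt.month
df["time_idx"] = months - months.min()
```

Each row is now uniquely identified by (group identifiers, time index), which is the contract downstream dataset constructors typically rely on.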
Pseudo-code logic:
# Abstract data loading pipeline
raw_data = load_from_storage(path)
data = add_time_index(raw_data, date_column)
data = add_group_identifiers(data, group_columns)
data = engineer_features(data) # log transforms, rolling stats, etc.
# Result: DataFrame ready for TimeSeriesDataSet construction
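A runnable version of the abstract pipeline above, sketched with pandas under assumed column names ("date", "agency", "sku", "volume"). In practice the first step would be something like `pd.read_csv(path, parse_dates=["date"])` or `pd.read_parquet(path)`; an inline DataFrame stands in for storage here.

```python
import numpy as np
import pandas as pd

def add_time_index(df: pd.DataFrame, date_column: str) -> pd.DataFrame:
    # Consecutive integer index over months, starting at 0
    months = df[date_column].dt.year * 12 + df[date_column].dt.month
    df["time_idx"] = months - months.min()
    return df

def engineer_features(df: pd.DataFrame, group_columns: list[str]) -> pd.DataFrame:
    # Log transform of the target plus a per-series aggregate feature
    df["log_volume"] = np.log1p(df["volume"])
    df["avg_volume_by_group"] = df.groupby(group_columns)["volume"].transform("mean")
    return df

# Stand-in for load_from_storage(path)
raw = pd.DataFrame({
    "date": pd.to_datetime(["2024-01-01", "2024-02-01", "2024-01-01", "2024-02-01"]),
    "agency": ["A1", "A1", "A2", "A2"],
    "sku": ["S1", "S1", "S2", "S2"],
    "volume": [10.0, 12.0, 100.0, 110.0],
})

data = add_time_index(raw, "date")
data = engineer_features(data, ["agency", "sku"])
# data now has time_idx, group identifiers, target, and engineered
# covariates, ready for TimeSeriesDataSet construction
```

The group identifiers need no separate construction step here because they already exist as columns; `add_group_identifiers` in the pseudocode would only be needed if series membership had to be derived (e.g., parsed out of a composite key).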