Workflow:Sktime Pytorch forecasting DeepAR Probabilistic Forecasting
| Knowledge Sources | |
|---|---|
| Domains | Time_Series, Probabilistic_Forecasting, Deep_Learning |
| Last Updated | 2026-02-08 07:00 GMT |
Overview
End-to-end process for probabilistic time series forecasting using the DeepAR autoregressive recurrent network with distributional output.
Description
This workflow covers training a DeepAR model to produce full predictive distributions (not just point forecasts) for multiple time series. DeepAR uses an autoregressive LSTM-based architecture that emits distribution parameters (e.g., mean and standard deviation for a Normal distribution) at each time step, enabling sampling-based probabilistic forecasts. The workflow generates synthetic autoregressive data, wraps it in a TimeSeriesDataSet with per-group normalization, constructs the DeepAR model with a distributional loss function, and trains it with GPU acceleration. After training, the model produces both point predictions and distributional forecasts.
Key capabilities:
- Emits full predictive distributions, not just point estimates
- Supports multiple distribution families (Normal, LogNormal, NegativeBinomial, Beta)
- Handles multiple time series with shared learned dynamics
- Autoregressive inference generates realistic forecast trajectories
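To make the difference between a point forecast and a distributional forecast concrete, here is a minimal sketch using only the Python standard library. The per-step `(loc, scale)` values are invented for illustration and are not DeepAR output; the helper names are hypothetical.

```python
from statistics import NormalDist
import random

# Hypothetical per-step Normal parameters a distributional model might emit
# (illustrative numbers only, not produced by DeepAR).
params = [(10.0, 1.0), (10.5, 1.5), (11.0, 2.0)]  # (loc, scale) per forecast step

def interval(loc, scale, level=0.9):
    """Central prediction interval of Normal(loc, scale) at the given level."""
    dist = NormalDist(mu=loc, sigma=scale)
    tail = (1.0 - level) / 2.0
    return dist.inv_cdf(tail), dist.inv_cdf(1.0 - tail)

def sample_steps(params, rng):
    """Draw one value per step. These draws are independent across steps;
    an autoregressive model would feed each draw back in as the next input."""
    return [rng.gauss(loc, scale) for loc, scale in params]

intervals = [interval(loc, scale) for loc, scale in params]
rng = random.Random(0)
draws = [sample_steps(params, rng) for _ in range(100)]
```

The interval widens as `scale` grows, which is the information a point forecast alone cannot carry.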
Usage
Use this workflow when you need to quantify forecast uncertainty (prediction intervals, distributional forecasts) for a collection of related time series with little covariate information. It is well suited to inventory planning, capacity forecasting, and any scenario where knowing the range of likely outcomes is as important as the central forecast.
Execution Steps
Step 1: Data Generation or Loading
Prepare the time series data as a pandas DataFrame with columns for series identifier, integer time index, and target value. For synthetic experimentation, use the built-in generate_ar_data helper to create multiple autoregressive series with configurable seasonality, trend, and noise. For real data, ensure the DataFrame follows the same long-format columnar structure (one row per series and time step).
Key considerations:
- Each series needs a unique identifier column (e.g., series ID as integer or string)
- The time index must be an integer column that increases monotonically within each series (by 1 per step when no observations are missing)
- For DeepAR, the target is typically a single univariate column per series
- Split series into training and validation sets (by series ID or by time cutoff)
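The long-format layout described above can be sketched without any dependencies. `make_ar_rows` below is a hypothetical stand-in, not the library's generate_ar_data helper, and the column names `series`, `time_idx`, and `value` are assumptions chosen for this example.

```python
import math
import random

def make_ar_rows(n_series=5, timesteps=50, phi=0.8, seasonality=10.0, noise=0.1, seed=0):
    """Return long-format rows: one dict per (series, time_idx) observation."""
    rng = random.Random(seed)
    rows = []
    for series in range(n_series):
        value = rng.gauss(0.0, 1.0)  # random starting level per series
        for t in range(timesteps):
            seasonal = math.sin(2.0 * math.pi * t / seasonality)
            value = phi * value + rng.gauss(0.0, noise)  # AR(1) update
            rows.append({"series": series, "time_idx": t, "value": value + seasonal})
    return rows

rows = make_ar_rows()
# pandas.DataFrame(rows) would turn these rows into the columnar frame
# (series identifier, integer time index, target value) described above.
```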
Step 2: TimeSeriesDataSet Construction
Create a TimeSeriesDataSet with the target, group identifiers, and time index. Configure encoder and prediction lengths to fixed values (DeepAR expects consistent window sizes). Apply GroupNormalizer to normalize each series independently, and use NaNLabelEncoder for series identifiers.
Key considerations:
- Set min_encoder_length equal to max_encoder_length for fixed-length windows
- Similarly fix min_prediction_length equal to max_prediction_length
- GroupNormalizer normalizes each time series by its own statistics
- NaNLabelEncoder encodes the series identifiers; to tolerate identifiers that were not in the training split, fit it on all series IDs up front or construct it with add_nan=True
- Set add_relative_time_idx=False (DeepAR does not use relative positional encoding)
- Set add_target_scales=True to provide the model with series-level scale information
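The considerations above might translate into a dataset definition along these lines. This is a sketch under assumptions: the column names (`series`, `time_idx`, `value`) and the window lengths (60 and 20) are illustrative, and `build_training_dataset` is a hypothetical helper that needs pytorch-forecasting installed when it is called.

```python
# Window settings: min == max gives fixed-length encoder and decoder windows.
MAX_ENCODER_LENGTH = 60      # illustrative value
MAX_PREDICTION_LENGTH = 20   # illustrative value

DATASET_KWARGS = dict(
    time_idx="time_idx",
    target="value",
    group_ids=["series"],
    min_encoder_length=MAX_ENCODER_LENGTH,
    max_encoder_length=MAX_ENCODER_LENGTH,
    min_prediction_length=MAX_PREDICTION_LENGTH,
    max_prediction_length=MAX_PREDICTION_LENGTH,
    time_varying_unknown_reals=["value"],
    add_relative_time_idx=False,
    add_target_scales=True,
)

def build_training_dataset(train_df, all_series_ids):
    """Wrap a long-format DataFrame in a TimeSeriesDataSet."""
    # Imported lazily so the settings above can be inspected without the library.
    from pytorch_forecasting import TimeSeriesDataSet
    from pytorch_forecasting.data import GroupNormalizer, NaNLabelEncoder

    return TimeSeriesDataSet(
        train_df,
        # Fit the encoder on every series ID so held-out series are known to it.
        categorical_encoders={"series": NaNLabelEncoder().fit(all_series_ids)},
        # Scale each series by its own statistics.
        target_normalizer=GroupNormalizer(groups=["series"]),
        **DATASET_KWARGS,
    )
```

Keeping the settings in a plain dictionary makes it easy to check that the minimum and maximum lengths agree before any data is loaded.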
Step 3: Validation Dataset and DataLoader Creation
Create a validation TimeSeriesDataSet using from_dataset to inherit preprocessing parameters. Split validation data by series ID (not just time cutoff) to test generalization to unseen series. Convert both datasets to DataLoaders.
Key considerations:
- Splitting by series ID tests the model on entirely unseen time series
- Set stop_randomization=True for reproducible validation evaluation
- Keep num_workers=0 if debugging; increase for production training
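A possible shape for this step is sketched below. `split_series_ids` and `make_dataloaders` are hypothetical helper names, the batch size of 64 is an arbitrary choice, and `make_dataloaders` assumes pytorch-forecasting is installed and that `training` is an existing TimeSeriesDataSet.

```python
import random

def split_series_ids(series_ids, n_validation, seed=0):
    """Hold out whole series: returns (train_ids, validation_ids) with no overlap."""
    unique_ids = sorted(set(series_ids))
    validation_ids = set(random.Random(seed).sample(unique_ids, n_validation))
    train_ids = [s for s in unique_ids if s not in validation_ids]
    return train_ids, sorted(validation_ids)

def make_dataloaders(training, validation_df, batch_size=64):
    """Build train/validation DataLoaders from an existing training dataset."""
    from pytorch_forecasting import TimeSeriesDataSet  # lazy: needs pytorch-forecasting

    # Reuse the training dataset's encoders, normalizer and window settings.
    validation = TimeSeriesDataSet.from_dataset(
        training, validation_df, stop_randomization=True
    )
    train_loader = training.to_dataloader(train=True, batch_size=batch_size, num_workers=0)
    val_loader = validation.to_dataloader(train=False, batch_size=batch_size, num_workers=0)
    return train_loader, val_loader

# Example: hold out 20 of 100 series for validation.
train_ids, val_ids = split_series_ids(range(100), n_validation=20)
```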
Step 4: Trainer Configuration
Set up the PyTorch Lightning Trainer with GPU acceleration, early stopping, gradient clipping, and learning rate monitoring. Configure epoch limits and batch limits as needed.
Key considerations:
- DeepAR benefits from GPU acceleration for LSTM computations
- Gradient clipping (gradient_clip_val=0.1) guards against exploding gradients in recurrent networks
- limit_train_batches and limit_val_batches speed up development iterations
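One way to express this configuration is shown below. Apart from the clipping value of 0.1, the numbers (epochs, batch limits, early-stopping patience) are illustrative assumptions. The import path assumes Lightning 2.x; older environments may expose the same classes under `pytorch_lightning`. `build_trainer` is a hypothetical helper.

```python
TRAINER_KWARGS = dict(
    max_epochs=10,            # illustrative epoch limit
    accelerator="auto",       # picks a GPU when available, otherwise CPU
    gradient_clip_val=0.1,    # clip gradients to stabilise the recurrent net
    limit_train_batches=30,   # cap batches per epoch for quick iterations
    limit_val_batches=3,
)

def build_trainer(**overrides):
    """Create a Lightning Trainer with early stopping and LR monitoring."""
    # Assumption: Lightning 2.x import path; imported lazily on first call.
    import lightning.pytorch as pl
    from lightning.pytorch.callbacks import EarlyStopping, LearningRateMonitor

    early_stop = EarlyStopping(monitor="val_loss", min_delta=1e-4, patience=5, mode="min")
    lr_monitor = LearningRateMonitor()
    kwargs = {**TRAINER_KWARGS, **overrides}
    return pl.Trainer(callbacks=[lr_monitor, early_stop], **kwargs)
```

Using `accelerator="auto"` rather than a hard-coded GPU setting keeps the same configuration usable on a CPU-only machine during development.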
Step 5: Model Instantiation
Create the DeepAR model using from_dataset, specifying the distributional loss function (e.g., NormalDistributionLoss for Gaussian output), hidden size for the LSTM layers, and dropout rate. The model architecture is inferred from the dataset metadata.
Key considerations:
- NormalDistributionLoss produces mean and standard deviation (loc and scale) parameters; choose the distribution family that matches your data characteristics
- Use NegativeBinomialDistributionLoss for count data and LogNormalDistributionLoss for strictly positive continuous data
- hidden_size controls LSTM capacity (32-128 typical)
- log_interval and log_val_interval control TensorBoard logging frequency
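Model instantiation could then look roughly like this. The hyperparameter values are illustrative assumptions, not recommendations, and `build_deepar` is a hypothetical helper that requires pytorch-forecasting when called.

```python
MODEL_KWARGS = dict(
    learning_rate=0.1,    # illustrative; tune for real data
    hidden_size=32,       # LSTM width
    rnn_layers=2,
    dropout=0.1,
    log_interval=10,
    log_val_interval=3,
)

def build_deepar(training, **overrides):
    """Instantiate DeepAR from a training TimeSeriesDataSet."""
    from pytorch_forecasting import DeepAR  # lazy: needs pytorch-forecasting
    from pytorch_forecasting.metrics import NormalDistributionLoss

    kwargs = {**MODEL_KWARGS, **overrides}
    # Default to a Gaussian output unless the caller supplies another loss.
    kwargs.setdefault("loss", NormalDistributionLoss())
    # Input sizes and embeddings are derived from the dataset's metadata.
    return DeepAR.from_dataset(training, **kwargs)
```

When swapping in a different distributional loss, check its documentation for any requirements it places on the target normalizer.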
Step 6: Model Training
Train the model using trainer.fit() with the training and validation dataloaders. The DeepAR model learns distribution parameters autoregressively during teacher-forced training.
Key considerations:
- Monitor val_loss convergence in TensorBoard
- Early stopping prevents overfitting to the training series
- Teacher forcing is used during training; autoregressive sampling during inference
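With the objects from the earlier steps in hand, training is typically a single call of the form `trainer.fit(model, train_dataloaders=train_loader, val_dataloaders=val_loader)`. The difference between teacher forcing and autoregressive sampling is easier to see on a toy model. The sketch below uses a hand-written AR(1) process as a stand-in for the LSTM; it is an illustration of the two modes, not DeepAR's implementation, and all names and values are assumptions.

```python
import math
import random

def teacher_forced_nll(series, phi, sigma):
    """Average Gaussian negative log-likelihood with true past values as inputs."""
    total = 0.0
    for prev, actual in zip(series[:-1], series[1:]):
        loc = phi * prev  # one-step prediction conditioned on the observed value
        total += 0.5 * math.log(2.0 * math.pi * sigma**2)
        total += (actual - loc) ** 2 / (2.0 * sigma**2)
    return total / (len(series) - 1)

def sample_trajectories(last_value, phi, sigma, horizon, n_samples, seed=0):
    """Free-running forecasts: each sampled value becomes the next step's input."""
    rng = random.Random(seed)
    paths = []
    for _ in range(n_samples):
        value, path = last_value, []
        for _ in range(horizon):
            value = rng.gauss(phi * value, sigma)
            path.append(value)
        paths.append(path)
    return paths

history = [0.8**t for t in range(30)]  # noise-free AR(1) history with phi = 0.8
nll_true = teacher_forced_nll(history, phi=0.8, sigma=0.5)
nll_wrong = teacher_forced_nll(history, phi=0.2, sigma=0.5)
paths = sample_trajectories(history[-1], phi=0.8, sigma=0.5, horizon=20, n_samples=100)
```

The likelihood is lower-cost to evaluate because every step sees the true previous value, while the sampled paths accumulate their own uncertainty step by step.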
Step 7: Prediction and Evaluation
Run model.predict() on the validation DataLoader to obtain point predictions. Compare against actual values to compute error metrics. Optionally pass mode="raw" to obtain the raw network output needed for uncertainty visualization.
Key considerations:
- Default predict mode returns the distribution mean as point prediction
- mode="raw" returns the raw network output for sampling-based analysis or quantile extraction; mode="quantiles" returns quantile forecasts directly
- return_x=True enables plotting predictions alongside input context
- plot_prediction() provides built-in visualization of forecasts with uncertainty bands
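The evaluation step can be sketched as follows. `predict_raw` assumes a fitted DeepAR model and a recent pytorch-forecasting release in which the result exposes `.output` and `.x` (older releases return a plain tuple); the remaining helpers are hypothetical, dependency-free utilities for summarizing sampled forecasts.

```python
def predict_raw(model, val_dataloader, n_samples=100):
    """Request raw output plus inputs from a fitted DeepAR model."""
    # Assumption: `n_samples` is accepted as in DeepAR's predict method.
    return model.predict(val_dataloader, mode="raw", return_x=True, n_samples=n_samples)

def empirical_quantile(samples, q):
    """Linearly interpolated quantile of a list of sampled values."""
    ordered = sorted(samples)
    pos = q * (len(ordered) - 1)
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (pos - lower) * (ordered[upper] - ordered[lower])

def mean_absolute_error(actuals, predictions):
    """Average absolute difference between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actuals, predictions)) / len(actuals)

def interval_coverage(actuals, lowers, uppers):
    """Share of actual values that fall inside their prediction interval."""
    hits = sum(lo <= a <= hi for a, lo, hi in zip(actuals, lowers, uppers))
    return hits / len(actuals)
```

Comparing the empirical coverage of, say, a 90% interval against its nominal level is a quick check on whether the predicted uncertainty is calibrated.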