Workflow:Sktime Pytorch forecasting DeepAR Probabilistic Forecasting

Knowledge Sources	pytorch-forecasting, PyTorch Forecasting Docs, DeepAR
Domains	Time_Series, Probabilistic_Forecasting, Deep_Learning
Last Updated	2026-02-08 07:00 GMT

Overview

End-to-end process for probabilistic time series forecasting using the DeepAR autoregressive recurrent network with distributional output.

Description

This workflow covers training a DeepAR model to produce full predictive distributions (not just point forecasts) for multiple time series. DeepAR uses an autoregressive recurrent (LSTM-based) architecture. At each time step it emits distribution parameters, such as the mean and standard deviation of a Normal distribution, which enables sampling-based probabilistic forecasts. The process generates synthetic autoregressive data and wraps it in a TimeSeriesDataSet with per-series normalization. It then constructs the DeepAR model with a distributional loss function and trains it with GPU acceleration. After training, the model produces both point predictions and distributional forecasts.

Key capabilities:

Emits full predictive distributions, not just point estimates
Supports multiple distribution families (Normal, LogNormal, NegativeBinomial, Beta)
Handles multiple time series with shared learned dynamics
Autoregressive sampling at inference produces full forecast trajectories, from which quantiles and prediction intervals can be derived

Usage

Execute this workflow when you need forecast uncertainty quantification (prediction intervals, distributional forecasts) for a collection of related time series with minimal covariate information. This is well-suited for inventory planning, capacity forecasting, and any scenario where knowing the range of likely outcomes is as important as the central forecast.

Execution Steps

Step 1: Data Generation or Loading

Prepare the time series data as a long-format pandas DataFrame with columns for the series identifier, an integer time index, and the target value. For synthetic experimentation, use pytorch-forecasting's generate_ar_data helper (pytorch_forecasting.data.examples). It creates multiple autoregressive series with configurable seasonality, trend, and noise, and returns the columns series, time_idx, and value. For real data, ensure the DataFrame follows the same columnar structure.

Key considerations:

Each series needs a unique identifier column (e.g., series ID as integer or string)
The time index must be an integer that increases by exactly 1 per time step within each series; gaps are only permitted with allow_missing_timesteps=True
For DeepAR, the target is typically a single univariate column per series
Split series into training and validation sets (by series ID or by time cutoff)
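The sketch below builds data in the layout Step 1 describes without depending on pytorch-forecasting. make_ar_frame is a hypothetical helper, not the library's generate_ar_data. Its recipe (an AR(1) process plus a sine seasonality and a per-series level) is an assumption, chosen only to produce plausible series.

```python
import numpy as np
import pandas as pd


def make_ar_frame(n_series=5, timesteps=100, seasonality=10.0, noise=0.1, seed=42):
    """Hypothetical stand-in for generate_ar_data (not the library function).

    Returns one row per (series, time_idx) with columns
    "series", "time_idx", "value".
    """
    rng = np.random.default_rng(seed)
    frames = []
    for series_id in range(n_series):
        # Per-series AR(1) coefficient and level so the series differ.
        phi = rng.uniform(0.5, 0.9)
        level = rng.uniform(1.0, 5.0)
        t = np.arange(timesteps)
        # Seasonal signal with a random per-series phase shift.
        season = np.sin(2 * np.pi * t / seasonality + rng.uniform(0, 2 * np.pi))
        values = np.empty(timesteps)
        prev = 0.0
        for i in range(timesteps):
            # AR(1) recursion driven by Gaussian noise.
            prev = phi * prev + rng.normal(scale=noise)
            values[i] = level + season[i] + prev
        frames.append(pd.DataFrame({"series": series_id, "time_idx": t, "value": values}))
    return pd.concat(frames, ignore_index=True)


data = make_ar_frame()
# time_idx must increase by exactly 1 within every series (no gaps).
assert data.groupby("series")["time_idx"].diff().dropna().eq(1).all()
print(data.head())
```

With the library itself, generate_ar_data(seasonality=10.0, timesteps=400, n_series=100) returns a frame with the same three columns.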

Step 2: TimeSeriesDataSet Construction

Create a TimeSeriesDataSet with the target, group identifiers, and time index. Configure the encoder and prediction lengths as fixed values so that every training window has the same size. Apply GroupNormalizer to normalize each series independently, and use NaNLabelEncoder to encode series identifiers.

Key considerations:

Set min_encoder_length equal to max_encoder_length for fixed-length windows
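Similarly, set min_prediction_length equal to max_prediction_length
GroupNormalizer, with groups set to the series identifier column, normalizes each series by its own center and scale, so series of very different magnitudes can share one model
NaNLabelEncoder encodes series identifiers. Either create it with add_nan=True so that identifiers unseen during fitting map to a reserved class, or fit it on all series IDs (including validation series) up front
Set add_relative_time_idx=False; with fixed-length windows the relative time index would be identical for every sample and add little information
Set add_target_scales=True to pass each series' normalization center and scale to the model as static features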
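This small pandas sketch makes the normalization and fixed-window settings concrete. group_standardize and fixed_windows are hypothetical helpers.

group_standardize shows the idea behind per-series standardization: subtract each series' mean and divide by its standard deviation. It is not GroupNormalizer's actual code. fixed_windows shows, in simplified form, how setting min and max lengths equal turns one series into equal-sized encoder/decoder slices.

```python
import numpy as np
import pandas as pd


def group_standardize(df, group_col="series", target_col="value"):
    """Sketch of per-series standardization (not the library's GroupNormalizer).

    Returns the normalized target and a per-series table of center and scale.
    """
    stats = df.groupby(group_col)[target_col].agg(center="mean", scale="std")
    joined = df.join(stats, on=group_col)
    normalized = (joined[target_col] - joined["center"]) / joined["scale"]
    return normalized, stats


def fixed_windows(values, encoder_length, prediction_length):
    """All (encoder, decoder) slices of one series when min == max lengths."""
    total = encoder_length + prediction_length
    starts = range(len(values) - total + 1)
    return [(values[s:s + encoder_length], values[s + encoder_length:s + total]) for s in starts]


df = pd.DataFrame({
    "series": ["a"] * 4 + ["b"] * 4,
    "time_idx": list(range(4)) * 2,
    "value": [1.0, 2.0, 3.0, 4.0, 100.0, 200.0, 300.0, 400.0],
})
normalized, stats = group_standardize(df)
# Both series map to the same normalized values despite a 100x scale difference.
print(normalized.round(3).tolist())
windows = fixed_windows(df.loc[df.series == "a", "value"].to_numpy(), 2, 1)
print(len(windows))  # 4 - (2 + 1) + 1 = 2 windows
```

The two series differ in scale by a factor of 100 yet normalize to identical values. That is what lets one network share dynamics across series. The stats table of per-series center and scale corresponds to the information add_target_scales=True exposes to the model.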

Step 3: Validation Dataset and DataLoader Creation

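Create a validation TimeSeriesDataSet using TimeSeriesDataSet.from_dataset so it inherits the training set's preprocessing parameters (encoders, normalizers, window lengths). Split validation data by series ID, not just by a time cutoff, to test generalization to unseen series. Convert both datasets to DataLoaders, with to_dataloader(train=True) for training and to_dataloader(train=False) for validation.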

Key considerations:

Splitting by series ID tests the model on entirely unseen time series
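Set stop_randomization=True so the validation set does not inherit any encoder/decoder length randomization from the training set, which keeps evaluation windows consistent across runs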
Keep num_workers=0 if debugging; increase for production training
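A split by series ID can be sketched as follows. split_by_series is a hypothetical helper, and the 20% validation fraction and the seed are arbitrary assumptions.

```python
import numpy as np
import pandas as pd


def split_by_series(df, group_col="series", val_fraction=0.2, seed=0):
    """Hold out whole series for validation (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    ids = df[group_col].unique()
    n_val = max(1, int(round(len(ids) * val_fraction)))
    val_ids = rng.choice(ids, size=n_val, replace=False)
    is_val = df[group_col].isin(val_ids)
    return df[~is_val], df[is_val]


df = pd.DataFrame({
    "series": np.repeat(np.arange(10), 5),
    "time_idx": np.tile(np.arange(5), 10),
    "value": np.arange(50, dtype=float),
})
train_df, val_df = split_by_series(df)
# No series appears in both sets, so validation measures generalization to unseen series.
print(sorted(val_df["series"].unique()))
```

With pytorch-forecasting, the held-out frame would then go through TimeSeriesDataSet.from_dataset(training, val_df, stop_randomization=True) so it reuses the training set's encoders and normalizers. Because these validation series never appear in training, the series encoder must either be fit on all IDs up front or allow unknown categories.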

Step 4: Trainer Configuration

Set up the PyTorch Lightning Trainer with GPU acceleration, early stopping, gradient clipping, and learning rate monitoring. Configure epoch limits and batch limits as needed.

Key considerations:

DeepAR benefits from GPU acceleration for LSTM computations
Gradient clipping (gradient_clip_val=0.1) guards against exploding gradients, a common problem when training recurrent networks
limit_train_batches and limit_val_batches speed up development iterations
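The two Trainer safeguards above reduce to simple rules, sketched here in NumPy. Both functions are hypothetical helpers, not Lightning's code.

clip_by_global_norm rescales all gradients together when their combined L2 norm exceeds the threshold. This is the idea behind Lightning's default norm-based clipping when gradient_clip_val is set. early_stop_epoch is a simplified patience/min_delta rule in the spirit of an EarlyStopping callback monitoring val_loss with mode="min".

```python
import numpy as np


def clip_by_global_norm(grads, max_norm):
    """Scale all gradients together so their combined L2 norm is <= max_norm."""
    total_norm = np.sqrt(sum(float(np.sum(g ** 2)) for g in grads))
    if total_norm <= max_norm:
        return grads, total_norm
    factor = max_norm / total_norm
    return [g * factor for g in grads], total_norm


def early_stop_epoch(val_losses, patience=5, min_delta=1e-4):
    """Return the epoch index at which training would stop, or None.

    An epoch counts as an improvement only if the loss drops more than
    min_delta below the best loss so far (mode="min").
    """
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best - min_delta:
            best, wait = loss, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None


clipped, norm = clip_by_global_norm([np.array([3.0, 4.0])], max_norm=0.1)
print(norm, clipped[0])  # original norm 5.0; clipped vector has norm 0.1
print(early_stop_epoch([1.0, 0.8, 0.79995, 0.8, 0.9, 0.85, 0.81], patience=5))
```

Rescaling by a single factor preserves the gradient's direction and caps only its size. In the early-stopping call, the drop from 0.8 to 0.79995 is smaller than min_delta and does not count as an improvement.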

Step 5: Model Instantiation

Create the DeepAR model using from_dataset, specifying the distributional loss function (e.g., NormalDistributionLoss for Gaussian output), hidden size for the LSTM layers, and dropout rate. The model architecture is inferred from the dataset metadata.

Key considerations:

NormalDistributionLoss predicts a mean (loc) and a standard deviation (scale) per time step; choose the distribution family that matches your data characteristics
Use LogNormalDistributionLoss for strictly positive data and NegativeBinomialDistributionLoss for count data
hidden_size controls LSTM capacity (32-128 typical)
log_interval and log_val_interval control TensorBoard logging frequency
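A distributional loss such as NormalDistributionLoss scores each observed value by its negative log-likelihood under the predicted distribution:

-log N(y | mu, sigma) = (y - mu)^2 / (2 sigma^2) + log sigma + log(2 pi) / 2.

The sketch below computes that quantity in NumPy. normal_nll and softplus are hypothetical helpers. Passing the raw scale output through softplus to keep sigma positive is an assumed, common design choice, not a statement about the library's internals.

```python
import math

import numpy as np


def softplus(x):
    """Numerically stable log(1 + exp(x)); maps any real number to a positive value."""
    return np.logaddexp(0.0, x)


def normal_nll(y, raw_loc, raw_scale):
    """Mean Gaussian negative log-likelihood for raw network outputs.

    raw_loc is used directly as the mean; raw_scale passes through softplus
    so the standard deviation is always positive (an assumed choice).
    """
    loc = np.asarray(raw_loc, dtype=float)
    scale = softplus(np.asarray(raw_scale, dtype=float))
    z = (np.asarray(y, dtype=float) - loc) / scale
    nll = 0.5 * z ** 2 + np.log(scale) + 0.5 * math.log(2 * math.pi)
    return float(np.mean(nll))


# A prediction centered on the observation scores far better than a confident miss.
good = normal_nll(y=[1.0], raw_loc=[1.0], raw_scale=[0.0])
bad = normal_nll(y=[3.0], raw_loc=[1.0], raw_scale=[-3.0])
print(good, bad)
```

The loss rewards both accuracy and honest uncertainty. A wide distribution is penalized through the log sigma term, and a narrow distribution that misses badly is penalized heavily through the squared term, as the second call shows.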

Step 6: Model Training

Train the model using trainer.fit() with the training and validation dataloaders. During training, DeepAR learns to predict distribution parameters step by step using teacher forcing: the observed previous value, not a sampled one, is fed in as input.

Key considerations:

Monitor val_loss convergence in TensorBoard
Early stopping prevents overfitting to the training series
Teacher forcing is used during training; autoregressive sampling during inference
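The difference between teacher-forced training and autoregressive inference can be shown with a toy one-step model. step_params is a hypothetical stand-in for the DeepAR decoder. It maps the previous value to Normal parameters through a fixed AR(1) rule, where phi=0.8 and sigma=0.5 are arbitrary assumptions. A real DeepAR cell is an LSTM whose hidden state also carries longer history.

```python
import numpy as np


def step_params(prev_value, phi=0.8, sigma=0.5):
    """Hypothetical one-step model: Normal(mu, sigma) from the previous value."""
    return phi * prev_value, sigma


def teacher_forced_params(observed):
    """Training: each step is conditioned on the *observed* previous value."""
    return [step_params(prev) for prev in observed[:-1]]


def sample_paths(last_observed, horizon, n_samples, rng):
    """Inference: each step is conditioned on the model's own previous *sample*."""
    paths = np.empty((n_samples, horizon))
    prev = np.full(n_samples, float(last_observed))
    for t in range(horizon):
        mu, sigma = step_params(prev)
        prev = rng.normal(mu, sigma)  # draw, then feed back as the next input
        paths[:, t] = prev
    return paths


rng = np.random.default_rng(0)
observed = np.array([1.0, 0.9, 0.5, 0.7])
print(teacher_forced_params(observed))  # three (mu, sigma) pairs, one per next value
paths = sample_paths(observed[-1], horizon=5, n_samples=200, rng=rng)
print(paths.shape)  # (200, 5): 200 sampled trajectories of length 5
```

Because each sampled step feeds the next, the spread of the trajectories grows with the horizon. Quantiles taken across the sample axis then give prediction intervals.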

Step 7: Prediction and Evaluation

Run model.predict() on the validation DataLoader to obtain point predictions, then compare them against the actual values to compute error metrics such as mean absolute error. Optionally use mode="raw" to obtain the raw model output needed for uncertainty visualization.

Key considerations:

The default mode ("prediction") returns the mean of the predictive distribution as the point prediction
mode="raw" returns the raw network output, which plot_prediction() consumes; mode="quantiles" returns quantile forecasts directly
return_x=True enables plotting predictions alongside input context
plot_prediction() provides built-in visualization of forecasts with uncertainty bands
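Once forecasts are available, evaluation reduces to a few array operations. The sketch assumes a hypothetical array of sampled trajectories shaped (series, horizon, n_samples), filled here with random numbers instead of real model output. mean_absolute_error, sample_quantiles, and interval_coverage are hypothetical helpers.

```python
import numpy as np


def mean_absolute_error(actuals, predictions):
    """MAE over all series and horizon steps."""
    return float(np.mean(np.abs(np.asarray(actuals) - np.asarray(predictions))))


def sample_quantiles(samples, quantiles=(0.1, 0.5, 0.9)):
    """Quantiles over the last axis of a (series, horizon, n_samples) array.

    Returns an array of shape (series, horizon, len(quantiles)).
    """
    return np.moveaxis(np.quantile(samples, quantiles, axis=-1), 0, -1)


def interval_coverage(actuals, lower, upper):
    """Fraction of actual values falling inside [lower, upper]."""
    actuals = np.asarray(actuals)
    return float(np.mean((actuals >= lower) & (actuals <= upper)))


rng = np.random.default_rng(1)
actuals = rng.normal(size=(4, 20))       # 4 series, horizon 20 (synthetic)
samples = rng.normal(size=(4, 20, 500))  # 500 synthetic draws per point
point = samples.mean(axis=-1)            # sample mean as the point forecast
q = sample_quantiles(samples)
print("MAE:", mean_absolute_error(actuals, point))
print("80% interval coverage:", interval_coverage(actuals, q[..., 0], q[..., 2]))
```

A well-calibrated model's 10%-90% interval should cover roughly 80% of actual values. Coverage well below that signals overconfident forecasts.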
