Principle:Sktime Pytorch forecasting MQF2 Distribution Forecasting

Knowledge Sources	Multivariate Quantile Function Forecaster; Convex Potential Flows: Universal Probability Distributions with Optimal Transport and Convex Optimization; pytorch-forecasting
Domains	Time_Series, Forecasting, Deep_Learning, Probabilistic_Forecasting, Normalizing_Flows
Last Updated	2026-02-08 09:00 GMT

Overview

Multivariate Quantile Function Forecaster (MQF2) uses normalizing flows built from deep convex networks (partially input convex neural networks, PICNNs) to learn a multivariate quantile function. This function maps samples from a simple base distribution (standard normal in the formulation used on this page) to probabilistic forecasts over the full prediction horizon jointly.

Description

MQF2 approaches probabilistic forecasting by learning a multivariate quantile function -- a mapping from standard normal samples to forecast sample paths, conditioned on an RNN-derived hidden state. Unlike marginal quantile methods, which produce independent quantile predictions per time step, MQF2 models the joint distribution across the prediction horizon and so preserves temporal correlations in the forecast uncertainty.

Core components:

1. DeepConvexNet: Wraps a partially input convex neural network (PICNN) and provides the forward transform and log-determinant computation for normalizing flows. The PICNN is convex with respect to a subset of its inputs (the sample variables) and unconstrained with respect to the context (the hidden state). When used as a normalizing flow (maximum likelihood training), strict convexity is enforced by adding a weighted quadratic term $\frac{\mathrm{softplus}(w_{0})}{2} \| x \|^{2}$ to the PICNN output, where the softplus keeps the weight positive. When used with energy score training, the raw PICNN output suffices because strict convexity is not required.

Log-determinant computation supports two modes: (a) brute-force exact computation of the Jacobian log-determinant, and (b) stochastic estimation using the Lanczos tridiagonalization and conjugate gradient algorithms for scalability.

2. SequentialNet: Chains multiple DeepConvexNet layers (and optional ActNorm layers) into a sequential normalizing flow. Provides the forward pass that transforms standard normal samples through the flow, and the es_sample function for drawing conditional samples.

3. MQF2Distribution: A PyTorch Distribution subclass that encapsulates the full probabilistic forecasting model. It supports two training objectives:

Energy score: A proper scoring rule for multivariate distributions. It does not require computing the normalizing flow's log-determinant, so it is simpler and more robust. Samples are drawn by passing standard normal noise through the convex flow conditioned on the hidden state.
Maximum likelihood (normalizing flows): Computes the log-probability using the change-of-variables formula. This requires the log-determinant of the Jacobian of the convex flow, obtained either exactly or by stochastic estimation.

The distribution provides rsample (reparameterized sampling) and quantile functions for forecast generation.

4. TransformedMQF2Distribution: Wraps the base MQF2 distribution with affine transforms for proper handling of target scaling (location-scale normalization applied by the data pipeline).
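
As a sketch of what the affine wrapper has to account for, assume the data pipeline normalizes an $H$-step target as $\tilde{y} = (y - \mu) / \sigma$ with a scalar location $\mu$ and scale $\sigma > 0$. These symbols are illustrative and are not taken from the library. Samples are mapped back with $y = \mu + \sigma \tilde{y}$, and the log-density picks up a volume correction:

$\log p_{Y}(y) = \log p_{\tilde{Y}}\left(\frac{y - \mu}{\sigma}\right) - H \log \sigma$

Under the same assumption, the energy score with power $\beta$ computed on the original scale equals $\sigma^{\beta}$ times the score computed on the normalized scale, since the shift cancels in every difference and every distance is multiplied by $\sigma$.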

Usage

Use MQF2 when joint multivariate probabilistic forecasts are needed -- i.e., when the predicted quantiles across forecast horizons must be coherent and preserve temporal correlation structure. It is appropriate for: (1) applications requiring full predictive distributions (e.g., risk assessment, inventory optimization), (2) scenarios where temporal correlations in forecast uncertainty matter (e.g., cumulative demand over a window), and (3) settings where the energy score objective is preferred over maximum likelihood for its robustness and simplicity.

Theoretical Basis

Quantile function via convex flows:

Given a hidden state $h$ from an RNN encoder and a standard normal sample $\alpha \sim \mathcal{N}(0, I)$, the quantile function is:

$\hat{y} = g(\alpha; h) = \nabla_{\alpha} \Phi(\alpha; h)$

where $\Phi$ is a partially input convex neural network (convex in $\alpha$, arbitrary in $h$).
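
To make the role of convexity concrete, the following minimal sketch uses a hypothetical one-dimensional potential (not the PICNN from the library) and checks that its derivative is strictly increasing. That monotonicity is what lets the gradient of a convex potential act as a quantile function in one dimension; in higher dimensions the analogous property is that the gradient map is a monotone operator. The potential `phi`, its constants, and all helper names are assumptions chosen for illustration only.

```python
import math

def softplus(x: float) -> float:
    # Numerically stable log(1 + exp(x)).
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def sigmoid(x: float) -> float:
    # Numerically stable logistic function, the derivative of softplus.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def phi(alpha: float, h: float) -> float:
    # Hypothetical potential: convex in alpha, arbitrary in the context h.
    return softplus(alpha + h) + 0.5 * 0.1 * alpha ** 2

def g(alpha: float, h: float) -> float:
    # Analytic derivative of phi with respect to alpha.
    return sigmoid(alpha + h) + 0.1 * alpha

alphas = [-2.0, -1.0, 0.0, 1.0, 2.0]
outputs = [g(a, h=0.3) for a in alphas]

# Increasing inputs give increasing outputs, so quantile levels cannot cross.
is_increasing = all(lo < hi for lo, hi in zip(outputs, outputs[1:]))
```

Changing `h` shifts the map without breaking monotonicity, which mirrors how the hidden state conditions the forecast while convexity is only required in $\alpha$.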

Energy score training objective:

$\mathrm{ES}(g, z) = \frac{1}{M} \sum_{j=1}^{M} \| w_{j} - z \|^{\beta} - \frac{1}{2 M^{2}} \sum_{j=1}^{M} \sum_{k=1}^{M} \| w_{j} - w_{k} \|^{\beta}$

where $w_{j} = g(\alpha_{j}; h)$ are samples from the learned quantile function, $z$ is the observation, $M$ is the number of Monte Carlo samples, and $\beta$ is a power hyperparameter (default: 1.0).
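
The estimator above can be sketched in a few lines of plain Python. This is an illustrative version operating on lists of floats, not the library's tensor implementation; the function names `norm` and `energy_score` are assumptions made for this example.

```python
import math

def norm(u, v):
    # Euclidean distance between two equal-length vectors.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def energy_score(samples, z, beta=1.0):
    # samples: list of M forecast paths w_j; z: the observed path.
    m = len(samples)
    # First term: average distance from each sample to the observation.
    fit = sum(norm(w, z) ** beta for w in samples) / m
    # Second term: half the average pairwise distance between samples.
    spread = sum(norm(wj, wk) ** beta for wj in samples for wk in samples) / (2 * m * m)
    return fit - spread

# Two one-step samples at +1 and -1, observation at 0:
# fit = 1.0, spread = (0 + 2 + 2 + 0) / 8 = 0.5, so the score is 0.5.
score = energy_score([[1.0], [-1.0]], [0.0])
```

The first term rewards samples close to the observation, while the subtracted second term rewards spread among the samples, so the score does not favor collapsing all samples onto a single point.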

Maximum likelihood training objective:

$\log p(z \mid h) = \log p_{0}(g^{-1}(z; h)) + \log \left| \det \frac{\partial g^{-1}(z; h)}{\partial z} \right|$

where $p_{0}$ is the standard normal density and $g^{-1}$ is the inverse of the convex flow.
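
As a sanity check on the change-of-variables formula, the sketch below applies it to a hypothetical one-dimensional affine flow $g(\alpha) = \mu + \sigma \alpha$ with $\sigma > 0$, which is the derivative of the convex potential $\mu \alpha + \frac{1}{2} \sigma \alpha^{2}$. The flow and the function names are assumptions for illustration; the real model uses a PICNN-based flow whose inverse and log-determinant are computed numerically.

```python
import math

def std_normal_logpdf(a: float) -> float:
    # Log-density of the base distribution p_0 = N(0, 1).
    return -0.5 * a * a - 0.5 * math.log(2.0 * math.pi)

def flow_log_prob(z: float, mu: float, sigma: float) -> float:
    # Hypothetical flow g(alpha) = mu + sigma * alpha with sigma > 0.
    alpha = (z - mu) / sigma      # g^{-1}(z)
    log_det = -math.log(sigma)    # log |d g^{-1} / d z|
    return std_normal_logpdf(alpha) + log_det

def gaussian_logpdf(z: float, mu: float, sigma: float) -> float:
    # Closed-form log-density of N(mu, sigma^2), used as a reference.
    return (-0.5 * ((z - mu) / sigma) ** 2
            - math.log(sigma)
            - 0.5 * math.log(2.0 * math.pi))

difference = flow_log_prob(1.3, 0.5, 2.0) - gaussian_logpdf(1.3, 0.5, 2.0)
```

The two log-densities agree up to floating-point error, which confirms that the log-determinant term is exactly the correction needed to turn the base density into the density of the transformed variable.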

Strict convexity guarantee:

$\Phi^{*}(x; h) = \mathrm{softplus}(w_{1}) \cdot \mathrm{PICNN}(x, h) + \frac{\mathrm{softplus}(w_{0})}{2} \| x \|^{2}$

The quadratic term ensures the Hessian with respect to $x$ is positive definite, guaranteeing invertibility for the normalizing flow approach.
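
The positive-definiteness claim can be checked in one line, assuming the PICNN is twice differentiable in $x$ (for example, with smooth activations), so that its Hessian exists and is positive semidefinite by convexity:

$\nabla_{x}^{2} \Phi^{*}(x; h) = \mathrm{softplus}(w_{1}) \, \nabla_{x}^{2} \mathrm{PICNN}(x, h) + \mathrm{softplus}(w_{0}) \, I \succeq \mathrm{softplus}(w_{0}) \, I \succ 0$

Because softplus is strictly positive, every eigenvalue of the Hessian is at least $\mathrm{softplus}(w_{0}) > 0$. The gradient map $\nabla_{x} \Phi^{*}$ is therefore strictly monotone and hence injective, and the log-determinant of its Jacobian is finite everywhere.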

Sliding window for training:

During training, observations of shape $(B, T_{c} + H - 1)$ are unfolded into sliding windows of length $H$. Each series yields $(T_{c} + H - 1) - H + 1 = T_{c}$ windows, which implies a stride of 1, for $B \times T_{c}$ training pairs in total, each aligned with its corresponding hidden state from the RNN encoder.
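
The window count can be verified with a small plain-Python sketch. The helper `sliding_windows` and the toy values are assumptions for illustration; with PyTorch tensors, `Tensor.unfold(dimension, size, step)` can produce the same windows as a view.

```python
def sliding_windows(series, horizon):
    # All contiguous windows of length `horizon`, moving one step at a time.
    return [series[i:i + horizon] for i in range(len(series) - horizon + 1)]

context_length, horizon = 3, 2  # T_c and H
batch = [
    [1, 2, 3, 4],               # length = context_length + horizon - 1
    [10, 20, 30, 40],
]

windows = [sliding_windows(series, horizon) for series in batch]

# Total number of (window, hidden state) training pairs: B * T_c.
num_pairs = sum(len(w) for w in windows)
```

Here `windows[0]` is `[[1, 2], [2, 3], [3, 4]]`, so each of the two series contributes three windows and `num_pairs` is 6, matching $B \times T_{c} = 2 \times 3$.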

Key hyperparameters:

is_energy_score -- selects energy score vs. maximum likelihood objective
es_num_samples -- Monte Carlo samples for energy score approximation (default: 50)
beta -- energy score power parameter (default: 1.0)
threshold_input -- input clamping for numerical stability under MLE (default: 100.0)

Related Pages

Implemented By

Implementation:Sktime_Pytorch_forecasting_MQF2_Utils
