Principle:Sktime Pytorch forecasting Distribution Loss

From Leeroopedia
Revision as of 18:13, 16 February 2026 by Admin (talk | contribs) (Auto-imported from principles/Sktime_Pytorch_forecasting_Distribution_Loss.md)


Knowledge Sources
Domains: Time_Series, Loss_Functions, Probabilistic_Forecasting
Last Updated: 2026-02-08 07:00 GMT

Overview

Loss function that trains models to predict the parameters of a probability distribution, enabling parametric probabilistic forecasts via negative log-likelihood minimization.

Description

Distribution Loss is a parametric approach to probabilistic forecasting in which the model learns to output the parameters of a known distribution family (e.g., Normal, NegativeBinomial, LogNormal). The loss is the negative log-likelihood of the observed target under the predicted distribution. The choice of family encodes domain knowledge about the data (e.g., count data suits NegativeBinomial, continuous data suits Normal), and the resulting uncertainty estimates can be well calibrated when the chosen family matches the data. NormalDistributionLoss specifically models targets as Gaussian with learned location and scale parameters, and applies an affine rescaling transformation to undo target normalization.

Usage

Use NormalDistributionLoss as the default loss for DeepAR when forecasting continuous-valued targets. Choose an alternative distribution loss based on the characteristics of the data: NegativeBinomialDistributionLoss for count data, LogNormalDistributionLoss for strictly positive data with heavy right tails, and BetaDistributionLoss for data bounded between 0 and 1.
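The selection rule above can be sketched as a small lookup on a sample of target values. This is only an illustrative heuristic: the helper name suggest_distribution_loss is hypothetical and not part of any library, it returns the loss class names from this page as plain strings, and it does not check tail heaviness, so it suggests LogNormalDistributionLoss for any strictly positive non-integer data.

```python
def suggest_distribution_loss(values):
    """Suggest a distribution loss name from sample target values.

    Hypothetical helper for illustration only; it encodes the rule of
    thumb from the Usage section and returns class names as strings.
    """
    values = [float(v) for v in values]
    if not values:
        raise ValueError("need at least one observation")

    # Non-negative whole numbers look like count data.
    if all(v >= 0 and v.is_integer() for v in values):
        return "NegativeBinomialDistributionLoss"

    # Values strictly between 0 and 1 look like proportions.
    if all(0.0 < v < 1.0 for v in values):
        return "BetaDistributionLoss"

    # Strictly positive values (tail heaviness is NOT checked here).
    if all(v > 0 for v in values):
        return "LogNormalDistributionLoss"

    # Fall back to the Gaussian default for general continuous targets.
    return "NormalDistributionLoss"
```

The order of the checks matters: count data is tested first because non-negative integers would otherwise also satisfy the later, weaker conditions. In practice the choice should still be confirmed by inspecting the data rather than relying on such a rule alone.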

Theoretical Basis

Negative log-likelihood loss:

$$\mathcal{L}(\theta) = -\sum_{t=1}^{T} \log p_\theta(y_t \mid x_t)$$

For Normal distribution:

$$-\log p(y \mid \mu, \sigma) = \frac{1}{2}\log(2\pi\sigma^2) + \frac{(y-\mu)^2}{2\sigma^2}$$
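As a worked illustration, the Gaussian negative log-likelihood and its sum over a forecast horizon can be written in a few lines of plain Python. This is a minimal sketch using only the standard library, not the library's implementation; the function names normal_nll and sequence_nll are chosen here for illustration.

```python
import math


def normal_nll(y, mu, sigma):
    """Negative log-likelihood of one observation y under Normal(mu, sigma)."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    # 0.5 * log(2*pi*sigma^2) + (y - mu)^2 / (2*sigma^2)
    return 0.5 * math.log(2 * math.pi * sigma ** 2) + (y - mu) ** 2 / (2 * sigma ** 2)


def sequence_nll(ys, mus, sigmas):
    """Sum the per-step negative log-likelihood over time steps t = 1..T."""
    return sum(normal_nll(y, m, s) for y, m, s in zip(ys, mus, sigmas))
```

The first term penalizes large predicted scales and the second penalizes errors relative to the predicted scale, so the loss cannot be reduced simply by inflating or shrinking the scale.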

The model outputs raw parameters which are transformed:

  • loc ($\mu$): used directly
  • scale ($\sigma$): passed through softplus, $\sigma = \log(1 + e^{z})$, to ensure positivity
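The softplus mapping from a raw network output z to a positive scale can be sketched as follows. The rearranged form max(z, 0) + log1p(exp(-|z|)) is a common numerically stable way to evaluate log(1 + e^z); it is an assumption of this sketch, not a statement about how the library computes it.

```python
import math


def softplus(z):
    """Evaluate log(1 + exp(z)) without overflow for large |z|."""
    # For large positive z, exp(z) would overflow, so use the identity
    # log(1 + e^z) = max(z, 0) + log(1 + e^(-|z|)).
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def raw_to_normal_params(raw_loc, raw_scale):
    """Map raw outputs to (mu, sigma): loc is used directly, scale via softplus."""
    return raw_loc, softplus(raw_scale)
```

Softplus behaves almost linearly for large positive inputs and decays smoothly toward zero for negative inputs, so the predicted scale stays positive while gradients remain well behaved.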

Rescaling: The predicted distribution is an affine transformation of the base distribution: $Y = \text{center} + \text{scale} \cdot Y_{\text{base}}$

Where center and scale come from the target normalizer (GroupNormalizer).
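For a Gaussian base distribution, the affine rescaling has a closed form: if the base is Normal(mu, sigma), the rescaled variable is Normal(center + scale * mu, scale * sigma). The sketch below, with hypothetical helper names and plain floats standing in for tensors, shows this alongside the equivalent change-of-variables log-density. Note that the normalizer's scale here is a different quantity from the distribution's scale parameter sigma.

```python
import math


def normal_logpdf(y, mu, sigma):
    """Log-density of Normal(mu, sigma) at y."""
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (y - mu) ** 2 / (2 * sigma ** 2)


def rescale_normal(mu_base, sigma_base, center, scale):
    """Parameters of Y = center + scale * Y_base for a Normal base distribution."""
    return center + scale * mu_base, scale * sigma_base


def rescaled_logpdf(y, mu_base, sigma_base, center, scale):
    """Log-density of Y via change of variables (assumes scale > 0)."""
    z = (y - center) / scale  # map the observation back to normalized space
    # The Jacobian of the affine map contributes -log(scale).
    return normal_logpdf(z, mu_base, sigma_base) - math.log(scale)
```

Both routes give the same likelihood, so the loss can be evaluated on targets in their original units even though the network predicts parameters in normalized space.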

Related Pages

Implemented By

Uses Heuristic
