Principle:Sktime Pytorch forecasting Point Loss Functions

Knowledge Sources	pytorch-forecasting
Domains	Time_Series, Forecasting, Deep_Learning, Loss_Functions
Last Updated	2026-02-08 09:00 GMT

Overview

A collection of point forecast loss functions that extend the MultiHorizonMetric base class, providing per-horizon evaluation for multi-step time series forecasting. The suite includes PoissonLoss, SMAPE, MAPE, MAE, RMSE, CrossEntropy, TweedieLoss, and MASE.

Description

Point loss functions in pytorch-forecasting are designed for models that produce a single predicted value per time step (as opposed to distributional or quantile outputs). All point metrics inherit from MultiHorizonMetric, which provides the machinery for handling variable-length sequences (via PackedSequence or masking), optional sample weighting, configurable reduction strategies (mean, sum, sqrt-mean), and the standard to_prediction/to_quantiles interface for converting raw network output to usable forecasts.

Each metric implements a loss method that computes element-wise errors between predictions and targets. The key metrics are:

PoissonLoss: Applies Poisson negative log-likelihood loss, suitable for non-negative count data. The model outputs log-rate values, and the to_prediction method exponentiates them to obtain the predicted rate. Quantile conversion uses the inverse CDF of the Poisson distribution via SciPy.

SMAPE (Symmetric Mean Absolute Percentage Error): Computes the symmetric percentage error defined as $2 | y - \hat{y} | / (| y | + | \hat{y} | + ϵ)$ . Appropriate when relative errors should be bounded between 0 and 2 regardless of scale.

MAPE (Mean Absolute Percentage Error): Computes the standard percentage error $| y - \hat{y} | / (| y | + ϵ)$ . More traditional than SMAPE, but it grows very large (limited only by $ϵ$) when actuals are near zero.

MAE (Mean Absolute Error): The simplest point loss, computing $| y - \hat{y} |$ . Scale-dependent, and more robust to outliers than squared-error losses.

RMSE (Root Mean Square Error): Computes squared errors in the loss method and applies the square root during reduction via the sqrt-mean reduction strategy. More sensitive to outliers than MAE.

CrossEntropy: Classification loss for categorical time series targets. The to_prediction method returns the argmax class, and to_quantiles returns raw class probabilities.

TweedieLoss: Implements Tweedie deviance loss with a log-link function, parameterized by a power parameter p in [1, 2). When p approaches 1, the loss approximates Poisson; when p approaches 2, it approximates Gamma. Useful for modeling insurance claims, sales quantities, and other zero-inflated continuous targets.

MASE (Mean Absolute Scaled Error): A scale-free metric that normalizes absolute errors by the mean absolute one-step difference of the observed target series (encoder targets concatenated with decoder targets). It requires encoder target values and lengths to compute the scaling factor. A MASE of 1.0 means the model's error equals the average one-step error of a naive random walk forecast (repeat the previous value) on that series; values below 1.0 mean the model does better.
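The output conversions described for PoissonLoss and CrossEntropy can be illustrated with a small standard-library sketch. It is not the library's code: the function names are hypothetical, the Poisson inverse CDF is accumulated by hand rather than obtained from SciPy, and the use of a softmax to obtain class probabilities is an assumption.

```python
import math

def poisson_to_prediction(log_rate):
    """Exponentiate a log-rate network output to get the predicted rate."""
    return math.exp(log_rate)

def poisson_quantile(rate, q):
    """Smallest count k whose Poisson CDF reaches q (inverse CDF), for 0 < q < 1."""
    k = 0
    pmf = math.exp(-rate)   # P(X = 0)
    cdf = pmf
    while cdf < q:
        k += 1
        pmf *= rate / k     # P(X = k) derived from P(X = k - 1)
        cdf += pmf
    return k

def softmax(logits):
    """Turn raw class scores into probabilities (shifted by the max for stability)."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def argmax_class(logits):
    """Index of the highest-scoring class."""
    return max(range(len(logits)), key=lambda i: logits[i])

rate = poisson_to_prediction(math.log(3.0))      # rate of about 3 events per step
median_count = poisson_quantile(rate, 0.5)       # median count for that rate
probs = softmax([0.1, 2.0, -1.0])                # class probabilities
predicted_class = argmax_class([0.1, 2.0, -1.0]) # most likely class index
```

Because the softmax is monotonic, taking the argmax of the raw scores or of the probabilities selects the same class.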

Usage

Use point loss functions when the model should output a single best-guess forecast per time step. Choose MAE or RMSE for general regression tasks. Choose PoissonLoss or TweedieLoss for count or zero-inflated data. Use SMAPE or MAPE when scale-independent percentage errors are preferred. Use CrossEntropy for discrete categorical forecasting. Use MASE for evaluation when comparing models across different scales, noting that it requires additional encoder target information at update time.

Theoretical Basis

MAE: $MAE = \frac{1}{T} \sum_{t = 1}^{T} | y_{t} - {\hat{y}}_{t} |$

RMSE: $RMSE = \sqrt{\frac{1}{T} \sum_{t = 1}^{T} (y_{t} - {\hat{y}}_{t})^{2}}$

SMAPE: $SMAPE = \frac{2}{T} \sum_{t = 1}^{T} \frac{| y_{t} - {\hat{y}}_{t} |}{| y_{t} | + | {\hat{y}}_{t} | + ϵ}$

MAPE: $MAPE = \frac{1}{T} \sum_{t = 1}^{T} \frac{| y_{t} - {\hat{y}}_{t} |}{| y_{t} | + ϵ}$
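As a worked illustration of the four formulas above, the following standard-library sketch evaluates them on a small hand-made series. The function names, the sample data, and the value of `eps` are assumptions chosen for the example and are not taken from the library.

```python
import math

def mae(y, y_hat):
    """Mean absolute error over the horizon."""
    return sum(abs(a - b) for a, b in zip(y, y_hat)) / len(y)

def rmse(y, y_hat):
    """Root of the mean squared error over the horizon."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y))

def smape(y, y_hat, eps=1e-8):
    """Symmetric percentage error; each term lies between 0 and 2."""
    return sum(2 * abs(a - b) / (abs(a) + abs(b) + eps) for a, b in zip(y, y_hat)) / len(y)

def mape(y, y_hat, eps=1e-8):
    """Percentage error relative to the actual value only."""
    return sum(abs(a - b) / (abs(a) + eps) for a, b in zip(y, y_hat)) / len(y)

# The third actual is zero, which is where MAPE and SMAPE behave very differently.
y = [100.0, 200.0, 0.0, 50.0]
y_hat = [110.0, 180.0, 5.0, 50.0]

mae_value = mae(y, y_hat)      # (10 + 20 + 5 + 0) / 4 = 8.75
rmse_value = rmse(y, y_hat)    # sqrt(525 / 4)
smape_value = smape(y, y_hat)  # stays at or below 2
mape_value = mape(y, y_hat)    # dominated by the zero actual
```

The zero actual contributes a term close to 2 to SMAPE but a term of order `5 / eps` to MAPE, which is the boundedness difference noted in the Description.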

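PoissonLoss: $ℒ_{Poisson} = {\hat{y}}_{t} - y_{t} \cdot \log ({\hat{y}}_{t} + ϵ)$

where ${\hat{y}}_{t}$ is the predicted rate. The model itself outputs $\log ({\hat{y}}_{t})$ and the loss operates on this log-space prediction: writing $x_{t} = \log ({\hat{y}}_{t})$ for the raw output, the loss is $e^{x_{t}} - y_{t} \cdot x_{t}$ up to the stabilizer $ϵ$. The term $\log (y_{t}!)$ of the full negative log-likelihood does not depend on the prediction and is omitted.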
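A minimal sketch of the Poisson loss on a log-space output, assuming the rate is strictly positive so that the stabilizer ϵ can be left out. The function name is hypothetical; the example only checks that the loss is lowest when the exponentiated output equals the target.

```python
import math

def poisson_loss(log_rate, y):
    """Poisson negative log-likelihood for a log-rate output (constant term dropped)."""
    return math.exp(log_rate) - y * log_rate

y = 4.0
at_target = poisson_loss(math.log(4.0), y)  # predicted rate equals the target
too_low = poisson_loss(math.log(2.0), y)    # predicted rate is half the target
too_high = poisson_loss(math.log(8.0), y)   # predicted rate is double the target
```

Setting the derivative `exp(log_rate) - y` to zero gives `exp(log_rate) = y`, which is why the exponentiated output is read as the predicted rate.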

TweedieLoss with power p: $ℒ_{Tweedie} = - \frac{y \cdot e^{\hat{y} (1 - p)}}{1 - p} + \frac{e^{\hat{y} (2 - p)}}{2 - p}$

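where $\hat{y}$ is the log-space network output and $p \in [1, 2)$ . At exactly $p = 1$ the denominator $1 - p$ of the first term vanishes, so the Poisson case is reached as a limit rather than by direct evaluation.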
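The Tweedie expression can be checked numerically with a short standard-library sketch. The function names and test values are assumptions for illustration; the sketch restricts itself to 1 < p < 2 to avoid the division by zero at p = 1, and it compares loss differences with a Poisson loss because the two losses differ by a term that does not depend on the prediction as p approaches 1.

```python
import math

def tweedie_loss(log_pred, y, p=1.5):
    """Tweedie loss for a log-space output; this sketch requires 1 < p < 2."""
    a = y * math.exp(log_pred * (1 - p)) / (1 - p)
    b = math.exp(log_pred * (2 - p)) / (2 - p)
    return -a + b

def poisson_loss(log_pred, y):
    """Poisson loss on a log-space output, used only for comparison."""
    return math.exp(log_pred) - y * log_pred

y = 4.0
at_target = tweedie_loss(math.log(4.0), y)  # exp(output) equals the target
below = tweedie_loss(math.log(2.0), y)
above = tweedie_loss(math.log(8.0), y)

# Close to p = 1, the loss gap between two predictions approaches the Poisson gap.
p_near_one = 1.0001
tweedie_gap = (tweedie_loss(math.log(8.0), y, p_near_one)
               - tweedie_loss(math.log(2.0), y, p_near_one))
poisson_gap = poisson_loss(math.log(8.0), y) - poisson_loss(math.log(2.0), y)
```

As with the Poisson case, the derivative with respect to the output is zero when the exponentiated output equals the target, for any p in the open interval.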

MASE: $MASE = \frac{1}{T} \sum_{t = 1}^{T} \frac{| y_{t} - {\hat{y}}_{t} |}{\frac{1}{N - 1} \sum_{i = 2}^{N} | z_{i} - z_{i - 1} |}$

where $z$ denotes the concatenated encoder and decoder target values, $N$ is the total length of $z$, and $T$ is the forecast horizon as in the formulas above.
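A worked example of the scaling, written as a standard-library sketch. It follows the formula above with z built from the encoder targets followed by the decoder targets, and it averages the scaled errors over the forecast horizon. The function name, the sample values, and the `eps` floor on the scale are assumptions for the example.

```python
def mase(encoder_target, target, y_hat, eps=1e-6):
    """Absolute errors divided by the mean one-step change of the target series."""
    z = list(encoder_target) + list(target)                    # concatenated series
    diffs = [abs(z[i] - z[i - 1]) for i in range(1, len(z))]   # N - 1 one-step changes
    scale = max(sum(diffs) / len(diffs), eps)                  # floor avoids division by zero
    scaled = [abs(a - b) / scale for a, b in zip(target, y_hat)]
    return sum(scaled) / len(scaled)                           # average over the horizon

encoder_target = [1.0, 2.0, 4.0]   # history seen by the encoder
target = [5.0, 7.0]                # actuals over the forecast horizon
y_hat = [6.0, 6.0]                 # point forecasts

# One-step changes are 1, 2, 1, 2, so the scale is 1.5; each absolute error is 1.
score = mase(encoder_target, target, y_hat)
```

A score below 1.0, as here, indicates that the forecast errors are smaller than the typical one-step movement of the series.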

Pseudo-code for multi-horizon evaluation:

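# Generic MultiHorizonMetric pattern
def update(y_pred, target):
    if target is PackedSequence:
        target, lengths = unpack(target)   # padded targets plus true lengths
    else:
        lengths = full_length(target)      # every sequence spans the whole horizon
    losses = loss(y_pred, target)          # element-wise loss, one value per time step
    if weight is not None:
        losses = losses * weight           # optional sample weighting
    accumulate(losses, lengths)            # mask padding, track per-sample losses

def compute():
    return reduce(accumulated_losses)      # apply reduction: mean, sum, or sqrt-mean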
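The pattern above can be made concrete with a small runnable sketch in plain Python. The class `MaskedPointMetric` is a hypothetical stand-in, not the library's MultiHorizonMetric: it takes padded lists and explicit lengths instead of tensors or a PackedSequence, and for simplicity it divides by the number of valid steps rather than by the sum of weights.

```python
import math

class MaskedPointMetric:
    """Hypothetical illustration of the update/compute pattern with length masking."""

    def __init__(self, loss_fn, reduction="mean"):
        self.loss_fn = loss_fn      # element-wise loss: (prediction, target) -> float
        self.reduction = reduction  # "mean", "sum", or "sqrt-mean"
        self.total = 0.0            # running sum of (weighted) losses
        self.count = 0              # number of valid, unpadded time steps

    def update(self, y_pred, target, lengths, weight=None):
        for i, (pred_seq, true_seq, n) in enumerate(zip(y_pred, target, lengths)):
            w = 1.0 if weight is None else weight[i]
            for t in range(n):      # positions t >= n are padding and are skipped
                self.total += w * self.loss_fn(pred_seq[t], true_seq[t])
            self.count += n

    def compute(self):
        if self.reduction == "sum":
            return self.total
        mean = self.total / self.count
        return math.sqrt(mean) if self.reduction == "sqrt-mean" else mean

mae = MaskedPointMetric(lambda p, y: abs(p - y))
rmse = MaskedPointMetric(lambda p, y: (p - y) ** 2, reduction="sqrt-mean")

y_pred = [[1.0, 2.0, 3.0], [4.0, 0.0, 0.0]]
target = [[2.0, 2.0, 5.0], [1.0, 9.0, 9.0]]  # second sequence: one valid step, rest padding
lengths = [3, 1]

for metric in (mae, rmse):
    metric.update(y_pred, target, lengths)

mae_value = mae.compute()    # valid absolute errors are 1, 0, 2, 3
rmse_value = rmse.compute()  # valid squared errors are 1, 0, 4, 9
```

Keeping the squared error in the loss function and the square root in the reduction mirrors how RMSE is described above: the root is applied once to the accumulated mean, not to each element.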

Related Pages

Implemented By

Implementation:Sktime_Pytorch_forecasting_Point_Metrics
