Heuristic:Sktime_Pytorch_forecasting_Target_Normalization_Transforms
| Knowledge Sources | |
|---|---|
| Domains | Data_Preprocessing, Time_Series, Optimization |
| Last Updated | 2026-02-08 08:00 GMT |
Overview
Choose the target normalization transform based on data properties: softplus for non-negative data that contains zeros, log for strictly positive, highly skewed data, and GroupNormalizer for short encoder lengths (<=20 steps).
Description
pytorch-forecasting provides five target transformations (log, log1p, count, softplus, relu) and two normalizer types (EncoderNormalizer for per-sample normalization, GroupNormalizer for per-group). The auto-selection logic uses data skewness (threshold 2.5 for log vs relu) and encoder length (threshold 20 for EncoderNormalizer vs GroupNormalizer). Using the wrong transform can cause numerical instability: log on data with zeros produces -inf, and normalizer scales below 1e-7 trigger warnings about potential NaN losses. The NegativeBinomialDistributionLoss has strict compatibility requirements: no centering and no logit/log transforms.
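The failure modes above can be reproduced directly. The following is a small numpy sketch (not pytorch-forecasting code) contrasting the transforms on data that contains zeros:

```python
import numpy as np

y = np.array([0.0, 1.0, 10.0, 250.0])  # non-negative, sales-like target with a zero

with np.errstate(divide="ignore"):
    log_y = np.log(y)        # log(0) -> -inf: poisons any downstream loss
log1p_y = np.log1p(y)        # log1p(0) -> 0.0: safe on data with zeros

# softplus(x) = log(1 + exp(x)) is strictly positive for every real x,
# which is why decoding predictions through softplus keeps them non-negative.
def softplus(x):
    # numerically stable form of log(1 + exp(x))
    return np.log1p(np.exp(-np.abs(x))) + np.maximum(x, 0.0)

print(log_y[0], log1p_y[0], softplus(np.array([-50.0, 0.0, 50.0])))
```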
Usage
Apply this heuristic when constructing a TimeSeriesDataSet or manually specifying the `target_normalizer` parameter. If your target data contains zeros (e.g., sales volume), use softplus with `center=False`. If your target is strictly positive and highly skewed, use log. If your encoder length is short (<=20 steps), prefer GroupNormalizer over EncoderNormalizer.
The Insight (Rule of Thumb)
- Action 1: For non-negative targets with zeros → use `GroupNormalizer(transformation="softplus", center=False)`.
- Action 2: For strictly positive, highly skewed targets (skew > 2.5) → use `transformation="log"`.
- Action 3: For strictly positive, moderately skewed targets → use `transformation="relu"`.
- Action 4: For short encoder lengths (<=20 steps) → use GroupNormalizer (not EncoderNormalizer).
- Action 5: For NegativeBinomialDistributionLoss → never use `center=True` or `transformation="logit"/"log"`.
- Trade-off: Softplus is smoother but slower than relu. Log provides better compression for heavy-tailed data but fails on zeros. GroupNormalizer is less adaptive than EncoderNormalizer but more stable for short sequences.
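These rules can be condensed into a small decision helper. The function below is purely illustrative (its name and return values are not part of the pytorch-forecasting API); it mirrors the actions above:

```python
def recommend_normalizer(skew: float, min_value: float, encoder_length: int):
    """Illustrative helper mirroring the rules of thumb above."""
    # Transformation choice tracks the sign and skew of the target.
    if min_value == 0:
        transform = "softplus (center=False)"        # non-negative data with zeros
    elif min_value > 0:
        transform = "log" if skew > 2.5 else "relu"  # strictly positive data
    else:
        transform = None                             # signed data: no bound transform
    # Normalizer choice tracks how much per-sample history is available.
    normalizer = "EncoderNormalizer" if encoder_length > 20 else "GroupNormalizer"
    return normalizer, transform

print(recommend_normalizer(skew=3.1, min_value=0.5, encoder_length=30))
# -> ('EncoderNormalizer', 'log')
```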
Reasoning
The auto-selection logic in TimeSeriesDataSet uses two thresholds:
- Skew > 2.5 → log transform (right-skewed data benefits from compression)
- Encoder length > 20 → EncoderNormalizer (enough data points for reliable per-sample statistics)
When normalizer scales drop below 1e-7 (near-constant series), the code emits warnings because division by near-zero causes numerical instability. The stallion example demonstrates the recommended pattern for sales data: `GroupNormalizer(transformation="softplus", center=False)`.
Auto-selection logic from `data/timeseries/_timeseries.py:962-972`:
```python
if data_properties["target_positive"][target]:
    if data_properties["target_skew"][target] > 2.5:
        transformer = "log"
    else:
        transformer = "relu"
else:
    transformer = None
if self.max_encoder_length > 20 and self.min_encoder_length > 1:
    normalizers.append(EncoderNormalizer(transformation=transformer))
else:
    normalizers.append(GroupNormalizer(transformation=transformer))
```
Scale warning from `data/encoders.py:673-678`:
```python
if (np.asarray(self.scale_) < 1e-7).any():
    warnings.warn(
        "scale is below 1e-7 - consider not centering "
        "the data or using data with higher variance for numerical stability",
        UserWarning,
    )
```
NegBinomial assertion from `metrics/distributions.py:186-188`:
```python
assert not encoder.center, "NegativeBinomialDistributionLoss is not compatible with `center=True`"
assert encoder.transformation not in ["logit", "log"], "Cannot use bound transformation"
```
Stallion example from `examples/stallion.py:87-89`:
```python
target_normalizer=GroupNormalizer(
    groups=["agency", "sku"], transformation="softplus", center=False
),
```
Related Pages
- Implementation:Sktime_Pytorch_forecasting_GroupNormalizer
- Implementation:Sktime_Pytorch_forecasting_NaNLabelEncoder
- Implementation:Sktime_Pytorch_forecasting_TimeSeriesDataSet_Init
- Implementation:Sktime_Pytorch_forecasting_NormalDistributionLoss
- Principle:Sktime_Pytorch_forecasting_Group_Normalization
- Principle:Sktime_Pytorch_forecasting_Distribution_Loss
- Principle:Sktime_Pytorch_forecasting_Time_Series_Dataset_Construction