Heuristic:Sktime_Pytorch_forecasting_Target_Normalization_Transforms
| Knowledge Sources | |
|---|---|
| Domains | Data_Preprocessing, Time_Series, Optimization |
| Last Updated | 2026-02-08 08:00 GMT |
Overview
Choose the target normalization transform based on data properties: softplus for non-negative data that contains zeros, log for strictly positive, highly skewed data, and GroupNormalizer for short encoder lengths (<=20 steps).
Description
pytorch-forecasting provides five target transformations (log, log1p, count, softplus, relu) and two normalizer types (EncoderNormalizer for per-sample normalization, GroupNormalizer for per-group). The auto-selection logic uses data skewness (threshold 2.5 for log vs relu) and encoder length (threshold 20 for EncoderNormalizer vs GroupNormalizer). Using the wrong transform can cause numerical instability: log on data with zeros produces -inf, and normalizer scales below 1e-7 trigger warnings about potential NaN losses. The NegativeBinomialDistributionLoss has strict compatibility requirements: no centering and no logit/log transforms.
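The failure modes above can be reproduced directly. The following is a small numpy sketch (not pytorch-forecasting code) contrasting the transforms on data that contains zeros:

```python
import numpy as np

y = np.array([0.0, 1.0, 10.0, 250.0])  # non-negative, sales-like target with a zero

with np.errstate(divide="ignore"):
    log_y = np.log(y)        # log(0) -> -inf: poisons any downstream loss
log1p_y = np.log1p(y)        # log1p(0) -> 0.0: safe on data with zeros

# softplus(x) = log(1 + exp(x)) is strictly positive for every real x,
# which is why decoding predictions through softplus keeps them non-negative.
def softplus(x):
    # numerically stable form of log(1 + exp(x))
    return np.log1p(np.exp(-np.abs(x))) + np.maximum(x, 0.0)

print(log_y[0], log1p_y[0], softplus(np.array([-50.0, 0.0, 50.0])))
```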
Usage
Apply this heuristic when constructing a TimeSeriesDataSet or manually specifying the `target_normalizer` parameter. If your target data contains zeros (e.g., sales volume), use softplus with `center=False`. If your target is strictly positive and highly skewed, use log. If your encoder length is short (<=20 steps), prefer GroupNormalizer over EncoderNormalizer.
The Insight (Rule of Thumb)
- Action 1: For non-negative targets with zeros → use `GroupNormalizer(transformation="softplus", center=False)`.
- Action 2: For strictly positive, highly skewed targets (skew > 2.5) → use `transformation="log"`.
- Action 3: For strictly positive, moderately skewed targets → use `transformation="relu"`.
- Action 4: For short encoder lengths (<=20 steps) → use GroupNormalizer (not EncoderNormalizer).
- Action 5: For NegativeBinomialDistributionLoss → never use `center=True` or `transformation="logit"/"log"`.
- Trade-off: Softplus is smoother but slower than relu. Log provides better compression for heavy-tailed data but fails on zeros. GroupNormalizer is less adaptive than EncoderNormalizer but more stable for short sequences.
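These rules can be condensed into a small decision helper. The function below is purely illustrative (its name and return values are not part of the pytorch-forecasting API); it mirrors the actions above:

```python
def recommend_normalizer(skew: float, min_value: float, encoder_length: int):
    """Illustrative helper mirroring the rules of thumb above."""
    # Transformation choice tracks the sign and skew of the target.
    if min_value == 0:
        transform = "softplus (center=False)"        # non-negative data with zeros
    elif min_value > 0:
        transform = "log" if skew > 2.5 else "relu"  # strictly positive data
    else:
        transform = None                             # signed data: no bound transform
    # Normalizer choice tracks how much per-sample history is available.
    normalizer = "EncoderNormalizer" if encoder_length > 20 else "GroupNormalizer"
    return normalizer, transform

print(recommend_normalizer(skew=3.1, min_value=0.5, encoder_length=30))
# -> ('EncoderNormalizer', 'log')
```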
Reasoning
The auto-selection logic in TimeSeriesDataSet uses two thresholds:
- Skew > 2.5 → log transform (right-skewed data benefits from compression)
- Encoder length > 20 → EncoderNormalizer (enough data points for reliable per-sample statistics)
When normalizer scales drop below 1e-7 (near-constant series), the code emits warnings because division by near-zero causes numerical instability. The stallion example demonstrates the recommended pattern for sales data: `GroupNormalizer(transformation="softplus", center=False)`.
Auto-selection logic from `data/timeseries/_timeseries.py:962-972`:
```python
if data_properties["target_positive"][target]:
    if data_properties["target_skew"][target] > 2.5:
        transformer = "log"
    else:
        transformer = "relu"
else:
    transformer = None
if self.max_encoder_length > 20 and self.min_encoder_length > 1:
    normalizers.append(EncoderNormalizer(transformation=transformer))
else:
    normalizers.append(GroupNormalizer(transformation=transformer))
```
Scale warning from `data/encoders.py:673-678`:
```python
if (np.asarray(self.scale_) < 1e-7).any():
    warnings.warn(
        "scale is below 1e-7 - consider not centering "
        "the data or using data with higher variance for numerical stability",
        UserWarning,
    )
```
NegBinomial assertion from `metrics/distributions.py:186-188`:
```python
assert not encoder.center, "NegativeBinomialDistributionLoss is not compatible with `center=True`"
assert encoder.transformation not in ["logit", "log"], "Cannot use bound transformation"
```
Stallion example from `examples/stallion.py:87-89`:
```python
target_normalizer=GroupNormalizer(
    groups=["agency", "sku"], transformation="softplus", center=False
),
```
Related Pages
- Implementation:Sktime_Pytorch_forecasting_GroupNormalizer
- Implementation:Sktime_Pytorch_forecasting_NaNLabelEncoder
- Implementation:Sktime_Pytorch_forecasting_TimeSeriesDataSet_Init
- Implementation:Sktime_Pytorch_forecasting_NormalDistributionLoss
- Principle:Sktime_Pytorch_forecasting_Group_Normalization
- Principle:Sktime_Pytorch_forecasting_Distribution_Loss
- Principle:Sktime_Pytorch_forecasting_Time_Series_Dataset_Construction