Principle:Sktime Pytorch forecasting Reversible Instance Normalization

Knowledge Sources	pytorch-forecasting RevIN
Domains	Time_Series, Forecasting, Deep_Learning, Normalization
Last Updated	2026-02-08 09:00 GMT

Overview

Reversible Instance Normalization (RevIN) normalizes each input time series instance to remove non-stationary distribution shift before the model processes it, then denormalizes the model output to restore the original data distribution. This lets deep forecasting models handle non-stationary time series effectively.

Description

RevIN addresses a fundamental challenge in time series forecasting: distribution shift between training and inference, and between the look-back window (encoder input) and the forecast horizon (decoder output). Real-world time series are frequently non-stationary, with shifting means and variances that degrade model performance.

The mechanism operates in two symmetric phases controlled by a mode parameter:

Normalization (mode="norm"): Statistics (mean and standard deviation) are computed per instance and per feature by reducing over the time dimension (all dimensions except batch and feature). Each instance is centered by subtracting its mean and scaled by dividing by its standard deviation. If optional learnable affine parameters are enabled, the normalized output is further scaled and shifted by per-feature weights and biases. The computed statistics are cached for later denormalization.

Denormalization (mode="denorm"): The inverse transformation is applied. If affine parameters were used, their effect is removed first (subtracting the affine bias and dividing by the affine weight). Then the original scale is restored by multiplying by the cached standard deviation, and the original level is restored by adding the cached mean.

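An alternative subtract_last option (separate from the mode parameter) replaces the mean with the last observed value in the sequence as the centering term, while the standard deviation is still used for scaling. This can be more appropriate for series with strong recent trends.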

The affine parameters ($γ$ and $β$) are learnable per-feature scalars initialized to ones and zeros respectively, allowing the model to adaptively control how much of the normalization effect to retain.

Usage

Use RevIN as a wrapper layer at the input and output boundaries of any time series forecasting model that struggles with non-stationary data. Call with mode="norm" on the encoder input to standardize the data, then call with mode="denorm" on the decoder output to map predictions back to the original scale. Set affine=True (default) to allow the model to learn the optimal normalization strength. Use subtract_last=True for series where the most recent value is a better centering point than the mean (e.g., financial data with strong momentum).
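The wrapper pattern described above can be sketched as follows, assuming PyTorch. `SimpleRevIN` is a hypothetical stand-in written for this example (no affine parameters, no subtract_last); it is not the pytorch-forecasting class. `LinearForecaster`, the tensor shapes, and the `eps` value are likewise illustrative assumptions.

```python
import torch
from torch import nn


class SimpleRevIN(nn.Module):
    """Hypothetical stand-in for a RevIN layer (no affine, no subtract_last)."""

    def __init__(self, eps: float = 1e-5):
        super().__init__()
        self.eps = eps

    def forward(self, x: torch.Tensor, mode: str) -> torch.Tensor:
        if mode == "norm":
            # Per-instance, per-feature statistics over the time axis (dim 1).
            self.mean = x.mean(dim=1, keepdim=True).detach()
            var = x.var(dim=1, keepdim=True, unbiased=False)
            self.stdev = torch.sqrt(var + self.eps).detach()
            return (x - self.mean) / self.stdev
        if mode == "denorm":
            # Reuse the statistics cached by the preceding "norm" call.
            return x * self.stdev + self.mean
        raise ValueError(f"unknown mode: {mode!r}")


class LinearForecaster(nn.Module):
    """Toy model: maps a look-back window to a forecast horizon per feature."""

    def __init__(self, lookback: int, horizon: int):
        super().__init__()
        self.revin = SimpleRevIN()
        self.proj = nn.Linear(lookback, horizon)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.revin(x, mode="norm")        # (B, lookback, C), standardized
        y = self.proj(x.transpose(1, 2))      # (B, C, horizon)
        y = y.transpose(1, 2)                 # (B, horizon, C)
        return self.revin(y, mode="denorm")   # back on the original scale


torch.manual_seed(0)
model = LinearForecaster(lookback=24, horizon=6)
history = 100.0 + 5.0 * torch.randn(8, 24, 3)  # series far from zero mean
forecast = model(history)
print(forecast.shape)
```

Because the statistics are cached on the layer, the "norm" call has to precede the "denorm" call for the same batch. In this sketch, adding a constant offset to the input shifts the forecast by the same offset, which is the behaviour the wrapper is meant to provide.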

Theoretical Basis

Instance Normalization:

For an input $x \in ℝ^{B \times T \times C}$ (batch size $B$, sequence length $T$, number of features $C$), statistics are computed per instance:

$μ = \frac{1}{T} \sum_{t = 1}^{T} x_{t}, \qquad σ = \sqrt{\frac{1}{T} \sum_{t = 1}^{T} (x_{t} - μ)^{2} + ϵ}$
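As a small illustration of these statistics, the sketch below computes them with PyTorch for a random tensor. The shapes and the value of `eps` are assumptions chosen for the example. Note that `torch.var` applies Bessel's correction by default, so `unbiased=False` is passed to match the $\frac{1}{T}$ factor in the formula.

```python
import torch

x = torch.randn(4, 96, 7)          # (B, T, C): batch, time steps, features
eps = 1e-5                         # assumed stabilizer value

# Reduce over the time axis only; keepdim=True keeps shapes broadcastable with x.
mu = x.mean(dim=1, keepdim=True)
# Population variance (1/T), then the stabilized square root.
sigma = torch.sqrt(x.var(dim=1, keepdim=True, unbiased=False) + eps)

print(mu.shape, sigma.shape)
```

Each of the $B \times C$ (instance, feature) pairs gets its own mean and standard deviation, so both tensors have a singleton time axis.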

Normalization step:

$\hat{x} = \frac{x - μ}{σ}$

With learnable affine parameters:

$\tilde{x} = γ ⊙ \hat{x} + β$

Where $γ, β \in ℝ^{C}$ are per-feature learnable parameters, initialized to $γ = 𝟏$ and $β = 𝟎$.

Denormalization step (inverse):

$\hat{x} = \frac{\tilde{x} - β}{γ}$

$x = \hat{x} \cdot σ + μ$
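To check that the denormalization step inverts the normalization step, the following worked example applies both to a single four-point series using only the Python standard library. The values of `gamma`, `beta`, and `eps` are illustrative assumptions, not library defaults.

```python
import math

x = [10.0, 12.0, 14.0, 16.0]        # one instance, one feature, T = 4
gamma, beta, eps = 2.0, 0.5, 1e-5   # illustrative values only

mu = sum(x) / len(x)                                             # 13.0
sigma = math.sqrt(sum((v - mu) ** 2 for v in x) / len(x) + eps)  # sqrt(5 + eps)

# Normalize, then apply the affine transform.
x_tilde = [gamma * (v - mu) / sigma + beta for v in x]
# Undo the affine transform, then restore scale and level.
x_back = [((v - beta) / gamma) * sigma + mu for v in x_tilde]

print(x_tilde)
print(x_back)
```

Up to floating-point rounding, `x_back` reproduces `x`. In a forecasting model the denormalization is applied to the model output rather than to the normalized input itself, using the statistics of the look-back window.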

Subtract-last variant:

${\hat{x}}_{t} = \frac{x_{t} - x_{T}}{σ}$

Where $x_{T}$, the last observed value of the sequence, replaces the mean $μ$ as the centering term.
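The difference between the two centering choices can be seen on a short trending series. This is a minimal sketch assuming PyTorch and an input of shape (B, T, C); the data and the `eps` value are made up for illustration.

```python
import torch

x = torch.tensor([[[1.0], [2.0], [4.0], [8.0]]])   # (B=1, T=4, C=1), upward trend
eps = 1e-5                                          # assumed stabilizer value

sigma = torch.sqrt(x.var(dim=1, keepdim=True, unbiased=False) + eps)
mean = x.mean(dim=1, keepdim=True)                  # 3.75
last = x[:, -1:, :]                                 # x_T = 8.0, shape (1, 1, 1)

x_mean_centered = (x - mean) / sigma                # default centering
x_last_centered = (x - last) / sigma                # subtract-last centering

print(x_mean_centered.flatten())
print(x_last_centered.flatten())
```

With subtract-last centering the final look-back value maps to exactly zero, so the model sees the history relative to the most recent level. The matching inverse in this sketch would be `y * sigma + last`.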

Pseudo-code:

# RevIN forward pass (pseudo-code)
def revin(x, mode):
    if mode == "norm":
        mean = x.mean(dim=time_dims, keepdim=True)
        # population variance (1/T), matching the formula above
        stdev = sqrt(x.var(dim=time_dims, keepdim=True, unbiased=False) + eps)
        x = (x - mean) / stdev
        if affine:
            x = x * gamma + beta
        cache(mean, stdev)   # save for denormalization
    elif mode == "denorm":
        if affine:
            x = (x - beta) / gamma
        x = x * cached_stdev + cached_mean
    else:
        raise ValueError("mode must be 'norm' or 'denorm'")
    return x
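The pseudo-code above can be turned into a runnable PyTorch module along the following lines. This is an illustrative sketch, not the pytorch-forecasting implementation: the class name `RevINSketch`, its attribute names, and the `eps` default are assumptions, and the input is assumed to have shape (B, T, C).

```python
import torch
from torch import nn


class RevINSketch(nn.Module):
    """Illustrative RevIN layer; not the pytorch-forecasting implementation."""

    def __init__(self, num_features: int, eps: float = 1e-5,
                 affine: bool = True, subtract_last: bool = False):
        super().__init__()
        self.eps = eps
        self.affine = affine
        self.subtract_last = subtract_last
        if affine:
            # Per-feature scale and shift, initialized to the identity transform.
            self.gamma = nn.Parameter(torch.ones(num_features))
            self.beta = nn.Parameter(torch.zeros(num_features))

    def forward(self, x: torch.Tensor, mode: str) -> torch.Tensor:
        if mode == "norm":
            return self._normalize(x)
        if mode == "denorm":
            return self._denormalize(x)
        raise ValueError(f"mode must be 'norm' or 'denorm', got {mode!r}")

    def _normalize(self, x: torch.Tensor) -> torch.Tensor:
        # Cache the centering term and scale; detach so they act as constants.
        if self.subtract_last:
            self.center = x[:, -1:, :].detach()
        else:
            self.center = x.mean(dim=1, keepdim=True).detach()
        var = x.var(dim=1, keepdim=True, unbiased=False)
        self.stdev = torch.sqrt(var + self.eps).detach()
        x = (x - self.center) / self.stdev
        if self.affine:
            x = x * self.gamma + self.beta
        return x

    def _denormalize(self, x: torch.Tensor) -> torch.Tensor:
        # Undo the affine transform first, then restore scale and level.
        if self.affine:
            x = (x - self.beta) / self.gamma
        return x * self.stdev + self.center


torch.manual_seed(0)
layer = RevINSketch(num_features=3)
x = 50.0 + 10.0 * torch.randn(2, 16, 3)
z = layer(x, mode="norm")
x_rec = layer(z, mode="denorm")
print(torch.allclose(x, x_rec, atol=1e-4))
```

Dividing by `gamma` becomes unstable if a learned weight drifts close to zero; adding a small constant to the divisor is one possible guard, at the cost of a slightly inexact inverse.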

Related Pages

Implemented By

Implementation:Sktime_Pytorch_forecasting_RevIN
