Principle:Sktime Pytorch forecasting Reversible Instance Normalization
| Knowledge Sources | |
|---|---|
| Domains | Time_Series, Forecasting, Deep_Learning, Normalization |
| Last Updated | 2026-02-08 09:00 GMT |
Overview
Reversible Instance Normalization (RevIN) normalizes input time series instances to remove non-stationary distribution shifts before model processing, then denormalizes model outputs to restore the original data distribution, enabling deep forecasting models to handle non-stationary time series effectively.
Description
RevIN addresses a fundamental challenge in time series forecasting: distribution shift between training and inference, and between the look-back window (encoder input) and the forecast horizon (decoder output). Real-world time series are frequently non-stationary, with shifting means and variances that degrade model performance.
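Plain per-instance standardization already shows the effect RevIN relies on. In this minimal NumPy sketch, the helper `instance_normalize` is hypothetical and is not part of sktime or pytorch-forecasting. The same temporal pattern, observed at two very different levels and scales, normalizes to essentially the same input, so a level or scale shift between windows is removed before the model sees it.

```python
import numpy as np


def instance_normalize(x, eps=1e-5):
    """Standardize each (instance, feature) series over the time axis.

    x has shape (batch, time, features); mean and variance are taken over axis 1.
    """
    mean = x.mean(axis=1, keepdims=True)
    stdev = np.sqrt(x.var(axis=1, keepdims=True) + eps)
    return (x - mean) / stdev


rng = np.random.default_rng(0)
pattern = rng.normal(size=(1, 48, 1))  # one window, 48 steps, one feature

# The same temporal pattern seen at two very different levels and scales,
# e.g. an early and a late window of a drifting series.
early = 10.0 + 2.0 * pattern
late = 500.0 + 40.0 * pattern

z_early = instance_normalize(early)
z_late = instance_normalize(late)
max_gap = np.abs(z_early - z_late).max()  # tiny: only eps breaks exact equality
```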
The mechanism operates in two symmetric phases controlled by a mode parameter:
Normalization (mode="norm"): Statistics (mean and standard deviation) are computed per instance and per feature across the time dimension. That is, they are taken over every dimension except the batch and feature dimensions, with a small constant ε added to the variance for numerical stability. Each instance is centered by subtracting its mean and scaled by dividing by its standard deviation. If the optional learnable affine parameters are enabled, the normalized output is further scaled and shifted by per-feature weights and biases. The computed statistics are cached for later denormalization.
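Below is a minimal NumPy sketch of the normalization phase. It assumes inputs shaped (batch, time, features); `revin_norm` and its returned `cache` dictionary are hypothetical names used only for illustration. After normalization, every instance-feature series has mean 0 and a standard deviation close to 1 (slightly below 1 because of ε).

```python
import numpy as np


def revin_norm(x, gamma=None, beta=None, eps=1e-5):
    """Sketch of the "norm" phase for x of shape (batch, ..., features).

    Returns the normalized array and the statistics cached for denormalization.
    """
    axes = tuple(range(1, x.ndim - 1))                      # all dims except batch and feature
    mean = x.mean(axis=axes, keepdims=True)                 # (batch, 1, ..., features)
    stdev = np.sqrt(x.var(axis=axes, keepdims=True) + eps)  # eps stabilizes small variances
    z = (x - mean) / stdev
    if gamma is not None:                                   # optional per-feature affine step
        z = z * gamma + beta
    return z, {"mean": mean, "stdev": stdev}


rng = np.random.default_rng(1)
# 4 instances, 96 time steps, 3 features; each instance-feature has its own level and scale.
levels = rng.uniform(-50.0, 50.0, size=(4, 1, 3))
scales = rng.uniform(0.5, 20.0, size=(4, 1, 3))
x = levels + scales * rng.normal(size=(4, 96, 3))

z, cache = revin_norm(x)
```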
Denormalization (mode="denorm"): The inverse transformation is applied in reverse order. If affine parameters were used, their effect is removed first by subtracting the affine bias and dividing by the affine weight. The original scale is then restored by multiplying by the cached standard deviation. Finally, the original level is restored by adding the cached mean, or the cached last value when subtract_last is enabled.
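A round trip checks the denormalization phase. The NumPy class below is a hypothetical stand-in: `RevINSketch`, `norm` and `denorm` are illustrative names, not the library's API. With arbitrary non-default affine values, `denorm(norm(x))` recovers `x` up to floating-point error.

```python
import numpy as np


class RevINSketch:
    """Hypothetical NumPy stand-in for a RevIN layer (not a library API)."""

    def __init__(self, num_features, eps=1e-5):
        self.eps = eps
        self.gamma = np.ones(num_features)   # affine weight, initialized to ones
        self.beta = np.zeros(num_features)   # affine bias, initialized to zeros

    def norm(self, x):
        axes = tuple(range(1, x.ndim - 1))
        # Cache the statistics so denorm can reuse them.
        self.mean = x.mean(axis=axes, keepdims=True)
        self.stdev = np.sqrt(x.var(axis=axes, keepdims=True) + self.eps)
        return (x - self.mean) / self.stdev * self.gamma + self.beta

    def denorm(self, y):
        y = (y - self.beta) / self.gamma     # undo the affine step first
        return y * self.stdev + self.mean    # then restore the scale, then the level


rng = np.random.default_rng(2)
x = 100.0 + 15.0 * rng.normal(size=(8, 64, 2))   # (batch, time, features)
layer = RevINSketch(num_features=2)
layer.gamma = np.array([0.5, 2.0])                # pretend these were learned
layer.beta = np.array([0.1, -0.3])
recovered = layer.denorm(layer.norm(x))
```

The cached statistics have shape (batch, 1, features). They therefore broadcast over any time length, so a forecast horizon of a different length than the look-back window can be denormalized with the same cache.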
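An alternative subtract_last option, which is separate from the mode parameter, replaces the mean with the last observed value in the sequence as the centering statistic; the standard deviation still provides the scale. This can be more appropriate for series with strong recent trends, where the window mean lags well behind the current level.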
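The subtract_last option can be sketched as follows. This assumes, consistent with the description above, that only the centering statistic changes while the scale is still the mean-centered standard deviation. The helper names are hypothetical. On a linear ramp the last step maps to exactly 0, and the inverse restores the series.

```python
import numpy as np


def subtract_last_norm(x, eps=1e-5):
    """Center on the last observed step; x has shape (batch, time, features)."""
    last = x[:, -1:, :]                                    # (batch, 1, features)
    stdev = np.sqrt(x.var(axis=1, keepdims=True) + eps)    # scale is unchanged
    return (x - last) / stdev, last, stdev


def subtract_last_denorm(y, last, stdev):
    """Inverse: restore the scale, then add back the cached last value."""
    return y * stdev + last


# A strongly trending window: the linear ramp 5, 7, ..., 103.
t = np.arange(50, dtype=float)
x = (2.0 * t + 5.0).reshape(1, 50, 1)

z, last, stdev = subtract_last_norm(x)
restored = subtract_last_denorm(z, last, stdev)

# For comparison: where the final step lands under ordinary mean centering.
z_last_mean_centered = (x[0, -1, 0] - x.mean()) / stdev[0, 0, 0]
```

Under mean centering, the final step of this ramp would sit about 1.7 standard deviations above zero. With subtract_last it is the zero point, so the forecast is expressed relative to the current level of the series.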
The affine parameters (γ and β) are learnable per-feature scalars initialized to ones and zeros respectively, so the affine step starts as the identity. This lets the model adaptively control how much of the normalization effect to retain.
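The following NumPy sketch illustrates the role of the affine parameters; the "learned" values are made up for illustration. At initialization the affine step is the identity. With other values, each feature's normalized series is re-centered at β_c with spread |γ_c|, which is how the model can dial back part of the normalization.

```python
import numpy as np

rng = np.random.default_rng(3)
x = 20.0 + 4.0 * rng.normal(size=(2, 100, 3))          # (batch, time, features)

mean = x.mean(axis=1, keepdims=True)
stdev = np.sqrt(x.var(axis=1, keepdims=True) + 1e-5)
z = (x - mean) / stdev                                  # plain instance normalization

# At initialization (gamma = 1, beta = 0) the affine step changes nothing.
gamma_init, beta_init = np.ones(3), np.zeros(3)
at_init = z * gamma_init + beta_init

# Hypothetical "learned" values; shape (features,) broadcasts over batch and time.
gamma = np.array([1.0, 0.5, 3.0])
beta = np.array([0.0, 2.0, -1.0])
out = z * gamma + beta
# Per instance, feature c now has mean beta[c] and standard deviation close to |gamma[c]|.
```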
Usage
Use RevIN as a wrapper layer at the input and output boundaries of any time series forecasting model that struggles with non-stationary data. Call it with mode="norm" on the encoder input to standardize the data, then with mode="denorm" on the decoder output to map predictions back to the original scale. Both calls must go through the same RevIN instance, because denormalization relies on the statistics cached during normalization. Set affine=True (the default) to let the model learn how strongly to normalize. Use subtract_last=True for series where the most recent value is a better centering point than the mean (e.g., financial data with strong momentum).
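Here is a minimal NumPy sketch of the norm → model → denorm pattern. `naive_last_value_model` and `forecast_with_revin` are hypothetical stand-ins for a real forecaster and wrapper. The affine step is omitted for brevity, which is equivalent to affine parameters at their initial values. The key point is that the statistics computed on the look-back window are reused to map the model output back.

```python
import numpy as np


def naive_last_value_model(z, horizon):
    """Hypothetical stand-in forecaster: repeats the last normalized step."""
    return np.repeat(z[:, -1:, :], horizon, axis=1)


def forecast_with_revin(x, model, horizon, eps=1e-5):
    # "norm" on the encoder input: compute and keep the per-instance statistics.
    mean = x.mean(axis=1, keepdims=True)
    stdev = np.sqrt(x.var(axis=1, keepdims=True) + eps)
    z = (x - mean) / stdev
    # The model only ever sees standardized inputs.
    y_norm = model(z, horizon)
    # "denorm" on the decoder output, using the same cached statistics.
    return y_norm * stdev + mean


rng = np.random.default_rng(4)
x = 1_000.0 + 50.0 * rng.normal(size=(3, 72, 2))   # (batch, look-back, features)
forecast = forecast_with_revin(x, naive_last_value_model, horizon=24)
```

This toy model simply repeats its last normalized input, so the denormalized forecast equals the last raw observation at every horizon step. That confirms the output lands back in the original units.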
Theoretical Basis
Instance Normalization:
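For an input $x \in \mathbb{R}^{B \times T \times C}$ (batch size $B$, $T$ time steps, $C$ features), statistics are computed per instance $b$ and feature $c$:
$$\mu_{b,c} = \frac{1}{T}\sum_{t=1}^{T} x_{b,t,c}, \qquad \sigma^2_{b,c} = \frac{1}{T}\sum_{t=1}^{T}\left(x_{b,t,c} - \mu_{b,c}\right)^2$$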
Normalization step:
$$\hat{x}_{b,t,c} = \frac{x_{b,t,c} - \mu_{b,c}}{\sqrt{\sigma^2_{b,c} + \epsilon}}$$
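With learnable affine parameters:
$$\tilde{x}_{b,t,c} = \gamma_c\,\hat{x}_{b,t,c} + \beta_c$$
Where $\gamma, \beta \in \mathbb{R}^{C}$ are per-feature learnable parameters, initialized to $\gamma = \mathbf{1}$ and $\beta = \mathbf{0}$.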
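Denormalization step (inverse), applied to the model output $\tilde{y}_{b,t,c}$ with the statistics cached from the input window:
$$y_{b,t,c} = \frac{\tilde{y}_{b,t,c} - \beta_c}{\gamma_c}\,\sqrt{\sigma^2_{b,c} + \epsilon} + \mu_{b,c}$$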
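Substituting the affine-normalized input $\tilde{x}_{b,t,c}$ into the denormalization formula shows why the pair is exactly reversible for any $\gamma_c \neq 0$. The affine step cancels first, then the stabilized scale, then the mean; $\epsilon$ cancels too, because the same $\sqrt{\sigma^2_{b,c} + \epsilon}$ appears in both directions:

$$
\frac{\tilde{x}_{b,t,c} - \beta_c}{\gamma_c}\,\sqrt{\sigma^2_{b,c} + \epsilon} + \mu_{b,c}
= \hat{x}_{b,t,c}\,\sqrt{\sigma^2_{b,c} + \epsilon} + \mu_{b,c}
= \frac{x_{b,t,c} - \mu_{b,c}}{\sqrt{\sigma^2_{b,c} + \epsilon}}\,\sqrt{\sigma^2_{b,c} + \epsilon} + \mu_{b,c}
= x_{b,t,c}
$$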
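Subtract-last variant:
$$\hat{x}_{b,t,c} = \frac{x_{b,t,c} - x_{b,T,c}}{\sqrt{\sigma^2_{b,c} + \epsilon}}$$
Where $x_{b,T,c}$ is the last observed value. It is used instead of the mean $\mu_{b,c}$ both for centering and for restoring the level during denormalization, while $\sigma^2_{b,c}$ is computed as before.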
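Pseudo-code, written as a minimal PyTorch module that mirrors the steps above (a sketch, not the library's exact implementation):
```python
# RevIN forward pass: minimal sketch, assuming x has shape (batch, time, features)
import torch
import torch.nn as nn

class RevIN(nn.Module):
    def __init__(self, num_features, eps=1e-5, affine=True, subtract_last=False):
        super().__init__()
        self.eps, self.affine, self.subtract_last = eps, affine, subtract_last
        if affine:
            # Per-feature weight (gamma) and bias (beta), initialized to ones and zeros.
            self.gamma = nn.Parameter(torch.ones(num_features))
            self.beta = nn.Parameter(torch.zeros(num_features))

    def forward(self, x, mode):
        dims = tuple(range(1, x.ndim - 1))  # all dims except batch (0) and feature (-1)
        if mode == "norm":
            # Cache the centering value and scale for the later "denorm" call.
            self.center = x[:, -1:] if self.subtract_last else x.mean(dim=dims, keepdim=True)
            self.stdev = torch.sqrt(x.var(dim=dims, keepdim=True, unbiased=False) + self.eps)
            x = (x - self.center) / self.stdev
            if self.affine:
                x = x * self.gamma + self.beta
        elif mode == "denorm":
            if self.affine:
                x = (x - self.beta) / self.gamma  # undo the affine step first
            x = x * self.stdev + self.center      # restore scale, then level
        return x
```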