Principle:Sktime Pytorch forecasting NHiTS Architecture

Knowledge Sources	N-HiTS pytorch-forecasting
Domains	Time_Series, Forecasting, Deep_Learning, Signal_Processing
Last Updated	2026-02-08 09:00 GMT

Overview

Neural Hierarchical Interpolation for Time Series (N-HiTS) is a multi-rate signal sampling architecture that extends N-BEATS with hierarchical interpolation and input pooling to achieve efficient long-horizon forecasting.

Description

N-HiTS addresses the computational and accuracy challenges of long-horizon time series forecasting by introducing two key innovations on top of the N-BEATS doubly-residual architecture: multi-rate input pooling and hierarchical interpolation. The model is organized into multiple stacks, each containing one or more blocks. Each stack operates at a different temporal resolution by applying pooling (max or average) to the input signal with a stack-specific kernel size. This allows early stacks to capture coarse, low-frequency patterns using aggressive downsampling, while later stacks focus on fine-grained, high-frequency details with minimal pooling.

Within each block, the pooled input is passed through a multi-layer perceptron (MLP) that produces two sets of expansion coefficients: backcast theta (for reconstructing the input) and forecast theta (for generating predictions). The forecast theta has a reduced dimensionality determined by the downsample frequency parameter. The compressed forecast coefficients are then upsampled to the full prediction length via interpolation (linear, nearest, or bicubic), which is the hierarchical interpolation step. This dramatically reduces the number of parameters needed for long horizons because each block only needs to predict a small number of interpolation knots rather than the full forecast vector.

Residual learning proceeds as in N-BEATS: each block subtracts its backcast from the input signal and passes the residual to the next block. An optional naive level (the last observed value, repeated across the horizon) serves as a baseline forecast. The final prediction is the sum of all block-level forecasts, plus the naive level when it is enabled. N-HiTS supports static covariates, encoder covariates, and decoder covariates through embedding layers and concatenation with the pooled target signal.

Usage

Use N-HiTS for long-horizon univariate or multivariate time series forecasting when one or more of the following hold: (1) the prediction horizon is large relative to the lookback window, (2) the time series exhibits patterns at multiple temporal scales, (3) computational cost matters and a full-resolution model such as N-BEATS would be too expensive. The model requires fixed encoder and decoder lengths (min_encoder_length == max_encoder_length and min_prediction_length == max_prediction_length). If pooling sizes and downsample frequencies are not provided, they are set automatically by heuristics that scale them exponentially across stacks.
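To make "scaling exponentially across stacks" concrete, the following is a minimal sketch of one possible schedule. The function name `exponential_schedule` and the exact formula are hypothetical and are not taken from the library; they only illustrate pooling sizes that shrink by powers of two from the coarsest stack to the finest.

```python
import math


def exponential_schedule(prediction_length: int, n_stacks: int):
    """Hypothetical heuristic (an assumption, not the library's formula).

    Returns pooling sizes and downsample frequencies as powers of two,
    ordered from coarse (large) to fine (small).
    """
    # Largest exponent: roughly half the horizon, never below 2**0.
    max_exp = max(math.log2(prediction_length / 2), 0.0)
    if n_stacks == 1:
        exponents = [max_exp]
    else:
        step = max_exp / (n_stacks - 1)
        exponents = [max_exp - i * step for i in range(n_stacks)]
    pooling_sizes = [max(2 ** round(e), 1) for e in exponents]
    # Assumption: reuse the pooling size as the downsample frequency,
    # capped at the prediction length.
    downsample_frequencies = [min(prediction_length, p) for p in pooling_sizes]
    return pooling_sizes, downsample_frequencies


pooling_sizes, downsample_frequencies = exponential_schedule(96, n_stacks=3)
print(pooling_sizes, downsample_frequencies)
```

With a horizon of 96 and three stacks, this sketch yields pooling sizes 64, 8, and 1, so the first stack sees a heavily smoothed input and the last stack sees the raw signal.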

Theoretical Basis

Multi-rate input pooling: Each stack s applies pooling with kernel size $p_{s}$ to the input, reducing the temporal dimension:

$\tilde{x}_{s} = \operatorname{Pool}_{p_{s}}(x), \quad \dim(\tilde{x}_{s}) = \lceil L / p_{s} \rceil$

where $L$ is the context length. Pooling sizes are ordered from large (coarse) to small (fine) across stacks.
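The pooling step above can be sketched in plain Python. This is an illustrative sketch, not the library implementation: the helper `pool_1d` is a hypothetical name, and it assumes non-overlapping windows (stride equal to the kernel size) with a partial last window, which is what produces the ceiling in the dimension formula.

```python
import math


def pool_1d(x, kernel_size, mode="max"):
    """Pool a sequence with non-overlapping windows of `kernel_size`.

    The last window may be shorter than `kernel_size` (ceil-mode pooling),
    so the output length is ceil(len(x) / kernel_size).
    """
    if mode == "max":
        reduce = max
    else:  # average pooling
        reduce = lambda window: sum(window) / len(window)
    return [reduce(x[i:i + kernel_size]) for i in range(0, len(x), kernel_size)]


x = [1.0, 3.0, 2.0, 5.0, 4.0, 0.0, 7.0]   # context length L = 7
coarse = pool_1d(x, kernel_size=3)          # coarse view for an early stack
fine = pool_1d(x, kernel_size=1)            # kernel 1 leaves the input unchanged
print(coarse, len(coarse) == math.ceil(len(x) / 3))
```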

Block MLP and theta generation: Each block concatenates the pooled target, encoder covariates, decoder covariates, and static features into a single vector, which is processed by an MLP:

$[θ^{b}, θ^{f}] = \operatorname{MLP}_{l}(\operatorname{concat}(\tilde{x}_{s}, x^{\mathrm{enc}}, x^{\mathrm{dec}}, x^{\mathrm{static}}))$

The backcast theta $θ^{b}$ has dimension $L$ (full context length). The forecast theta $θ^{f}$ has reduced dimension $n_{θ} = \max(\lfloor H / d_{s} \rfloor, 1)$, where $H$ is the prediction length and $d_{s}$ is the downsample frequency for stack s.
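As a worked instance of the formula for $n_{θ}$, the short sketch below computes the number of forecast coefficients per stack. The function name `forecast_theta_size` and the example frequencies are illustrative assumptions.

```python
def forecast_theta_size(prediction_length: int, downsample_frequency: int) -> int:
    """n_theta = max(floor(H / d_s), 1): number of interpolation knots."""
    return max(prediction_length // downsample_frequency, 1)


H = 96                                  # prediction length
downsample_frequencies = [24, 8, 1]     # example values, coarse to fine
theta_sizes = [forecast_theta_size(H, d) for d in downsample_frequencies]
print(theta_sizes)
```

With these example values the coarsest stack predicts only 4 knots for a 96-step horizon, while the finest stack predicts all 96 points. The `max(..., 1)` guard keeps at least one knot when the downsample frequency exceeds the horizon.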

Hierarchical interpolation: The compressed forecast coefficients are interpolated to the full prediction length:

${\hat{y}}_{l} = Interpolate (θ_{l}^{f}, H)$

using linear, nearest-neighbor, or bicubic interpolation.
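The interpolation step can be sketched as follows for the linear and nearest-neighbor modes. This is a simplified illustration with a hypothetical helper `interpolate`; it assumes the first and last knots are aligned with the first and last forecast steps, which may differ from the coordinate convention used by a particular framework's interpolation routine.

```python
def interpolate(knots, horizon, mode="linear"):
    """Upsample `knots` (the compressed forecast theta) to `horizon` points."""
    n = len(knots)
    if n == 1:
        return [knots[0]] * horizon
    out = []
    for i in range(horizon):
        if mode == "nearest":
            # Map each forecast step to the knot covering its position.
            out.append(knots[min(i * n // horizon, n - 1)])
            continue
        # Position of step i in knot coordinates (endpoints aligned: an assumption).
        pos = i * (n - 1) / (horizon - 1) if horizon > 1 else 0.0
        lo = min(int(pos), n - 2)
        frac = pos - lo
        out.append(knots[lo] * (1 - frac) + knots[lo + 1] * frac)
    return out


knots = [0.0, 2.0, 4.0]                        # three knots predicted by a block
linear = interpolate(knots, horizon=5)
nearest = interpolate(knots, horizon=5, mode="nearest")
print(linear, nearest)
```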

Doubly-residual stacking: Residuals propagate through blocks via subtraction:

$x_{l + 1} = (x_{l} - {\hat{x}}_{l}) ⊙ m$

where ${\hat{x}}_{l}$ is the backcast and $m$ is the encoder mask. The final forecast is the sum:

$\hat{y} = y_{naive} + \sum_{l = 1}^{B} {\hat{y}}_{l}$

where $y_{naive}$ is the last observed value repeated across the horizon (when naive_level is enabled).
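The residual update and the final sum can be traced with a small sketch. The blocks here are toy stand-ins for the pooled MLP blocks: `run_stacks` and `half_block` are hypothetical names, and the toy block's behavior is chosen only so the arithmetic is easy to follow.

```python
def run_stacks(x, mask, blocks, horizon, naive_level=True):
    """Doubly-residual loop: subtract backcasts, accumulate forecasts."""
    residual = list(x)
    # Naive level: last observed value repeated across the horizon.
    forecast = [x[-1]] * horizon if naive_level else [0.0] * horizon
    for block in blocks:
        backcast, block_forecast = block(residual)
        # x_{l+1} = (x_l - backcast_l) * m, elementwise
        residual = [(r - b) * m for r, b, m in zip(residual, backcast, mask)]
        forecast = [f + bf for f, bf in zip(forecast, block_forecast)]
    return forecast, residual


def half_block(residual):
    """Toy block: explains half of its input, forecasts the input mean."""
    backcast = [0.5 * r for r in residual]
    level = sum(residual) / len(residual)
    return backcast, [level, level]


forecast, residual = run_stacks(
    x=[2.0, 4.0], mask=[1.0, 1.0], blocks=[half_block, half_block], horizon=2
)
print(forecast, residual)
```

Here the naive level contributes 4.0, the first block adds 3.0, and the second block, which sees only the residual [1.0, 2.0], adds 1.5.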

Backcast loss: An optional backcast loss term regularizes the reconstruction quality:

$ℒ = w_{f} \cdot ℒ_{forecast} + w_{b} \cdot ℒ_{backcast}$

where the weights are derived from the backcast_loss_ratio parameter.
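One plausible way to derive the two weights from a single ratio is sketched below. The normalization shown (scaling the ratio by the horizon-to-context length ratio, then forcing the weights to sum to one) is an assumption made for illustration; check the implementation for the exact formula. The function name `loss_weights` is hypothetical.

```python
def loss_weights(backcast_loss_ratio: float, prediction_length: int, context_length: int):
    """Return (w_f, w_b) with w_f + w_b == 1 (assumed normalization)."""
    # Scale by H / L so the ratio compares per-step losses (assumption).
    w_b = backcast_loss_ratio * prediction_length / context_length
    w_b = w_b / (w_b + 1.0)   # normalize so the two weights sum to one
    return 1.0 - w_b, w_b


w_f, w_b = loss_weights(backcast_loss_ratio=1.0, prediction_length=24, context_length=48)
print(w_f, w_b)
```

Under this assumption a ratio of 0 disables the backcast term entirely, and larger ratios shift weight from the forecast loss to the reconstruction loss.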

Related Pages

Implemented By
