Principle:Sktime Pytorch forecasting Dense Encoder Decoder

Knowledge Sources	TiDE pytorch-forecasting
Domains	Time_Series, Forecasting, Deep_Learning, Encoder_Decoder
Last Updated	2026-02-08 09:00 GMT

Overview

TiDE (Time-series Dense Encoder-Decoder) is an MLP-based encoder-decoder architecture for long-term time series forecasting. It uses residual blocks throughout and avoids attention entirely, achieving competitive accuracy at a computational cost that grows linearly with the lookback and horizon lengths.

Description

TiDE demonstrates that simple MLP-based architectures with careful structural design can match or exceed Transformer-based models for long-term forecasting, while being significantly faster. The architecture consists of five main components: (1) feature projection for covariates, (2) a dense encoder that maps the flattened lookback window and projected covariates into a compact representation, (3) a dense decoder that expands the encoded representation to the forecast horizon, (4) a temporal decoder that produces per-step predictions by combining decoded features with future covariate projections, and (5) a lookback skip connection that provides a direct linear path from past observations to future predictions.

All major components are built from residual blocks. Each block consists of a two-layer MLP (Linear-ReLU-Linear-Dropout) plus a linear skip connection from input to output, and optional layer normalization can be applied after the residual addition. The skip path gives gradients a direct route through every block, which makes deeper stacks easier to train.
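A residual block of the kind described above can be sketched in NumPy as follows. This is an illustrative sketch, not the repository's PyTorch module: the class name `ResidualBlock`, the initialization scale, and the use of inverted dropout are assumptions made for the example, and LayerNorm is shown without its learnable scale and shift.

```python
import numpy as np


def layer_norm(x: np.ndarray, eps: float = 1e-5) -> np.ndarray:
    """Normalize over the last (feature) axis; learnable scale/shift omitted."""
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


class ResidualBlock:
    """Hypothetical sketch: Dropout(W2 ReLU(W1 x + b1) + b2) + W_skip x, then optional LayerNorm.

    Row-vector convention: x has shape (..., in_dim) and weights multiply on the right.
    """

    def __init__(self, in_dim, hidden_dim, out_dim, dropout=0.1,
                 use_layer_norm=True, seed=0):
        self.rng = np.random.default_rng(seed)
        scale = 0.1  # arbitrary init scale for this illustration
        self.W1 = self.rng.normal(0.0, scale, (in_dim, hidden_dim))
        self.b1 = np.zeros(hidden_dim)
        self.W2 = self.rng.normal(0.0, scale, (hidden_dim, out_dim))
        self.b2 = np.zeros(out_dim)
        self.W_skip = self.rng.normal(0.0, scale, (in_dim, out_dim))  # linear skip
        self.dropout = dropout
        self.use_layer_norm = use_layer_norm

    def __call__(self, x, training=False):
        # Two-layer MLP branch: Linear -> ReLU -> Linear
        h = np.maximum(x @ self.W1 + self.b1, 0.0) @ self.W2 + self.b2
        if training and self.dropout > 0:
            # Inverted dropout: zero units with probability p, rescale the survivors
            keep = self.rng.random(h.shape) >= self.dropout
            h = h * keep / (1.0 - self.dropout)
        # Residual addition with the linear skip, then optional LayerNorm
        out = h + x @ self.W_skip
        return layer_norm(out) if self.use_layer_norm else out


block = ResidualBlock(in_dim=8, hidden_dim=16, out_dim=4)
x = np.random.default_rng(1).normal(size=(2, 8))
y = block(x)
print(y.shape)  # (2, 4)
```

Because the skip projection `W_skip` maps `in_dim` to `out_dim`, the block can change the feature width, which is what lets the same building block serve as a projection, an encoder layer, or an output layer.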

The feature projection stage reduces the dimensionality of future covariates through a residual block, projecting from the raw covariate dimension to a lower temporal_width_future dimension. This projection is shared across all time steps.
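The weight sharing across time steps can be illustrated with a simplified residual block applied to a (batch, time, features) array. In this NumPy sketch the helper `make_res_block` is hypothetical and omits biases, dropout, and LayerNorm; the sizes are arbitrary, and `r` stands in for temporal_width_future.

```python
import numpy as np


def make_res_block(in_dim, hidden_dim, out_dim, rng):
    """Hypothetical simplified residual block (inference mode, no biases or LayerNorm)."""
    W1 = rng.normal(0.0, 0.1, (in_dim, hidden_dim))
    W2 = rng.normal(0.0, 0.1, (hidden_dim, out_dim))
    W_skip = rng.normal(0.0, 0.1, (in_dim, out_dim))

    def block(x):
        # matmul acts on the last axis, so leading (batch, time) axes pass through
        return np.maximum(x @ W1, 0.0) @ W2 + x @ W_skip

    return block


rng = np.random.default_rng(0)
# Hypothetical sizes: batch, lookback, horizon, raw covariate dim, projected dim
B, L, H, F, r = 4, 24, 12, 7, 3

x_fut = rng.normal(size=(B, L + H, F))   # raw future-known covariates over lookback + horizon
proj = make_res_block(F, 16, r, rng)     # a single set of weights ...
x_tilde = proj(x_fut)                    # ... applied independently at every time step
print(x_tilde.shape)  # (4, 36, 3)
```

Projecting a single time step on its own yields the same values as the corresponding slice of the full output, which is exactly what "shared across all time steps" means.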

The encoder flattens and concatenates three inputs: the lookback target values, the projected covariate features (for both lookback and horizon periods), and static covariates. This flattened vector passes through a stack of residual blocks (configurable via num_encoder_layers).
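The flatten-and-concatenate step and the encoder stack can be sketched as below. This assumes NumPy arrays with hypothetical sizes (`C` target channels, `S` static features), a random stand-in for the projected covariates, and the same simplified, hypothetical `make_res_block` helper; it also assumes every encoder block outputs hidden_size features, with only the first block consuming the wide flattened input.

```python
import numpy as np


def make_res_block(in_dim, hidden_dim, out_dim, rng):
    """Hypothetical simplified residual block (inference mode, no biases or LayerNorm)."""
    W1 = rng.normal(0.0, 0.1, (in_dim, hidden_dim))
    W2 = rng.normal(0.0, 0.1, (hidden_dim, out_dim))
    W_skip = rng.normal(0.0, 0.1, (in_dim, out_dim))
    return lambda x: np.maximum(x @ W1, 0.0) @ W2 + x @ W_skip


rng = np.random.default_rng(0)
# Hypothetical sizes: batch, lookback, horizon, targets, projected width, static features
B, L, H, C, r, S = 4, 24, 12, 1, 3, 5
hidden_size, num_encoder_layers = 32, 2

y_past = rng.normal(size=(B, L, C))        # lookback target values
x_tilde = rng.normal(size=(B, L + H, r))   # stand-in for the projected covariates
x_static = rng.normal(size=(B, S))         # static covariates

# Flatten everything except the batch axis and concatenate into one vector per sample
e = np.concatenate([y_past.reshape(B, -1), x_tilde.reshape(B, -1), x_static], axis=1)
print(e.shape)  # (4, 137), i.e. 24*1 + 36*3 + 5 features

# Encoder stack: the first block maps e to hidden_size, later blocks stay at hidden_size
dims = [e.shape[1]] + [hidden_size] * num_encoder_layers
encoder = [make_res_block(dims[i], hidden_size, dims[i + 1], rng)
           for i in range(num_encoder_layers)]
z = e
for block in encoder:
    z = block(z)
print(z.shape)  # (4, 32)
```

The input width of the first encoder block grows linearly with both L and H, which is where the architecture's linear scaling in sequence length comes from.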

The decoder passes the encoder output through another stack of residual blocks (configurable via num_decoder_layers), with the final block projecting to decoder_output_dim * output_chunk_length dimensions. The output is then reshaped to (batch, horizon, decoder_output_dim).
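A NumPy sketch of the decoder stack and the final reshape follows. The sizes are hypothetical, the encoder output is a random stand-in, and `make_res_block` is the same simplified, hypothetical helper (no biases, dropout, or LayerNorm); only the last block widens to decoder_output_dim * output_chunk_length.

```python
import numpy as np


def make_res_block(in_dim, hidden_dim, out_dim, rng):
    """Hypothetical simplified residual block (inference mode, no biases or LayerNorm)."""
    W1 = rng.normal(0.0, 0.1, (in_dim, hidden_dim))
    W2 = rng.normal(0.0, 0.1, (hidden_dim, out_dim))
    W_skip = rng.normal(0.0, 0.1, (in_dim, out_dim))
    return lambda x: np.maximum(x @ W1, 0.0) @ W2 + x @ W_skip


rng = np.random.default_rng(0)
B, H = 4, 12                                     # batch size, horizon (output_chunk_length)
hidden_size, decoder_output_dim, num_decoder_layers = 32, 8, 2

z = rng.normal(size=(B, hidden_size))            # stand-in for the encoder output

# All but the last block stay at hidden_size; the last widens to decoder_output_dim * H
out_dims = [hidden_size] * (num_decoder_layers - 1) + [decoder_output_dim * H]
decoder, in_dim = [], hidden_size
for out_dim in out_dims:
    decoder.append(make_res_block(in_dim, hidden_size, out_dim, rng))
    in_dim = out_dim

d_flat = z
for block in decoder:
    d_flat = block(d_flat)

# Row-major reshape: entries [t*d_out, (t+1)*d_out) become the features for step t
d = d_flat.reshape(B, H, decoder_output_dim)
print(d_flat.shape, d.shape)  # (4, 96) (4, 12, 8)
```

With a row-major reshape, each block of decoder_output_dim consecutive entries in the flat vector becomes the feature vector for one forecast step.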

The temporal decoder concatenates the per-step decoder output with the projected future covariates for the forecast period and maps this through a final residual block to produce per-step target predictions.
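The per-step combination in the temporal decoder can be sketched as below. This assumes NumPy, random stand-ins for the decoder output and the horizon-period projected covariates, an arbitrary hidden width of 16, and the same simplified, hypothetical `make_res_block` helper.

```python
import numpy as np


def make_res_block(in_dim, hidden_dim, out_dim, rng):
    """Hypothetical simplified residual block (inference mode, no biases or LayerNorm)."""
    W1 = rng.normal(0.0, 0.1, (in_dim, hidden_dim))
    W2 = rng.normal(0.0, 0.1, (hidden_dim, out_dim))
    W_skip = rng.normal(0.0, 0.1, (in_dim, out_dim))
    return lambda x: np.maximum(x @ W1, 0.0) @ W2 + x @ W_skip


rng = np.random.default_rng(0)
B, H, d_out, r, C = 4, 12, 8, 3, 1        # hypothetical sizes; C = number of targets

d = rng.normal(size=(B, H, d_out))         # stand-in for the reshaped decoder output
x_tilde_h = rng.normal(size=(B, H, r))     # projected covariates for the forecast steps

# One residual block, shared across steps, maps [d_t, x~_{L+t}] to the target channels
temporal_decoder = make_res_block(d_out + r, 16, C, rng)
y_temp = temporal_decoder(np.concatenate([d, x_tilde_h], axis=-1))
print(y_temp.shape)  # (4, 12, 1)
```

Because the block only acts on the last axis, each forecast step is computed from its own decoded features and its own future covariates.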

The lookback skip connection applies a simple linear transformation across the time dimension (transposing from input_chunk_length to output_chunk_length), providing a direct path for autoregressive-like information flow. The skip output is added to the temporal decoder output to produce the final forecast.
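The transpose-linear-transpose pattern of the lookback skip, followed by the final addition, can be sketched in NumPy. The weight name `W_lb`, the omission of a bias term, and the random stand-in for the temporal decoder output are assumptions made for this illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
B, L, H, C = 4, 24, 12, 2                 # hypothetical batch, lookback, horizon, targets

y_past = rng.normal(size=(B, L, C))       # lookback target values
W_lb = rng.normal(0.0, 0.1, (H, L))       # hypothetical skip weights, shared across channels

# Move time to the last axis (B, C, L), map L -> H, then move it back to (B, H, C)
y_skip = (y_past.transpose(0, 2, 1) @ W_lb.T).transpose(0, 2, 1)
print(y_skip.shape)  # (4, 12, 2)

# Final forecast: temporal decoder output (random stand-in here) plus the skip path
y_temp = rng.normal(size=(B, H, C))
y_hat = y_temp + y_skip
```

Written out, each skip output is `y_skip[b, h, c] = sum_l W_lb[h, l] * y_past[b, l, c]`, so every forecast step is a fixed linear combination of the lookback values of the same channel.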

Two implementations exist in the repository: (1) the _TideModule used by TiDEModel following the pytorch-forecasting v1 data interface with BaseModelWithCovariates, and (2) a TIDE v2 class using the BaseModel v2 interface with EncoderDecoderDataModule patterns, which adds categorical embedding support and per-variable linear projections for numerical features.

Usage

Use TiDE for long-term time series forecasting when: (1) computational efficiency is a priority compared to Transformer-based models, (2) future covariates and static covariates are available, (3) the forecast horizon is large and attention-based models become prohibitively expensive. The model requires fixed encoder and decoder lengths. Key hyperparameters are hidden_size (typically 32-128 without covariates), num_encoder_layers, num_decoder_layers, and decoder_output_dim.

Theoretical Basis

Residual block:

$\mathrm{ResBlock}(x) = \mathrm{LayerNorm}\left(\mathrm{Dense}(x) + W_{\text{skip}}\, x\right)$

where:

$\mathrm{Dense}(x) = \mathrm{Dropout}\left(W_2\, \mathrm{ReLU}(W_1 x + b_1) + b_2\right)$

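and $W_{\text{skip}}$ is a linear skip projection that maps the input dimension to the output dimension so the two branches can be added. LayerNorm is optional and acts as the identity when disabled.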

Feature projection for future covariates:

$\tilde{x}_t^{\text{fut}} = \mathrm{ResBlock}_{\text{proj}}(x_t^{\text{fut}}), \quad t = 1, \dots, L + H$

where $L$ is the lookback length and $H$ is the horizon.

Encoder:

$e = \mathrm{concat}\left(\mathrm{flatten}(y_{1:L}),\ \mathrm{flatten}(\tilde{x}_{1:L+H}^{\text{fut}}),\ x^{\text{static}}\right)$

$z = \mathrm{ResBlock}_{\text{enc}}^{(N_e)}\left(\cdots \mathrm{ResBlock}_{\text{enc}}^{(1)}(e) \cdots\right)$

Decoder:

$d = \mathrm{ResBlock}_{\text{dec}}^{(N_d)}\left(\cdots \mathrm{ResBlock}_{\text{dec}}^{(1)}(z) \cdots\right)$

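The decoder output, a vector of length $H \cdot d_{\text{out}}$, is reshaped to $(B, H, d_{\text{out}})$, where $B$ is the batch size and $d_{\text{out}}$ is decoder_output_dim.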

Temporal decoder:

$\hat{y}_t^{\text{temp}} = \mathrm{ResBlock}_{\text{temp}}\left(\mathrm{concat}(d_t, \tilde{x}_{L+t}^{\text{fut}})\right), \quad t = 1, \dots, H$

Lookback skip connection:

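$\hat{y}^{\text{skip}} = W_{\text{lb}}\, y_{1:L}$

where $y_{1:L} \in \mathbb{R}^{L \times C}$ holds the $C$ target channels over the lookback window and $W_{\text{lb}} \in \mathbb{R}^{H \times L}$ maps across the time dimension, shared across channels. In the implementation this is a linear map from input_chunk_length to output_chunk_length, applied after transposing the targets so that time is the last axis.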

Final prediction:

$\hat{y} = \hat{y}^{\text{temp}} + \hat{y}^{\text{skip}}$

The additive combination of the dense encoder-decoder pathway and the direct skip connection allows the model to capture both complex nonlinear patterns and simple linear trends.

Related Pages

Implemented By
