Principle:Sktime Pytorch forecasting MLP Decoder

Knowledge Sources	pytorch-forecasting
Domains	Time_Series, Forecasting, Deep_Learning, Neural_Networks
Last Updated	2026-02-08 09:00 GMT

Overview

MLP-based decoder architecture for time series forecasting that applies a fully connected network independently at each decoder time step, using only decoder-available information (future-known covariates and static variables) to produce point or quantile predictions.

Description

The DecoderMLP model is a straightforward feedforward architecture that operates exclusively on information available at prediction time: decoder covariates (future-known time-varying variables) and static variables. Unlike encoder-decoder models such as TFT or recurrent networks, this model does not use any historical target values or encoder-period features. Instead, it applies the same MLP independently at each future time step, treating forecasting as a per-step regression from known covariates to target values.

The core building block is the FullyConnectedModule, a sequential stack of linear layers with configurable activation functions, dropout, and layer normalization. The architecture proceeds as follows: (1) an input layer maps the concatenated continuous and embedded categorical features to the hidden size, (2) a configurable number of hidden layers with activation, optional dropout, and optional LayerNorm process the representation, and (3) an output layer projects to the forecast dimension.
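The three stages above can be sketched in plain PyTorch. This is a minimal illustration under stated assumptions, not the library's own code: the helper name `build_mlp` and the layer sizes are hypothetical, and the ordering of dropout and LayerNorm follows the description on this page.

```python
import torch
from torch import nn


def build_mlp(input_size, hidden_size, output_size, n_hidden_layers,
              activation=nn.ReLU, dropout=None, norm=True):
    """Hypothetical helper: input layer, N hidden layers, linear output layer."""
    # (1) input layer maps the concatenated features to the hidden size
    layers = [nn.Linear(input_size, hidden_size), activation()]
    if dropout is not None:
        layers.append(nn.Dropout(dropout))
    if norm:
        layers.append(nn.LayerNorm(hidden_size))
    # (2) hidden layers: linear -> activation -> optional dropout -> optional LayerNorm
    for _ in range(n_hidden_layers):
        layers += [nn.Linear(hidden_size, hidden_size), activation()]
        if dropout is not None:
            layers.append(nn.Dropout(dropout))
        if norm:
            layers.append(nn.LayerNorm(hidden_size))
    # (3) output layer projects to the forecast dimension (no activation)
    layers.append(nn.Linear(hidden_size, output_size))
    return nn.Sequential(*layers)


# Illustrative sizes only
mlp = build_mlp(input_size=6, hidden_size=16, output_size=3,
                n_hidden_layers=2, dropout=0.1)
out = mlp(torch.randn(4, 6))  # four input rows -> four output rows of size 3
```

With `n_hidden_layers=2` the stack holds four linear layers in total: one input layer, two hidden layers, and one output layer.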

The input at each decoder time step is formed by concatenating the continuous decoder variables (selected at their original positions in the continuous feature tensor) with the embedded categorical variables from both decoder-period and static categories. The combined tensor is flattened so that the batch and time dimensions merge into a single dimension, passed through the MLP, and reshaped back to (batch, horizon, output size) to produce predictions for each horizon step.
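The concatenate, flatten, and reshape sequence can be illustrated with a small sketch. All tensor names, index positions, and sizes here are assumptions chosen for the example (a single categorical feature with seven levels and two decoder-available continuous features); the real model derives them from the dataset.

```python
import torch
from torch import nn

batch_size, horizon = 2, 5
decoder_cont = torch.randn(batch_size, horizon, 4)            # all continuous features
decoder_reals_positions = [0, 2]                              # hypothetical positions of decoder-available reals
decoder_cat = torch.randint(0, 7, (batch_size, horizon, 1))   # e.g. an encoded day of week

# Learned embedding table for the categorical feature
embedding = nn.Embedding(num_embeddings=7, embedding_dim=3)
embedded = embedding(decoder_cat[..., 0])                     # (batch, horizon, 3)

# Concatenate selected continuous features with the embeddings
network_input = torch.cat(
    [decoder_cont[..., decoder_reals_positions], embedded], dim=-1
)
input_size = network_input.size(-1)                           # 2 reals + 3 embedding dims
output_size = 7                                               # e.g. seven quantiles

mlp = nn.Sequential(nn.Linear(input_size, 8), nn.ReLU(), nn.Linear(8, output_size))

# Merge batch and time, apply the MLP, then restore the time dimension
flat = network_input.reshape(-1, input_size)                  # (batch * horizon, input_size)
prediction = mlp(flat).reshape(batch_size, horizon, output_size)
```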

The model supports multi-target forecasting by splitting the MLP output along the last dimension according to per-target output sizes. The default loss is QuantileLoss, making this model suitable for probabilistic forecasting out of the box.
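To make the quantile objective concrete, the following sketch implements the standard pinball (quantile) loss. It is an illustration of the general technique, not a copy of the library's QuantileLoss, which may differ in scaling and reduction; the function name `pinball_loss` and the quantile levels are assumptions for the example.

```python
import torch


def pinball_loss(y_pred, target, quantiles):
    """Mean pinball loss. y_pred: (..., n_quantiles), target: (...)."""
    losses = []
    for i, q in enumerate(quantiles):
        error = target - y_pred[..., i]
        # Under-prediction is weighted by q, over-prediction by (1 - q)
        losses.append(torch.max((q - 1) * error, q * error))
    return torch.stack(losses, dim=-1).mean()


quantiles = [0.1, 0.5, 0.9]
target = torch.tensor([10.0, 10.0])
y_pred = torch.tensor([[8.0, 10.0, 12.0],
                       [8.0, 10.0, 12.0]])
loss = pinball_loss(y_pred, target, quantiles)
```

In this example the median prediction is exact, while the 0.1 and 0.9 quantile predictions each miss by 2 on the cheap side of their asymmetric penalty, so each contributes 0.2 per sample.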

Usage

Use the DecoderMLP when: (1) strong future-known covariates (e.g., calendar features, planned promotions) are the primary drivers of the forecast, (2) historical target patterns are less informative than exogenous variables, or (3) a simple, fast baseline is needed before trying more complex architectures. The model is particularly useful when the forecast depends mainly on known future events rather than on autoregressive patterns, since it never sees past target values.

Theoretical Basis

Input construction at each decoder step t:

$x_{t} = \mathrm{concat}(x_{t}^{cont}[\mathrm{decoder\_reals}], \mathrm{Embed}(x_{t}^{cat}))$

where $x_{t}^{cont}$ contains the continuous features at decoder-relevant positions and $\mathrm{Embed}(\cdot)$ maps categorical variables through learned embedding tables.

FullyConnectedModule forward pass:

$h_{0} = \sigma(W_{0} x_{t} + b_{0})$

$h_{l} = \sigma(W_{l} h_{l-1} + b_{l}), \quad l = 1, \dots, N$

$\hat{y}_{t} = W_{out} h_{N} + b_{out}$

where $\sigma$ is the activation function (default ReLU), and each hidden layer optionally applies dropout and layer normalization:

$h_{l} \leftarrow \mathrm{LayerNorm}(\mathrm{Dropout}(\sigma(W_{l} h_{l-1} + b_{l})))$

Per-step independence: The same MLP weights are shared across all decoder time steps. Each step is processed independently:

$\hat{y}_{1:H} = \mathrm{MLP}(x_{1:H})$

where the MLP is applied in a batched fashion over both the batch and time dimensions.
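Per-step independence can be checked numerically. The sketch below uses an arbitrary small MLP with illustrative sizes (without dropout, so the forward pass is deterministic). It compares the batched application against an explicit loop over time steps, then perturbs the first step to show that the other steps are unaffected.

```python
import torch
from torch import nn

torch.manual_seed(0)
batch_size, horizon, input_size, output_size = 3, 4, 5, 2
mlp = nn.Sequential(nn.Linear(input_size, 8), nn.ReLU(), nn.Linear(8, output_size))
mlp.eval()
x = torch.randn(batch_size, horizon, input_size)

with torch.no_grad():
    # Batched: merge batch and time, apply once, restore the time dimension
    batched = mlp(x.reshape(-1, input_size)).reshape(batch_size, horizon, output_size)

    # Explicit loop: the same weights applied to each time step separately
    per_step = torch.stack([mlp(x[:, t]) for t in range(horizon)], dim=1)

    # Perturb only the first step; predictions for later steps should not move
    x_perturbed = x.clone()
    x_perturbed[:, 0] += 1.0
    perturbed = mlp(x_perturbed.reshape(-1, input_size)).reshape(
        batch_size, horizon, output_size
    )
```

Because no information flows between steps, any temporal structure in the forecast must come from the covariates themselves (for example, a time index or calendar features).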

Multi-target output: For multiple targets, the output vector is split:

$[\hat{y}_{t}^{(1)}, \dots, \hat{y}_{t}^{(M)}] = \mathrm{split}(\hat{y}_{t}, [s_{1}, \dots, s_{M}])$

where $s_{i}$ is the output size for target $i$.
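The split can be sketched with `torch.split` along the last dimension. The output sizes below are hypothetical (one target with seven quantiles and one with a single point output); the random tensor stands in for the MLP output.

```python
import torch

batch_size, horizon = 2, 3
output_sizes = [7, 1]  # hypothetical per-target sizes s_1, s_2

# Stand-in for the MLP output: last dimension is the sum of all target sizes
prediction = torch.randn(batch_size, horizon, sum(output_sizes))

# One tensor per target, each of shape (batch, horizon, s_i)
per_target = torch.split(prediction, output_sizes, dim=-1)
```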
