Implementation:Sktime Pytorch forecasting TimeXer

Knowledge Sources	Sktime_Pytorch_forecasting
Domains	Time_Series, Forecasting, Deep_Learning
Last Updated	2026-02-08 08:00 GMT

Overview

TimeXer is a Transformer-based time series forecasting model that reconciles endogenous and exogenous variable information through patch-level and variate-level representations.

Description

TimeXer extends BaseModelWithCovariates and implements the Time Series Transformer with eXogenous variables architecture. It employs patch-level representations for endogenous variables and variate-level representations for exogenous variables, connected by an endogenous global token. The model uses a dual attention encoder with self-attention on endogenous patches and cross-attention for exogenous-to-endogenous correlations, followed by a flatten head for producing forecasts. It supports univariate (S), multivariate-to-single (MS), and multivariate (M) forecasting modes, as well as quantile loss for probabilistic predictions.
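The patch-level tokenization described above can be illustrated with a small, library-independent sketch. It is not the code in _timexer.py: the helper name make_patches is hypothetical, and dropping any trailing remainder that does not fill a whole patch is an assumption made here for simplicity. It only shows how context_length and patch_length determine the number of endogenous tokens once a single global token is added.

```python
def make_patches(series, patch_length):
    """Split a 1-D sequence into non-overlapping patches.

    Assumption for this sketch: values that do not fill a complete
    patch at the end of the sequence are dropped.
    """
    n_patches = len(series) // patch_length
    return [
        series[i * patch_length:(i + 1) * patch_length]
        for i in range(n_patches)
    ]


context_length = 96
patch_length = 16
series = list(range(context_length))  # stand-in for one endogenous series

patches = make_patches(series, patch_length)
n_patch_tokens = len(patches)   # number of patch-level tokens
n_tokens = n_patch_tokens + 1   # patch tokens plus one global token

print(n_patch_tokens, n_tokens)
```

With the values used in the usage examples on this page (context_length=96, patch_length=16), the endogenous series yields six patch tokens, and the global token that links them to the exogenous variate tokens brings the total to seven.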

Usage

Use TimeXer to forecast time series for which exogenous (external) covariates are available. It targets both long-term and short-term forecasting tasks in which endogenous temporal patterns and exogenous correlations must both be captured. The model can be instantiated directly or via the from_dataset class method, which takes a TimeSeriesDataSet.

Code Reference

Source Location

Repository: Sktime_Pytorch_forecasting
File: pytorch_forecasting/models/timexer/_timexer.py
Lines: 1-496

Signature

class TimeXer(BaseModelWithCovariates):
    def __init__(
        self,
        context_length: int,
        prediction_length: int,
        task_name: str = "long_term_forecast",
        features: str = "MS",
        enc_in: int = None,
        hidden_size: int = 256,
        n_heads: int = 4,
        e_layers: int = 2,
        d_ff: int = 1024,
        dropout: float = 0.2,
        activation: str = "relu",
        use_efficient_attention: bool = False,
        patch_length: int = 16,
        factor: int = 5,
        embed_type: str = "fixed",
        freq: str = "h",
        output_size: int | list[int] = 1,
        loss: MultiHorizonMetric = None,
        learning_rate: float = 1e-3,
        static_categoricals: list[str] | None = None,
        static_reals: list[str] | None = None,
        time_varying_categoricals_encoder: list[str] | None = None,
        time_varying_categoricals_decoder: list[str] | None = None,
        time_varying_reals_encoder: list[str] | None = None,
        time_varying_reals_decoder: list[str] | None = None,
        x_reals: list[str] | None = None,
        x_categoricals: list[str] | None = None,
        embedding_sizes: dict[str, tuple[int, int]] | None = None,
        embedding_labels: list[str] | None = None,
        embedding_paddings: list[str] | None = None,
        categorical_groups: dict[str, list[str]] | None = None,
        logging_metrics: nn.ModuleList = None,
        **kwargs,
    ):

from_dataset

@classmethod
def from_dataset(
    cls,
    dataset: TimeSeriesDataSet,
    allowed_encoder_known_variable_names: list[str] = None,
    **kwargs,
):

forward

def forward(self, x: dict[str, torch.Tensor]) -> dict[str, torch.Tensor]:

Import

from pytorch_forecasting.models.timexer import TimeXer

I/O Contract

Inputs

Name	Type	Required	Description
context_length	int	Yes	Length of input sequence used for making predictions
prediction_length	int	Yes	Number of future time steps to predict
task_name	str	No	Type of forecasting task: 'long_term_forecast' or 'short_term_forecast' (default 'long_term_forecast')
features	str	No	Feature mode: 'MS' (multivariate-to-single), 'M' (multivariate), 'S' (univariate) (default 'MS')
enc_in	int	No	Number of input variables for the encoder (default None, in which case the number of real features is used)
hidden_size	int	No	Dimension of model embeddings and hidden representations (default 256)
n_heads	int	No	Number of attention heads (default 4)
e_layers	int	No	Number of encoder layers with dual attention (default 2)
d_ff	int	No	Dimension of feedforward network in transformer layers (default 1024)
dropout	float	No	Dropout rate (default 0.2)
activation	str	No	Activation function: 'relu' or 'gelu' (default 'relu')
use_efficient_attention	bool	No	Use PyTorch's native optimized scaled dot-product attention (SDPA) (default False)
patch_length	int	No	Length of each non-overlapping patch for endogenous tokenization (default 16)
factor	int	No	Scaling factor for attention scores (default 5)
embed_type	str	No	Type of time feature embedding (default 'fixed')
freq	str	No	Frequency of time series data (default 'h')
output_size	int or list[int]	No	Number of outputs per target, for example the number of quantiles; a list gives one size per target (default 1)
loss	MultiHorizonMetric	No	Loss function; defaults to MAE (or MultiLoss for 'M' mode)
learning_rate	float	No	Learning rate (default 1e-3)
logging_metrics	nn.ModuleList	No	Metrics logged during training; defaults to [SMAPE, MAE, RMSE, MAPE]

Outputs

Name	Type	Description
prediction	dict[str, torch.Tensor]	Network output dictionary. Its 'prediction' entry is a tensor of shape (batch_size, prediction_length, n_quantiles) for a single target, or a list of such tensors for multiple targets
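The n_quantiles dimension in the output shape matters when the model is trained with a quantile loss: each slot along that axis is then a forecast for one quantile level. The sketch below uses the textbook pinball (quantile) loss in plain Python to show how such a forecast is scored. The function name pinball_loss and the sample numbers are illustrative assumptions, and the loss class in the library may apply additional scaling or reduction.

```python
def pinball_loss(y_true, y_pred, quantile):
    """Textbook pinball loss for one prediction at one quantile level.

    Under-prediction is weighted by `quantile`, over-prediction by
    `1 - quantile`, so higher quantiles are pushed above the target.
    """
    error = y_true - y_pred
    return max(quantile * error, (quantile - 1) * error)


# One time step with three quantile forecasts, i.e. n_quantiles = 3.
quantiles = [0.1, 0.5, 0.9]
target = 10.0
forecast = [8.0, 10.5, 13.0]  # made-up values, one per quantile level

losses = [pinball_loss(target, p, q) for p, q in zip(forecast, quantiles)]
mean_loss = sum(losses) / len(losses)
print(losses, mean_loss)
```

A point forecast corresponds to output_size=1, while a probabilistic forecast over k quantile levels corresponds to an output size of k per target.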

Usage Examples

from pytorch_forecasting import TimeSeriesDataSet
from pytorch_forecasting.models.timexer import TimeXer

# Create the model from a dataset.
# `dataset` is a TimeSeriesDataSet that must be built beforehand.
model = TimeXer.from_dataset(
    dataset,
    hidden_size=256,
    n_heads=4,
    e_layers=2,
    d_ff=1024,
    dropout=0.2,
    patch_length=16,
)

# Or instantiate directly
model = TimeXer(
    context_length=96,
    prediction_length=24,
    hidden_size=256,
    n_heads=4,
    e_layers=2,
    patch_length=16,
)

Related Pages

Principle:Sktime_Pytorch_forecasting_TimeXer_Architecture
