Implementation:Sktime Pytorch forecasting TimeXer

From Leeroopedia


Knowledge Sources
Domains: Time_Series, Forecasting, Deep_Learning
Last Updated: 2026-02-08 08:00 GMT

Overview

TimeXer is a Transformer-based time series forecasting model that reconciles endogenous and exogenous variable information through patch-level and variate-level representations.

Description

TimeXer extends BaseModelWithCovariates and implements the Time Series Transformer with eXogenous variables (TimeXer) architecture. Endogenous variables are embedded at the patch level and exogenous variables at the variate level; an endogenous global token connects the two representations. The dual attention encoder applies self-attention over the endogenous patches and cross-attention to capture exogenous-to-endogenous correlations, and a flatten head then produces the forecasts. The model supports univariate (S), multivariate-to-single (MS), and multivariate (M) forecasting modes, and it can be trained with a quantile loss for probabilistic predictions.
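
The patch-level tokenization described above can be illustrated with a small pure-Python sketch. It assumes non-overlapping patches and that trailing values which do not fill a whole patch are dropped; the helper name `split_into_patches` is hypothetical and is not part of pytorch_forecasting.

```python
def split_into_patches(series, patch_length):
    """Split a 1-D sequence into non-overlapping patches (illustrative helper).

    Assumption: trailing values that do not fill a whole patch are dropped.
    """
    if patch_length <= 0:
        raise ValueError("patch_length must be positive")
    n_patches = len(series) // patch_length
    return [
        series[i * patch_length:(i + 1) * patch_length]
        for i in range(n_patches)
    ]


context_length = 96
patch_length = 16
series = list(range(context_length))  # stand-in for one endogenous series

patches = split_into_patches(series, patch_length)
n_patch_tokens = len(patches)
# One additional global token links the endogenous patches to the
# exogenous variate-level tokens.
n_endogenous_tokens = n_patch_tokens + 1

print(n_patch_tokens, n_endogenous_tokens)
```

With the values used in the usage examples on this page (context_length=96, patch_length=16), the endogenous series yields six patch tokens plus the global token.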

Usage

Use TimeXer when exogenous (external) covariates are available for the series being forecast. It targets both long-term and short-term forecasting tasks in which endogenous temporal patterns and exogenous correlations must be captured together. The model can be instantiated directly or via the from_dataset class method, which takes a TimeSeriesDataSet.

Code Reference

Source Location

Signature

class TimeXer(BaseModelWithCovariates):
    def __init__(
        self,
        context_length: int,
        prediction_length: int,
        task_name: str = "long_term_forecast",
        features: str = "MS",
        enc_in: int = None,
        hidden_size: int = 256,
        n_heads: int = 4,
        e_layers: int = 2,
        d_ff: int = 1024,
        dropout: float = 0.2,
        activation: str = "relu",
        use_efficient_attention: bool = False,
        patch_length: int = 16,
        factor: int = 5,
        embed_type: str = "fixed",
        freq: str = "h",
        output_size: int | list[int] = 1,
        loss: MultiHorizonMetric = None,
        learning_rate: float = 1e-3,
        static_categoricals: list[str] | None = None,
        static_reals: list[str] | None = None,
        time_varying_categoricals_encoder: list[str] | None = None,
        time_varying_categoricals_decoder: list[str] | None = None,
        time_varying_reals_encoder: list[str] | None = None,
        time_varying_reals_decoder: list[str] | None = None,
        x_reals: list[str] | None = None,
        x_categoricals: list[str] | None = None,
        embedding_sizes: dict[str, tuple[int, int]] | None = None,
        embedding_labels: list[str] | None = None,
        embedding_paddings: list[str] | None = None,
        categorical_groups: dict[str, list[str]] | None = None,
        logging_metrics: nn.ModuleList = None,
        **kwargs,
    ):

from_dataset

@classmethod
def from_dataset(
    cls,
    dataset: TimeSeriesDataSet,
    allowed_encoder_known_variable_names: list[str] = None,
    **kwargs,
):

forward

def forward(self, x: dict[str, torch.Tensor]) -> dict[str, torch.Tensor]:

Import

from pytorch_forecasting.models.timexer import TimeXer

I/O Contract

Inputs

| Name | Type | Required | Description |
| --- | --- | --- | --- |
| context_length | int | Yes | Length of input sequence used for making predictions |
| prediction_length | int | Yes | Number of future time steps to predict |
| task_name | str | No | Type of forecasting task: 'long_term_forecast' or 'short_term_forecast' |
| features | str | No | Feature mode: 'MS' (multivariate-to-single), 'M' (multivariate), 'S' (univariate) |
| enc_in | int | No | Number of input variables for encoder; defaults to number of real features |
| hidden_size | int | No | Dimension of model embeddings and hidden representations (default 256) |
| n_heads | int | No | Number of attention heads (default 4) |
| e_layers | int | No | Number of encoder layers with dual attention (default 2) |
| d_ff | int | No | Dimension of feedforward network in transformer layers (default 1024) |
| dropout | float | No | Dropout rate (default 0.2) |
| activation | str | No | Activation function: 'relu' or 'gelu' (default 'relu') |
| use_efficient_attention | bool | No | Use PyTorch native optimized SDPA (default False) |
| patch_length | int | No | Length of each non-overlapping patch for endogenous tokenization (default 16) |
| factor | int | No | Scaling factor for attention scores (default 5) |
| embed_type | str | No | Type of time feature embedding (default 'fixed') |
| freq | str | No | Frequency of time series data (default 'h') |
| output_size | int or list[int] | No | Output size (default 1) |
| loss | MultiHorizonMetric | No | Loss function; defaults to MAE (or MultiLoss for 'M' mode) |
| learning_rate | float | No | Learning rate (default 1e-3) |
| logging_metrics | nn.ModuleList | No | Metrics logged during training; defaults to [SMAPE, MAE, RMSE, MAPE] |
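
The three values of the features parameter can be read as a choice of which variables are fed in and which are forecast. The sketch below illustrates that reading only; `select_columns` and the column names are hypothetical, and the code is not taken from the library's internals.

```python
def select_columns(mode, target, all_columns):
    """Return (input_columns, forecast_columns) for a feature mode.

    Illustrative only: this mirrors the usual meaning of the mode names,
    not the actual implementation in pytorch_forecasting.
    """
    if target not in all_columns:
        raise ValueError(f"unknown target column: {target!r}")
    if mode == "S":   # univariate: the target predicts itself
        return [target], [target]
    if mode == "MS":  # multivariate inputs, single forecast variable
        return list(all_columns), [target]
    if mode == "M":   # multivariate inputs, multivariate forecasts
        return list(all_columns), list(all_columns)
    raise ValueError(f"unknown feature mode: {mode!r}")


columns = ["load", "temperature", "holiday"]  # hypothetical variables
for mode in ("S", "MS", "M"):
    inputs, outputs = select_columns(mode, "load", columns)
    print(mode, inputs, outputs)
```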

Outputs

| Name | Type | Description |
| --- | --- | --- |
| prediction | dict[str, torch.Tensor] | Network output dictionary containing a 'prediction' tensor of shape (batch_size, prediction_length, n_quantiles) for a single target, or a list of tensors for multiple targets |
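
To make the (batch_size, prediction_length, n_quantiles) layout concrete, the following pure-Python sketch uses nested lists in place of a tensor, extracts the median forecast, and evaluates the standard quantile (pinball) loss. The quantile levels and all numbers are illustrative assumptions, and `pinball_loss` is a hypothetical helper, not the library's loss class.

```python
def pinball_loss(y_true, y_pred, quantile):
    """Quantile (pinball) loss for a single observation."""
    error = y_true - y_pred
    return max(quantile * error, (quantile - 1.0) * error)


quantiles = [0.1, 0.5, 0.9]  # illustrative quantile levels (assumption)

# Nested lists standing in for a tensor of shape
# (batch_size=1, prediction_length=2, n_quantiles=3).
prediction = [[[8.0, 10.0, 12.0], [9.0, 11.0, 14.0]]]
actual = [[11.0, 10.0]]  # observed values, shape (batch_size, prediction_length)

# The last axis indexes quantiles, so the median is one slice of that axis.
median_index = quantiles.index(0.5)
median_forecast = [step[median_index] for step in prediction[0]]

# Average the pinball loss over time steps and quantile levels.
losses = [
    pinball_loss(y, y_hat, q)
    for step, y in zip(prediction[0], actual[0])
    for y_hat, q in zip(step, quantiles)
]
mean_loss = sum(losses) / len(losses)

print(median_forecast, mean_loss)
```

Under-prediction is penalized more heavily at high quantile levels and over-prediction at low ones, which is what pushes each output channel toward its own quantile.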

Usage Examples

from pytorch_forecasting import TimeSeriesDataSet
from pytorch_forecasting.models.timexer import TimeXer

# Create the model from a dataset.
# `dataset` is assumed to be a TimeSeriesDataSet that was built beforehand.
model = TimeXer.from_dataset(
    dataset,
    hidden_size=256,
    n_heads=4,
    e_layers=2,
    d_ff=1024,
    dropout=0.2,
    patch_length=16,
)

# Or instantiate directly
model = TimeXer(
    context_length=96,
    prediction_length=24,
    hidden_size=256,
    n_heads=4,
    e_layers=2,
    patch_length=16,
)
