Implementation:Sktime Pytorch forecasting TimeXer
| Knowledge Sources | |
|---|---|
| Domains | Time_Series, Forecasting, Deep_Learning |
| Last Updated | 2026-02-08 08:00 GMT |
Overview
TimeXer is a Transformer-based time series forecasting model that combines endogenous (target) and exogenous (covariate) information by representing endogenous variables at patch level and exogenous variables at variate level.
Description
TimeXer subclasses BaseModelWithCovariates and implements the TimeXer architecture, a Transformer for time series forecasting with eXogenous variables. Endogenous variables are represented at patch level: each series is split into non-overlapping patches, and each patch is embedded as a token. Exogenous variables are represented at variate level: each whole series is embedded as a single token. A learnable endogenous global token bridges the two representations. Each layer of the dual attention encoder applies self-attention over the endogenous patch tokens (together with the global token) and cross-attention from the global token to the exogenous variate tokens, which captures exogenous-to-endogenous correlations. A flatten head then produces the forecasts. The model supports univariate (S), multivariate-to-single (MS), and multivariate (M) forecasting modes, as well as quantile loss for probabilistic predictions.
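The data flow described above can be sketched in NumPy. This is a simplified, single-layer illustration under stated assumptions, not the library's implementation: random matrices stand in for the learned embeddings and the learnable global token, attention is single-head without Q/K/V projections, and the normalization and feed-forward sublayers of a real encoder layer are omitted. Mapping each exogenous series to one token by a linear projection of the whole series is an assumption about the variate-level embedding. The names `timexer_sketch` and `attend` are hypothetical.

```python
import numpy as np


def softmax(scores: np.ndarray) -> np.ndarray:
    # Numerically stable softmax over the last axis
    shifted = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(shifted)
    return weights / weights.sum(axis=-1, keepdims=True)


def attend(queries: np.ndarray, keys: np.ndarray, values: np.ndarray) -> np.ndarray:
    # Single-head scaled dot-product attention without learned Q/K/V projections
    scale = np.sqrt(queries.shape[-1])
    return softmax(queries @ keys.T / scale) @ values


def timexer_sketch(endo, exog, patch_length, d_model, prediction_length, seed=0):
    """endo: (L,) endogenous series; exog: (C, L) exogenous series.

    Random matrices stand in for learned weights (assumption for illustration).
    """
    rng = np.random.default_rng(seed)
    n_patches = len(endo) // patch_length
    # Non-overlapping patches; any trailing remainder is dropped
    patches = endo[: n_patches * patch_length].reshape(n_patches, patch_length)
    patch_tokens = patches @ rng.normal(size=(patch_length, d_model))  # (N, D)
    # One variate-level token per exogenous series (whole series -> one vector)
    variate_tokens = exog @ rng.normal(size=(exog.shape[1], d_model))  # (C, D)
    global_token = rng.normal(size=(1, d_model))  # stands in for the learnable token

    # Self-attention over the endogenous patch tokens plus the global token
    tokens = np.vstack([patch_tokens, global_token])  # (N + 1, D)
    tokens = tokens + attend(tokens, tokens, tokens)

    # Cross-attention: only the global token queries the exogenous variate tokens
    glob = tokens[-1:]
    tokens[-1:] = glob + attend(glob, variate_tokens, variate_tokens)

    # Flatten head: all endogenous tokens -> prediction_length outputs
    head = rng.normal(size=(tokens.size, prediction_length))
    forecast = tokens.reshape(-1) @ head
    return tokens, forecast


t = np.arange(96)
tokens, forecast = timexer_sketch(
    endo=np.sin(t / 8.0),
    exog=np.stack([np.cos(t / 8.0), t / 96.0]),
    patch_length=16,
    d_model=32,
    prediction_length=24,
)
print(tokens.shape, forecast.shape)  # (7, 32) (24,)
```

Note that exogenous information reaches the patch tokens only indirectly: the global token gathers it through cross-attention, and in a stacked encoder the next layer's self-attention would spread it to the patches.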
Usage
Use TimeXer when exogenous (external) covariates are available alongside the series to be forecast. It is suited to both long-term and short-term forecasting tasks in which the model must capture temporal patterns within the endogenous series as well as correlations with the exogenous variables. The model can be instantiated directly or from a TimeSeriesDataSet via the from_dataset class method.
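The three feature modes differ in which variables are treated as endogenous (patch-tokenized and forecast) and which as exogenous (one variate-level token each). The hypothetical helper below sketches that split conceptually; it is not part of pytorch_forecasting, and the handling of 'M' mode (every variable is also used as exogenous context) is an assumption modelled on the reference TimeXer design rather than something confirmed for this implementation.

```python
def split_variables(variables, target, features="MS"):
    """Hypothetical helper: conceptual endogenous/exogenous split per feature mode.

    Returns (endogenous, exogenous). Endogenous series are patch-tokenized;
    exogenous series become one variate-level token each.
    """
    if target not in variables:
        raise ValueError(f"target {target!r} not among variables")
    if features == "S":
        # Univariate: the target alone, no exogenous series
        return [target], []
    if features == "MS":
        # Multivariate-to-single: the target is endogenous, the rest are exogenous
        return [target], [v for v in variables if v != target]
    if features == "M":
        # Multivariate: every variable is forecast (endogenous) and, as an
        # assumption here, also supplies a variate-level exogenous token
        return list(variables), list(variables)
    raise ValueError(f"unknown features mode {features!r}")


print(split_variables(["load", "temperature", "price"], target="load", features="MS"))
# (['load'], ['temperature', 'price'])
```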
Code Reference
Source Location
- Repository: Sktime_Pytorch_forecasting
- File: pytorch_forecasting/models/timexer/_timexer.py
- Lines: 1-496
Signature
```python
class TimeXer(BaseModelWithCovariates):
    def __init__(
        self,
        context_length: int,
        prediction_length: int,
        task_name: str = "long_term_forecast",
        features: str = "MS",
        enc_in: int = None,
        hidden_size: int = 256,
        n_heads: int = 4,
        e_layers: int = 2,
        d_ff: int = 1024,
        dropout: float = 0.2,
        activation: str = "relu",
        use_efficient_attention: bool = False,
        patch_length: int = 16,
        factor: int = 5,
        embed_type: str = "fixed",
        freq: str = "h",
        output_size: int | list[int] = 1,
        loss: MultiHorizonMetric = None,
        learning_rate: float = 1e-3,
        static_categoricals: list[str] | None = None,
        static_reals: list[str] | None = None,
        time_varying_categoricals_encoder: list[str] | None = None,
        time_varying_categoricals_decoder: list[str] | None = None,
        time_varying_reals_encoder: list[str] | None = None,
        time_varying_reals_decoder: list[str] | None = None,
        x_reals: list[str] | None = None,
        x_categoricals: list[str] | None = None,
        embedding_sizes: dict[str, tuple[int, int]] | None = None,
        embedding_labels: list[str] | None = None,
        embedding_paddings: list[str] | None = None,
        categorical_groups: dict[str, list[str]] | None = None,
        logging_metrics: nn.ModuleList = None,
        **kwargs,
    ):
```
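Several of the constructor arguments constrain one another. The hypothetical `check_timexer_config` function below is a pre-flight sketch, not a library API: the allowed values for `features`, `task_name`, and `activation` come from the parameter table, while the divisibility checks (`hidden_size` by `n_heads`, `context_length` by `patch_length`) are assumptions based on standard multi-head attention and non-overlapping patching.

```python
def check_timexer_config(
    context_length: int,
    prediction_length: int,
    patch_length: int = 16,
    hidden_size: int = 256,
    n_heads: int = 4,
    features: str = "MS",
    task_name: str = "long_term_forecast",
    activation: str = "relu",
) -> list[str]:
    """Hypothetical pre-flight check; returns a list of problems (empty if none)."""
    problems = []
    if features not in {"S", "MS", "M"}:
        problems.append(f"features must be 'S', 'MS' or 'M', got {features!r}")
    if task_name not in {"long_term_forecast", "short_term_forecast"}:
        problems.append(f"unsupported task_name {task_name!r}")
    if activation not in {"relu", "gelu"}:
        problems.append(f"activation must be 'relu' or 'gelu', got {activation!r}")
    if prediction_length < 1:
        problems.append("prediction_length must be positive")
    if not 0 < patch_length <= context_length:
        # Otherwise not even one complete patch fits into the context window
        problems.append("patch_length must be in [1, context_length]")
    elif context_length % patch_length:
        # Assumption: trailing steps that do not fill a whole patch are dropped
        problems.append("context_length is not a multiple of patch_length")
    if hidden_size % n_heads:
        # Assumption: heads split hidden_size evenly, as in standard multi-head attention
        problems.append("hidden_size should be divisible by n_heads")
    return problems


print(check_timexer_config(context_length=96, prediction_length=24))  # []
print(check_timexer_config(96, 24, hidden_size=250))
# ['hidden_size should be divisible by n_heads']
```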
from_dataset
```python
@classmethod
def from_dataset(
    cls,
    dataset: TimeSeriesDataSet,
    allowed_encoder_known_variable_names: list[str] = None,
    **kwargs,
):
```
forward
```python
def forward(self, x: dict[str, torch.Tensor]) -> dict[str, torch.Tensor]:
```
Import
```python
from pytorch_forecasting.models.timexer import TimeXer
```
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| context_length | int | Yes | Length of input sequence used for making predictions |
| prediction_length | int | Yes | Number of future time steps to predict |
| task_name | str | No | Type of forecasting task: 'long_term_forecast' or 'short_term_forecast' |
| features | str | No | Feature mode: 'MS' (multivariate-to-single), 'M' (multivariate), 'S' (univariate) |
| enc_in | int | No | Number of input variables for encoder; defaults to number of real features |
| hidden_size | int | No | Dimension of model embeddings and hidden representations (default 256) |
| n_heads | int | No | Number of attention heads (default 4) |
| e_layers | int | No | Number of encoder layers with dual attention (default 2) |
| d_ff | int | No | Dimension of feedforward network in transformer layers (default 1024) |
| dropout | float | No | Dropout rate (default 0.2) |
| activation | str | No | Activation function: 'relu' or 'gelu' (default 'relu') |
| use_efficient_attention | bool | No | Use PyTorch's native optimized scaled dot-product attention (SDPA) implementation (default False) |
| patch_length | int | No | Length of each non-overlapping patch for endogenous tokenization (default 16) |
| factor | int | No | Scaling factor for attention scores (default 5) |
| embed_type | str | No | Type of time feature embedding (default 'fixed') |
| freq | str | No | Frequency of the time series data (default 'h', hourly) |
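| output_size | int or list[int] | No | Number of outputs per time step, e.g. the number of quantiles for a quantile loss; a list with one entry per target for multiple targets (default 1) |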
| loss | MultiHorizonMetric | No | Loss function; defaults to MAE (or MultiLoss for 'M' mode) |
| learning_rate | float | No | Learning rate (default 1e-3) |
| logging_metrics | nn.ModuleList | No | Metrics logged during training; defaults to [SMAPE, MAE, RMSE, MAPE] |
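With the default settings, the main internal sizes follow from simple arithmetic. The sketch below assumes non-overlapping patches (any trailing remainder dropped), one global token per endogenous variable, an even split of `hidden_size` across heads, and a flatten head whose input is all endogenous tokens concatenated, as in the reference TimeXer design; `timexer_sizes` is a hypothetical helper, not part of the library.

```python
def timexer_sizes(context_length, prediction_length, patch_length=16,
                  hidden_size=256, n_heads=4, n_quantiles=1):
    """Hypothetical back-of-the-envelope sizes for one endogenous variable."""
    n_patches = context_length // patch_length  # non-overlapping patches
    n_endo_tokens = n_patches + 1               # patches plus the global token
    head_dim = hidden_size // n_heads           # per-head width (assumed even split)
    flatten_in = hidden_size * n_endo_tokens    # assumed flatten-head input width
    return {
        "n_patches": n_patches,
        "n_endo_tokens": n_endo_tokens,
        "head_dim": head_dim,
        "flatten_in": flatten_in,
        # Per-sample output shape for a single target
        "output_shape": (prediction_length, n_quantiles),
    }


print(timexer_sizes(context_length=96, prediction_length=24))
# {'n_patches': 6, 'n_endo_tokens': 7, 'head_dim': 64, 'flatten_in': 1792, 'output_shape': (24, 1)}
```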
Outputs
| Name | Type | Description |
|---|---|---|
| prediction | dict[str, torch.Tensor] | Network output dictionary whose 'prediction' entry is a tensor of shape (batch_size, prediction_length, n_quantiles) for a single target, or a list of such tensors for multiple targets |
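For probabilistic forecasts the last dimension of the `prediction` tensor indexes quantiles. Assuming `QuantileLoss` with its default quantiles (0.02, 0.1, 0.25, 0.5, 0.75, 0.9, 0.98), `n_quantiles` is 7 and the median sits at index 3. The library's loss objects expose a `to_prediction` method for this reduction; the hypothetical `median_forecast` helper below only mirrors the idea on nested Python lists so that the indexing is explicit.

```python
DEFAULT_QUANTILES = [0.02, 0.1, 0.25, 0.5, 0.75, 0.9, 0.98]


def median_forecast(prediction, quantiles=DEFAULT_QUANTILES):
    """Hypothetical helper.

    prediction: nested list shaped (batch_size, prediction_length, n_quantiles).
    Returns a nested list shaped (batch_size, prediction_length).
    """
    mid = quantiles.index(0.5)  # raises ValueError if 0.5 is not a quantile
    return [[step[mid] for step in series] for series in prediction]


# One sample, two forecast steps, seven quantiles per step
pred = [[[1, 2, 3, 4, 5, 6, 7], [11, 12, 13, 14, 15, 16, 17]]]
print(median_forecast(pred))  # [[4, 14]]
```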
Usage Examples
```python
from pytorch_forecasting import TimeSeriesDataSet
from pytorch_forecasting.models.timexer import TimeXer

# Create the model from an existing TimeSeriesDataSet
# (`dataset` is assumed to have been built beforehand, e.g. the training set)
model = TimeXer.from_dataset(
    dataset,
    hidden_size=256,
    n_heads=4,
    e_layers=2,
    d_ff=1024,
    dropout=0.2,
    patch_length=16,
)

# Or instantiate directly with explicit sequence lengths
model = TimeXer(
    context_length=96,
    prediction_length=24,
    hidden_size=256,
    n_heads=4,
    e_layers=2,
    patch_length=16,
)
```