
Implementation:Sktime Pytorch forecasting EncoderLayer

From Leeroopedia


Knowledge Sources
Domains Time_Series, Forecasting, Deep_Learning
Last Updated 2026-02-08 08:00 GMT

Overview

EncoderLayer is a single encoder block for the TimeXer model. It combines self-attention over patch tokens, cross-attention between a global token and exogenous inputs, and a position-wise feedforward network.

Description

The EncoderLayer class implements one layer of the TimeXer encoder. It first applies self-attention over the full input (including a global token), then extracts the global token and applies cross-attention between the global token and the cross (exogenous) input. The cross-attended global token is merged back into the sequence, and the result is passed through a two-layer 1D convolutional feedforward network with configurable activation (ReLU or GELU). Layer normalization and residual connections are applied after each sub-block.
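The data flow described above can be sketched in plain PyTorch. This is a simplified illustration, not the library's code: nn.MultiheadAttention stands in for the attention modules the real class receives as arguments, the class name EncoderLayerSketch is hypothetical, and the exact residual/normalization ordering inside pytorch_forecasting may differ in detail.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class EncoderLayerSketch(nn.Module):
    """Illustrative re-implementation of the TimeXer-style encoder block.

    Assumption: nn.MultiheadAttention replaces the library's injected
    self_attention / cross_attention modules for self-containment.
    """

    def __init__(self, d_model, n_heads, d_ff=None, dropout=0.1, activation="relu"):
        super().__init__()
        d_ff = d_ff or 4 * d_model  # mirrors the documented default
        self.self_attn = nn.MultiheadAttention(
            d_model, n_heads, dropout=dropout, batch_first=True
        )
        self.cross_attn = nn.MultiheadAttention(
            d_model, n_heads, dropout=dropout, batch_first=True
        )
        # Two-layer 1D convolutional feedforward network
        self.conv1 = nn.Conv1d(d_model, d_ff, kernel_size=1)
        self.conv2 = nn.Conv1d(d_ff, d_model, kernel_size=1)
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.norm3 = nn.LayerNorm(d_model)
        self.dropout = nn.Dropout(dropout)
        self.activation = F.relu if activation == "relu" else F.gelu

    def forward(self, x, cross):
        # x: (batch * n_vars, num_patches + 1, d_model); last position is the global token
        # cross: (batch, cross_len, d_model)
        B, L, D = x.shape
        # 1) Self-attention over patches + global token, residual + norm
        attn_out, _ = self.self_attn(x, x, x)
        x = self.norm1(x + self.dropout(attn_out))
        # 2) Cross-attention: global tokens attend to the exogenous input
        b = cross.shape[0]
        glb = x[:, -1:, :].reshape(b, -1, D)  # (batch, n_vars, d_model)
        glb_attn, _ = self.cross_attn(glb, cross, cross)
        glb = self.norm2(glb + self.dropout(glb_attn))
        glb = glb.reshape(B, 1, D)
        # 3) Merge the refined global token back, then conv feedforward + residual
        y = x = torch.cat([x[:, :-1, :], glb], dim=1)
        y = self.dropout(self.activation(self.conv1(y.transpose(1, 2))))
        y = self.dropout(self.conv2(y).transpose(1, 2))
        return self.norm3(x + y)
```

Note how the output retains the shape of x: the global token is pulled out, enriched against cross, and spliced back before the feedforward step.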

Usage

Use EncoderLayer as a building block within the Encoder module for constructing multi-layer TimeXer-style encoders. Each layer refines the patch-level representations through self-attention and enriches the global token via cross-attention with exogenous features before applying a feedforward transformation.
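The stacking pattern can be sketched as follows. EncoderSketch is illustrative only: pytorch_forecasting ships its own Encoder wrapper, whose constructor and options may differ; the sketch only assumes each layer exposes a forward(x, cross) interface like EncoderLayer's.

```python
import torch
import torch.nn as nn


class EncoderSketch(nn.Module):
    """Illustrative multi-layer stack: every layer refines the running
    representation x while attending to the same exogenous `cross` input."""

    def __init__(self, layers, norm=None):
        super().__init__()
        self.layers = nn.ModuleList(layers)
        self.norm = norm  # optional final normalization

    def forward(self, x, cross):
        for layer in self.layers:
            x = layer(x, cross)  # each layer: forward(x, cross) -> same shape as x
        return self.norm(x) if self.norm is not None else x
```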

Code Reference

Source Location

Signature

class EncoderLayer(nn.Module):
    def __init__(
        self,
        self_attention,
        cross_attention,
        d_model,
        d_ff=None,
        dropout=0.1,
        activation="relu",
    ):
        ...

    def forward(self, x, cross, x_mask=None, cross_mask=None, tau=None, delta=None):
        ...

Import

from pytorch_forecasting.layers import EncoderLayer

I/O Contract

Inputs

__init__ Parameters

Name Type Required Description
self_attention nn.Module Yes Self-attention mechanism (e.g., AttentionLayer wrapping FullAttention).
cross_attention nn.Module Yes Cross-attention mechanism for attending to exogenous features.
d_model int Yes Dimension of the model embedding space.
d_ff int No Dimension of the feedforward layer. Defaults to 4 * d_model if not specified.
dropout float No Dropout rate. Defaults to 0.1.
activation str No Activation function for the feedforward network: "relu" or "gelu". Defaults to "relu".

forward Parameters

Name Type Required Description
x torch.Tensor Yes Input tensor of shape (batch_size * n_vars, num_patches + 1, d_model), where the last position is the global token.
cross torch.Tensor Yes Cross-attention input (exogenous features) of shape (batch_size, cross_len, d_model).
x_mask torch.Tensor No Optional attention mask for self-attention. Defaults to None.
cross_mask torch.Tensor No Optional attention mask for cross-attention. Defaults to None.
tau float No Optional temperature parameter for attention scaling. Defaults to None.
delta torch.Tensor No Optional positional delta for cross-attention. Defaults to None.

Outputs

Name Type Description
output torch.Tensor Encoded output tensor of same shape as input x, with updated representations from self-attention, cross-attention, and feedforward processing.
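The shape contract above can be captured as a small helper. The function name encoder_layer_shapes is hypothetical, not part of pytorch_forecasting; it simply restates the documented I/O contract.

```python
def encoder_layer_shapes(batch_size, n_vars, num_patches, cross_len, d_model):
    """Expected tensor shapes for EncoderLayer.forward, per the I/O contract.

    The variables are flattened into the leading dimension of x, one extra
    position holds the global token, and the output matches x exactly.
    """
    x_shape = (batch_size * n_vars, num_patches + 1, d_model)  # +1 for global token
    cross_shape = (batch_size, cross_len, d_model)
    return {"x": x_shape, "cross": cross_shape, "output": x_shape}
```

For instance, 4 batches with 8 variables, 6 patches, 96 exogenous steps, and d_model=64 give x and output of shape (32, 7, 64) and cross of shape (4, 96, 64), matching the usage example below.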

Usage Examples

import torch
from pytorch_forecasting.layers import EncoderLayer, AttentionLayer, FullAttention

d_model = 64
n_heads = 8

# Create a single encoder layer
layer = EncoderLayer(
    self_attention=AttentionLayer(FullAttention(), d_model, n_heads),
    cross_attention=AttentionLayer(FullAttention(), d_model, n_heads),
    d_model=d_model,
    d_ff=256,
    dropout=0.1,
    activation="relu",
)

# Self-attention input: (batch * n_vars, num_patches + 1, d_model)
x = torch.randn(32, 7, d_model)     # e.g., 4 batches * 8 vars, 6 patches + 1 global token
cross = torch.randn(4, 96, d_model)  # exogenous features

output = layer(x, cross)
print(output.shape)  # torch.Size([32, 7, 64])
