Implementation: Sktime PyTorch Forecasting Encoder
| Knowledge Sources | Details |
|---|---|
| Domains | Time_Series, Forecasting, Deep_Learning |
| Last Updated | 2026-02-08 08:00 GMT |
Overview
Encoder is the stacked encoder module of the TimeXer model. It sequentially applies a list of encoder layers, followed by optional normalization and projection.
Description
The Encoder class wraps multiple EncoderLayer instances into a sequential stack, passing self-attention and cross-attention tensors through each layer in order. After all layers have been applied, it optionally applies a normalization layer and a projection layer. This modular design allows flexible construction of encoders with varying depths and configurations.
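Conceptually, the forward pass is a loop over the layers followed by the two optional post-processing steps. The following is a minimal re-implementation sketch of that stacking logic (the class name StackedEncoderSketch is hypothetical, and this is not the library's source):
import torch.nn as nn

class StackedEncoderSketch(nn.Module):
    # Hypothetical re-implementation illustrating the stacking logic;
    # not the library's actual source code.
    def __init__(self, layers, norm_layer=None, projection=None):
        super().__init__()
        self.layers = nn.ModuleList(layers)  # register layers as submodules
        self.norm = norm_layer
        self.projection = projection

    def forward(self, x, cross, x_mask=None, cross_mask=None, tau=None, delta=None):
        # Each layer consumes the previous layer's output together with
        # the unchanged cross tensor and the optional arguments.
        for layer in self.layers:
            x = layer(x, cross, x_mask=x_mask, cross_mask=cross_mask, tau=tau, delta=delta)
        if self.norm is not None:
            x = self.norm(x)
        if self.projection is not None:
            x = self.projection(x)
        return x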
Usage
Use Encoder when assembling transformer-based time series forecasting models that require a multi-layer encoder with cross-attention support. It serves as the main encoder component in the TimeXer architecture, processing patch embeddings through stacked attention and feedforward layers.
Code Reference
Source Location
- Repository: Sktime_Pytorch_forecasting
- File: pytorch_forecasting/layers/_encoders/_encoder.py
- Lines: 1-40
Signature
class Encoder(nn.Module):
    def __init__(self, layers, norm_layer=None, projection=None):
        ...

    def forward(self, x, cross, x_mask=None, cross_mask=None, tau=None, delta=None):
        ...
Import
from pytorch_forecasting.layers import Encoder
I/O Contract
Inputs
__init__ Parameters
| Name | Type | Required | Description |
|---|---|---|---|
| layers | list | Yes | List of encoder layer instances (e.g., EncoderLayer objects) to be stacked. |
| norm_layer | nn.Module | No | Optional normalization layer applied after all encoder layers. Defaults to None. |
| projection | nn.Module | No | Optional projection layer applied after normalization. Defaults to None. |
forward Parameters
| Name | Type | Required | Description |
|---|---|---|---|
| x | torch.Tensor | Yes | Self-attention input tensor (queries/keys/values for self-attention within each layer). |
| cross | torch.Tensor | Yes | Cross-attention input tensor (keys/values for cross-attention within each layer). |
| x_mask | torch.Tensor | No | Optional attention mask for self-attention. Defaults to None. |
| cross_mask | torch.Tensor | No | Optional attention mask for cross-attention. Defaults to None. |
| tau | float | No | Optional temperature parameter for attention scaling. Defaults to None. |
| delta | torch.Tensor | No | Optional positional delta parameter for cross-attention. Defaults to None. |
Outputs
| Name | Type | Description |
|---|---|---|
| x | torch.Tensor | Encoded output tensor after passing through all layers, normalization, and projection. |
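Because the stack threads these arguments through each layer, any nn.Module whose forward accepts the layer call signature can serve as a layer. A toy identity layer makes the contract concrete (IdentityLayer is hypothetical and for illustration only; it assumes the optional arguments are passed to each layer as keyword arguments, as described above):
import torch
import torch.nn as nn
from pytorch_forecasting.layers import Encoder

class IdentityLayer(nn.Module):
    # Hypothetical stand-in that matches the expected layer call signature.
    def forward(self, x, cross, x_mask=None, cross_mask=None, tau=None, delta=None):
        return x  # ignores cross and the optional arguments

enc = Encoder(layers=[IdentityLayer()], norm_layer=nn.LayerNorm(8))
out = enc(torch.randn(2, 4, 8), torch.randn(2, 6, 8))
print(out.shape)  # torch.Size([2, 4, 8]) -- output keeps the shape of x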
Usage Examples
import torch
import torch.nn as nn
from pytorch_forecasting.layers import Encoder, EncoderLayer
from pytorch_forecasting.layers import AttentionLayer, FullAttention
d_model = 64
n_heads = 8
d_ff = 256
n_layers = 3
# Build encoder layers
encoder_layers = [
    EncoderLayer(
        self_attention=AttentionLayer(FullAttention(), d_model, n_heads),
        cross_attention=AttentionLayer(FullAttention(), d_model, n_heads),
        d_model=d_model,
        d_ff=d_ff,
    )
    for _ in range(n_layers)
]
# Assemble encoder with layer normalization
encoder = Encoder(
    layers=encoder_layers,
    norm_layer=nn.LayerNorm(d_model),
)
# Self-attention input: (batch=16, seq_len=10, d_model=64)
x = torch.randn(16, 10, d_model)
# Cross-attention input: (batch=16, cross_len=96, d_model=64)
cross = torch.randn(16, 96, d_model)
output = encoder(x, cross)
print(output.shape) # torch.Size([16, 10, 64])
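To attach the optional projection head, pass any nn.Module as projection; using nn.Linear here is an assumption for illustration, since the class accepts an arbitrary module. The last dimension of the output then follows the projection:
# Variant with a linear projection head (fresh layers; reusing the
# instances above would share their weights across both encoders)
encoder_with_head = Encoder(
    layers=[
        EncoderLayer(
            self_attention=AttentionLayer(FullAttention(), d_model, n_heads),
            cross_attention=AttentionLayer(FullAttention(), d_model, n_heads),
            d_model=d_model,
            d_ff=d_ff,
        )
        for _ in range(n_layers)
    ],
    norm_layer=nn.LayerNorm(d_model),
    projection=nn.Linear(d_model, 1),  # assumed head: d_model -> 1
)
output = encoder_with_head(x, cross)
print(output.shape)  # torch.Size([16, 10, 1])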