Implementation: Sktime PyTorch Forecasting Encoder
| Knowledge Sources | Details |
|---|---|
| Domains | Time_Series, Forecasting, Deep_Learning |
| Last Updated | 2026-02-08 08:00 GMT |
Overview
Encoder is the stacked encoder module of the TimeXer model. It sequentially applies a list of encoder layers, followed by optional normalization and projection.
Description
The Encoder class wraps multiple EncoderLayer instances into a sequential stack, passing self-attention and cross-attention tensors through each layer in order. After all layers have been applied, it optionally applies a normalization layer and a projection layer. This modular design allows flexible construction of encoders with varying depths and configurations.
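Conceptually, the forward pass is a loop over the layers followed by the two optional post-processing steps. The following is a minimal re-implementation sketch of that stacking logic (the class name StackedEncoderSketch is hypothetical, and this is not the library's source):
import torch.nn as nn

class StackedEncoderSketch(nn.Module):
    # Hypothetical re-implementation illustrating the stacking logic;
    # not the library's actual source code.
    def __init__(self, layers, norm_layer=None, projection=None):
        super().__init__()
        self.layers = nn.ModuleList(layers)  # register layers as submodules
        self.norm = norm_layer
        self.projection = projection

    def forward(self, x, cross, x_mask=None, cross_mask=None, tau=None, delta=None):
        # Each layer consumes the previous layer's output together with
        # the unchanged cross tensor and the optional arguments.
        for layer in self.layers:
            x = layer(x, cross, x_mask=x_mask, cross_mask=cross_mask, tau=tau, delta=delta)
        if self.norm is not None:
            x = self.norm(x)
        if self.projection is not None:
            x = self.projection(x)
        return x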
Usage
Use Encoder when assembling transformer-based time series forecasting models that require a multi-layer encoder with cross-attention support. It serves as the main encoder component in the TimeXer architecture, processing patch embeddings through stacked attention and feedforward layers.
Code Reference
Source Location
- Repository: Sktime_Pytorch_forecasting
- File: pytorch_forecasting/layers/_encoders/_encoder.py
- Lines: 1-40
Signature
class Encoder(nn.Module):
    def __init__(self, layers, norm_layer=None, projection=None):
        ...

    def forward(self, x, cross, x_mask=None, cross_mask=None, tau=None, delta=None):
        ...
Import
from pytorch_forecasting.layers import Encoder
I/O Contract
Inputs
__init__ Parameters
| Name | Type | Required | Description |
|---|---|---|---|
| layers | list | Yes | List of encoder layer instances (e.g., EncoderLayer objects) to be stacked. |
| norm_layer | nn.Module | No | Optional normalization layer applied after all encoder layers. Defaults to None. |
| projection | nn.Module | No | Optional projection layer applied after normalization. Defaults to None. |
forward Parameters
| Name | Type | Required | Description |
|---|---|---|---|
| x | torch.Tensor | Yes | Self-attention input tensor (queries/keys/values for self-attention within each layer). |
| cross | torch.Tensor | Yes | Cross-attention input tensor (keys/values for cross-attention within each layer). |
| x_mask | torch.Tensor | No | Optional attention mask for self-attention. Defaults to None. |
| cross_mask | torch.Tensor | No | Optional attention mask for cross-attention. Defaults to None. |
| tau | float | No | Optional temperature parameter for attention scaling. Defaults to None. |
| delta | torch.Tensor | No | Optional positional delta parameter for cross-attention. Defaults to None. |
Outputs
| Name | Type | Description |
|---|---|---|
| x | torch.Tensor | Encoded output tensor after passing through all layers, normalization, and projection. |
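Because the stack threads these arguments through each layer, any nn.Module whose forward accepts the layer call signature can serve as a layer. A toy identity layer makes the contract concrete (IdentityLayer is hypothetical and for illustration only; it assumes the optional arguments are passed to each layer as keyword arguments, as described above):
import torch
import torch.nn as nn
from pytorch_forecasting.layers import Encoder

class IdentityLayer(nn.Module):
    # Hypothetical stand-in that matches the expected layer call signature.
    def forward(self, x, cross, x_mask=None, cross_mask=None, tau=None, delta=None):
        return x  # ignores cross and the optional arguments

enc = Encoder(layers=[IdentityLayer()], norm_layer=nn.LayerNorm(8))
out = enc(torch.randn(2, 4, 8), torch.randn(2, 6, 8))
print(out.shape)  # torch.Size([2, 4, 8]) -- output keeps the shape of x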
Usage Examples
import torch
import torch.nn as nn
from pytorch_forecasting.layers import Encoder, EncoderLayer
from pytorch_forecasting.layers import AttentionLayer, FullAttention
d_model = 64
n_heads = 8
d_ff = 256
n_layers = 3
# Build encoder layers
encoder_layers = [
    EncoderLayer(
        self_attention=AttentionLayer(FullAttention(), d_model, n_heads),
        cross_attention=AttentionLayer(FullAttention(), d_model, n_heads),
        d_model=d_model,
        d_ff=d_ff,
    )
    for _ in range(n_layers)
]
# Assemble encoder with layer normalization
encoder = Encoder(
    layers=encoder_layers,
    norm_layer=nn.LayerNorm(d_model),
)
# Self-attention input: (batch=16, seq_len=10, d_model=64)
x = torch.randn(16, 10, d_model)
# Cross-attention input: (batch=16, cross_len=96, d_model=64)
cross = torch.randn(16, 96, d_model)
output = encoder(x, cross)
print(output.shape) # torch.Size([16, 10, 64])
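To attach the optional projection head, pass any nn.Module as projection; using nn.Linear here is an assumption for illustration, since the class accepts an arbitrary module. The last dimension of the output then follows the projection:
# Variant with a linear projection head (fresh layers; reusing the
# instances above would share their weights across both encoders)
encoder_with_head = Encoder(
    layers=[
        EncoderLayer(
            self_attention=AttentionLayer(FullAttention(), d_model, n_heads),
            cross_attention=AttentionLayer(FullAttention(), d_model, n_heads),
            d_model=d_model,
            d_ff=d_ff,
        )
        for _ in range(n_layers)
    ],
    norm_layer=nn.LayerNorm(d_model),
    projection=nn.Linear(d_model, 1),  # assumed head: d_model -> 1
)
output = encoder_with_head(x, cross)
print(output.shape)  # torch.Size([16, 10, 1])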