Principle:Sktime Pytorch forecasting Positional Encoding
| Knowledge Sources | |
|---|---|
| Domains | Time_Series, Forecasting, Deep_Learning, Embedding |
| Last Updated | 2026-02-08 09:00 GMT |
Overview
Position-aware embedding strategies for transformer-based time series models: sinusoidal positional encoding for absolute position information, patch-based encoder embedding with a learnable global token, and channel-independent inverted data embedding for exogenous variables.
Description
This principle covers three complementary embedding approaches used to prepare inputs for transformer-based forecasting models:
1. Sinusoidal Positional Embedding (PositionalEmbedding): Injects absolute position information into the representation using fixed (non-learnable) sinusoidal functions. Even-indexed dimensions use sine and odd-indexed dimensions use cosine, with frequencies decreasing geometrically across dimensions. The encoding is pre-computed for a maximum sequence length and stored as a non-trainable buffer. This allows the model to distinguish positions in the sequence without any learned parameters, and the same table can encode any sequence length up to the pre-computed maximum, including lengths not seen during training.
2. Patch-Based Encoder Embedding (EnEmbedding): Designed for endogenous (target) variable embedding in the TimeXer architecture. The input time series is first permuted to a channel-first layout, then segmented into non-overlapping patches of fixed length using an unfold operation. Each patch is linearly projected to the model dimension. Sinusoidal positional encoding is added to convey the ordering of patches. A learnable global token is appended to the patch sequence for each variable; this token serves as an aggregation point that later participates in cross-attention to gather exogenous information. The output is reshaped so that all variables are processed as independent samples (channel independence).
3. Inverted Data Embedding (DataEmbedding_inverted): Embeds exogenous variables by treating each variable (channel) as a separate token whose feature vector spans the time dimension. The input is transposed from (Batch, Time, Channels) to (Batch, Channels, Time), and each channel-time vector is linearly projected to the model dimension. If time-stamp marks are available, they are concatenated with the variable channels before projection. This inverted perspective allows the transformer to capture inter-variable dependencies directly.
Usage
Use PositionalEmbedding whenever sequence order must be encoded in transformer inputs; it is used internally by EnEmbedding. Use EnEmbedding for the endogenous encoder path in TimeXer, configuring patch_len to control the granularity of temporal segmentation. Use DataEmbedding_inverted for embedding exogenous or cross-variable features in iTransformer-style and TimeXer architectures, passing optional x_mark timestamp features to enrich the representation.
Theoretical Basis
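Sinusoidal Positional Encoding:
$$PE_{(pos,\,2i)} = \sin\!\left(\frac{pos}{10000^{2i/d_{model}}}\right), \qquad PE_{(pos,\,2i+1)} = \cos\!\left(\frac{pos}{10000^{2i/d_{model}}}\right)$$
Where $pos$ is the position index and $i$ is the dimension index (so $2i$ and $2i+1$ are the even and odd embedding dimensions). The wavelengths form a geometric progression from $2\pi$ to $10000 \cdot 2\pi$.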
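The sinusoidal encoding can be sketched in PyTorch as follows. This is a minimal illustration rather than the library's own code: the class name `SinusoidalPositionalEmbedding`, the `max_len` default, and the even-`d_model` restriction are assumptions made for the example.

```python
import math

import torch
import torch.nn as nn


class SinusoidalPositionalEmbedding(nn.Module):
    """Fixed sinusoidal table (hypothetical name; max_len default is illustrative)."""

    def __init__(self, d_model: int, max_len: int = 5000):
        super().__init__()
        if d_model % 2 != 0:
            raise ValueError("this sketch assumes an even d_model")
        position = torch.arange(max_len, dtype=torch.float32).unsqueeze(1)  # (max_len, 1)
        # 1 / 10000^(2i / d_model) for i = 0 .. d_model/2 - 1
        div_term = torch.exp(
            torch.arange(0, d_model, 2, dtype=torch.float32)
            * (-math.log(10000.0) / d_model)
        )
        pe = torch.zeros(max_len, d_model)
        pe[:, 0::2] = torch.sin(position * div_term)  # even dimensions
        pe[:, 1::2] = torch.cos(position * div_term)  # odd dimensions
        # A buffer follows .to(device) and is saved in state_dict, but is never trained.
        self.register_buffer("pe", pe.unsqueeze(0))  # (1, max_len, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Only the sequence length of x (dimension 1) is used.
        return self.pe[:, : x.size(1)]


emb = SinusoidalPositionalEmbedding(d_model=8, max_len=32)
out = emb(torch.zeros(2, 10, 8))  # shape (1, 10, 8), broadcastable over the batch
```

Because the table has a leading dimension of 1, adding it to a value embedding of shape (batch, length, d_model) broadcasts over the batch axis.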
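Patch-Based Encoding:
Given input $x \in \mathbb{R}^{B \times T \times C}$ and patch length $P$, the series is split into $N = \lfloor T / P \rfloor$ non-overlapping patches, and each patch is projected by $W_{val} \in \mathbb{R}^{P \times d_{model}}$ and combined with the positional encoding $PE$:
$$x_p \in \mathbb{R}^{(B \cdot C) \times N \times P}, \qquad E = x_p W_{val} + PE \in \mathbb{R}^{(B \cdot C) \times N \times d_{model}}$$
A global token $g \in \mathbb{R}^{d_{model}}$ is appended to each variable's patch sequence:
$$E' = [E \,;\, g] \in \mathbb{R}^{(B \cdot C) \times (N+1) \times d_{model}}$$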
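The shape bookkeeping of the patching step is easy to get wrong, so a small worked example may help. The helper below is hypothetical (not part of any library) and only tracks shapes; it assumes non-overlapping windows where the window size equals the step, as in the unfold-based segmentation described in this principle.

```python
def patch_embedding_shapes(batch, seq_len, n_vars, patch_len, d_model):
    """Return the tensor shape after each stage of the patch-based embedding."""
    if patch_len <= 0 or patch_len > seq_len:
        raise ValueError("patch_len must be in [1, seq_len]")
    # Non-overlapping windows (size == step): trailing time steps that do not
    # fill a whole patch are dropped.
    n_patches = (seq_len - patch_len) // patch_len + 1
    return {
        "input": (batch, seq_len, n_vars),
        "patched": (batch * n_vars, n_patches, patch_len),
        "projected": (batch * n_vars, n_patches, d_model),
        "with_global_token": (batch * n_vars, n_patches + 1, d_model),
        "dropped_steps": seq_len - n_patches * patch_len,
    }


shapes = patch_embedding_shapes(batch=32, seq_len=96, n_vars=1, patch_len=16, d_model=256)
print(shapes["patched"])            # (32, 6, 16)
print(shapes["with_global_token"])  # (32, 7, 256)
```

When the sequence length is not a multiple of the patch length (for example 100 steps with patches of 16), the last 4 steps are not covered by any patch, so choosing `patch_len` as a divisor of the sequence length avoids silently discarding data.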
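Inverted (Channel-Independent) Embedding:
$$E = x^{\top} W, \qquad x^{\top} \in \mathbb{R}^{B \times C \times T}, \quad W \in \mathbb{R}^{T \times d_{model}}, \quad E \in \mathbb{R}^{B \times C \times d_{model}}$$
Where $x^{\top}$ is the input with its time and channel axes swapped and $W$ projects each channel's time-series vector to the model dimension.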
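A minimal PyTorch sketch of the inverted embedding is shown below. The class name `InvertedEmbedding` and its constructor arguments are assumptions for illustration, not the signature of DataEmbedding_inverted; the key point it tries to show is that the linear layer's input size is the sequence length, because the projection acts along the time axis.

```python
from typing import Optional

import torch
import torch.nn as nn


class InvertedEmbedding(nn.Module):
    """One token per variable (hypothetical sketch of the inverted embedding)."""

    def __init__(self, seq_len: int, d_model: int, dropout: float = 0.1):
        super().__init__()
        # The projection consumes a whole time series, so in_features = seq_len.
        self.value_embedding = nn.Linear(seq_len, d_model)
        self.dropout = nn.Dropout(dropout)

    def forward(self, x: torch.Tensor, x_mark: Optional[torch.Tensor] = None) -> torch.Tensor:
        x = x.permute(0, 2, 1)  # (B, T, C) -> (B, C, T)
        if x_mark is not None:
            # Timestamp features become extra tokens next to the variable tokens.
            x = torch.cat([x, x_mark.permute(0, 2, 1)], dim=1)  # (B, C + M, T)
        return self.dropout(self.value_embedding(x))  # (B, C or C + M, d_model)


emb = InvertedEmbedding(seq_len=24, d_model=16, dropout=0.0)
x = torch.randn(4, 24, 3)      # 3 exogenous variables over 24 time steps
marks = torch.randn(4, 24, 2)  # 2 timestamp features
tokens = emb(x)                # (4, 3, 16)
tokens_with_marks = emb(x, marks)  # (4, 5, 16)
```

Because each variable becomes a single token, attention over these tokens mixes information across variables rather than across time steps.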
Pseudo-code for patch-based embedding:
# EnEmbedding forward pass (pseudo-code)
def en_embedding(x, patch_len):
    # x: (B, T, C); P = patch_len; N_patches = T // P
    x = x.permute(0, 2, 1)                          # (B, C, T)
    x = unfold(x, size=patch_len, step=patch_len)   # (B, C, N_patches, P)
    x = reshape(x, (B*C, N_patches, P))
    x = linear_value(x) + positional_encoding(x)    # (B*C, N_patches, d_model)
    x = reshape(x, (B, C, N_patches, d_model))
    x = concat(x, global_token, dim=2)              # append learnable token: (B, C, N_patches+1, d_model)
    x = reshape(x, (B*C, N_patches+1, d_model))
    return dropout(x)
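The pseudo-code above can be turned into a runnable PyTorch module along the following lines. This is a sketch under stated assumptions, not the TimeXer implementation: `PatchEmbeddingSketch` and `sinusoidal_table` are hypothetical names, the value projection is assumed to have no bias, and the global token is assumed to be a separate learnable vector per variable.

```python
import math

import torch
import torch.nn as nn


def sinusoidal_table(n_positions: int, d_model: int) -> torch.Tensor:
    """Fixed sinusoidal encoding of shape (n_positions, d_model); d_model must be even."""
    position = torch.arange(n_positions, dtype=torch.float32).unsqueeze(1)
    div_term = torch.exp(
        torch.arange(0, d_model, 2, dtype=torch.float32) * (-math.log(10000.0) / d_model)
    )
    table = torch.zeros(n_positions, d_model)
    table[:, 0::2] = torch.sin(position * div_term)
    table[:, 1::2] = torch.cos(position * div_term)
    return table


class PatchEmbeddingSketch(nn.Module):
    """Hypothetical stand-in for the patch-based encoder embedding."""

    def __init__(self, n_vars: int, d_model: int, patch_len: int, dropout: float = 0.1):
        super().__init__()
        self.patch_len = patch_len
        self.value_embedding = nn.Linear(patch_len, d_model, bias=False)  # assumption: no bias
        # Assumption: one learnable global token per variable.
        self.global_token = nn.Parameter(torch.randn(1, n_vars, 1, d_model))
        self.dropout = nn.Dropout(dropout)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        batch, _, n_vars = x.shape                       # x: (B, T, C)
        x = x.permute(0, 2, 1)                           # (B, C, T)
        x = x.unfold(dimension=-1, size=self.patch_len, step=self.patch_len)  # (B, C, N, P)
        n_patches = x.shape[2]
        x = x.reshape(batch * n_vars, n_patches, self.patch_len)
        pos = sinusoidal_table(n_patches, self.value_embedding.out_features).to(x.device)
        x = self.value_embedding(x) + pos                # (B*C, N, d_model); pos broadcasts
        x = x.reshape(batch, n_vars, n_patches, -1)
        glb = self.global_token.expand(batch, -1, -1, -1)  # (B, C, 1, d_model)
        x = torch.cat([x, glb], dim=2)                   # (B, C, N + 1, d_model)
        x = x.reshape(batch * n_vars, n_patches + 1, -1)
        return self.dropout(x)


torch.manual_seed(0)
emb = PatchEmbeddingSketch(n_vars=2, d_model=8, patch_len=4, dropout=0.0)
out = emb(torch.randn(3, 12, 2))  # 12 steps / patch_len 4 = 3 patches, plus 1 global token
```

With 3 samples and 2 variables the output has 6 rows, one per (sample, variable) pair, and the last position of every row holds that variable's global token.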