

Principle:Sktime Pytorch forecasting Positional Encoding

From Leeroopedia


Knowledge Sources
Domains Time_Series, Forecasting, Deep_Learning, Embedding
Last Updated 2026-02-08 09:00 GMT

Overview

Position-aware embedding strategies for transformer-based time series models: sinusoidal positional encoding for absolute position information, patch-based encoder embedding with a learnable global token, and channel-independent inverted data embedding for exogenous variables.

Description

This principle covers three complementary embedding approaches used to prepare inputs for transformer-based forecasting models:

1. Sinusoidal Positional Embedding (PositionalEmbedding): Injects absolute position information into the representation using fixed (non-learnable) sinusoidal functions. Even-indexed dimensions use sine and odd-indexed dimensions use cosine, with frequencies decreasing geometrically across dimensions. The encoding is pre-computed for a maximum sequence length and stored as a non-trainable buffer; for an input of length L, the forward pass returns the rows for the first L positions. The model can therefore distinguish positions without any learned parameters, and any sequence length up to the pre-computed maximum can be encoded, including lengths not seen during training.

2. Patch-Based Encoder Embedding (EnEmbedding): Designed for endogenous (target) variable embedding in the TimeXer architecture. The input time series is first permuted to a channel-first layout, then segmented into non-overlapping patches of fixed length using an unfold operation. Each patch is linearly projected to the model dimension. Sinusoidal positional encoding is added to convey the ordering of patches. A learnable global token is appended to the patch sequence for each variable; this token serves as an aggregation point that later participates in cross-attention to gather exogenous information. The output is reshaped so that all variables are processed as independent samples (channel independence).

3. Inverted Data Embedding (DataEmbedding_inverted): Embeds exogenous variables by treating each variable (channel) as a separate token whose feature vector spans the time dimension. The input is transposed from (Batch, Time, Channels) to (Batch, Channels, Time), and each channel's length-T series is linearly projected to the model dimension, yielding one token per variable. If time-stamp marks are available, they are transposed the same way and concatenated along the channel axis before projection, so each timestamp feature becomes an additional token. Because attention then operates across variable tokens rather than across time steps, the transformer can capture inter-variable dependencies directly.
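Item 2 notes that each variable's global token later gathers exogenous information through cross-attention. The sketch below is loosely modeled on how TimeXer's encoder does this, with `torch.nn.MultiheadAttention` standing in for the model's own attention layer; all sizes and variable names are illustrative assumptions, not the library's code. The last token of every per-variable sequence is regrouped so that the C global tokens of one sample act as queries over that sample's exogenous tokens.

```python
import torch
import torch.nn as nn

B, C, n_patches, C_ex, d_model = 2, 3, 6, 5, 32  # illustrative sizes

# Endogenous tokens as produced by the patch embedding: (B*C, N_patches + 1, d_model), global token last
endo = torch.randn(B * C, n_patches + 1, d_model)
# Exogenous tokens as produced by the inverted embedding: (B, C_ex, d_model)
exo = torch.randn(B, C_ex, d_model)

# Stand-in attention layer (assumption: 4 heads, batch-first layout)
cross_attn = nn.MultiheadAttention(d_model, num_heads=4, batch_first=True)

glb = endo[:, -1:, :]                         # (B*C, 1, d_model): one global token per variable
queries = glb.reshape(B, C, d_model)          # regroup: row b*C + c -> [b, c]
attended, weights = cross_attn(queries, exo, exo)  # (B, C, d_model), head-averaged (B, C, C_ex)
glb_updated = glb + attended.reshape(B * C, 1, d_model)  # residual update

# Write the enriched global token back; the patch tokens are not touched by this step
endo = torch.cat([endo[:, :-1, :], glb_updated], dim=1)
print(endo.shape, weights.shape)  # torch.Size([6, 7, 32]) torch.Size([2, 3, 5])
```

Routing exogenous information only through the global token keeps the per-variable patch tokens channel-independent while still letting each target variable condition on the covariates.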

Usage

Use PositionalEmbedding whenever sequence order must be encoded in transformer inputs; it is used internally by EnEmbedding. Use EnEmbedding for the endogenous encoder path in TimeXer, configuring patch_len to control the granularity of temporal segmentation. Use DataEmbedding_inverted for embedding exogenous or cross-variable features in iTransformer-style and TimeXer architectures, passing optional x_mark timestamp features to enrich the representation.

Theoretical Basis

Sinusoidal Positional Encoding:

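$$PE_{(pos,\,2i)} = \sin\!\left(\frac{pos}{10000^{2i/d_{model}}}\right)$$

$$PE_{(pos,\,2i+1)} = \cos\!\left(\frac{pos}{10000^{2i/d_{model}}}\right)$$

Where $pos$ is the position index, $d_{model}$ is the embedding dimension, and $i$ indexes the sine/cosine dimension pairs ($0 \le i < d_{model}/2$). The wavelengths form a geometric progression from $2\pi$ to $10000 \cdot 2\pi$.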
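The formula translates directly into a fixed lookup table. The following is a minimal PyTorch sketch, assuming an even `d_model`, the common exp/log form of the frequency term, and a default `max_len` of 5000 (an assumption); the class name `SinusoidalPositionalEncoding` is hypothetical. Registering the table as a buffer means it is saved with the module and moved by `.to(device)`, but it never receives gradient updates.

```python
import math

import torch
import torch.nn as nn


class SinusoidalPositionalEncoding(nn.Module):
    """Hypothetical stand-in for PositionalEmbedding: a fixed sinusoidal table."""

    def __init__(self, d_model: int, max_len: int = 5000):  # max_len default is an assumption
        super().__init__()
        # Assumes d_model is even so sine and cosine each fill d_model / 2 columns
        position = torch.arange(max_len, dtype=torch.float32).unsqueeze(1)  # (max_len, 1)
        # 1 / 10000^(2i / d_model), computed in log space
        div_term = torch.exp(
            torch.arange(0, d_model, 2, dtype=torch.float32) * (-math.log(10000.0) / d_model)
        )
        pe = torch.zeros(max_len, d_model)
        pe[:, 0::2] = torch.sin(position * div_term)  # even dimensions: sine
        pe[:, 1::2] = torch.cos(position * div_term)  # odd dimensions: cosine
        # Buffer, not Parameter: part of state_dict, excluded from optimization
        self.register_buffer("pe", pe.unsqueeze(0))  # (1, max_len, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, ...); return the first seq_len rows, broadcastable over batch
        return self.pe[:, : x.size(1)]


pe = SinusoidalPositionalEncoding(d_model=16)
print(pe(torch.randn(4, 24, 16)).shape)  # torch.Size([1, 24, 16])
```

Because the output has a leading batch dimension of 1, it can be added to any batch of embeddings of the same length without copying.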

Patch-Based Encoding:

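Given input $x \in \mathbb{R}^{B \times T \times C}$ and patch length $P$:

$$N_{patches} = \lfloor T / P \rfloor$$

$$x_{patches} \in \mathbb{R}^{BC \times N_{patches} \times P}$$

$$e = W_{val}\, x_{patches} + PE, \qquad e \in \mathbb{R}^{BC \times N_{patches} \times d_{model}}$$

Where $W_{val} \in \mathbb{R}^{P \times d_{model}}$ projects each patch to the model dimension. After $e$ is reshaped to $B \times C \times N_{patches} \times d_{model}$, a global token $g \in \mathbb{R}^{1 \times C \times 1 \times d_{model}}$ (one per variable, shared across the batch) is appended along the patch axis:

$$e_{full} = \mathrm{Concat}(e, g) \in \mathbb{R}^{B \times C \times (N_{patches}+1) \times d_{model}}$$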
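When $T$ is not a multiple of $P$, the patch count is $\lfloor T/P \rfloor$: with step equal to size, `Tensor.unfold` keeps only complete windows and drops the trailing $T \bmod P$ time steps. A quick shape check with arbitrary illustrative sizes ($B=2$, $T=100$, $C=3$, $P=16$, $d_{model}=32$); the random tensors stand in for the projected patches and the learned global token.

```python
import torch

B, T, C, P, d_model = 2, 100, 3, 16, 32  # illustrative sizes

x = torch.randn(B, T, C)
patches = x.permute(0, 2, 1).unfold(dimension=-1, size=P, step=P)  # (B, C, N_patches, P)
n_patches = patches.shape[2]
print(n_patches, T // P, T - n_patches * P)  # 6 6 4 -> the last 4 steps are never embedded

# After projection, one global token per variable is appended along the patch axis
e = torch.randn(B, C, n_patches, d_model)  # stand-in for W_val x_patches + PE, reshaped
g = torch.zeros(1, C, 1, d_model)          # stand-in for the learnable global token
e_full = torch.cat([e, g.expand(B, -1, -1, -1)], dim=2)  # (B, C, N_patches + 1, d_model)
tokens = e_full.reshape(B * C, n_patches + 1, d_model)
print(tokens.shape)  # torch.Size([6, 7, 32])
```

If the tail of the series matters (it is usually the most recent data), one option is to choose $P$ so that it divides the lookback length.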

Inverted (Channel-Independent) Embedding:

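$$\tilde{x} = x^{\top} \in \mathbb{R}^{B \times C \times T}$$

$$e = W_{val}\, \tilde{x} + b, \qquad e \in \mathbb{R}^{B \times C \times d_{model}}$$

Where $W_{val} \in \mathbb{R}^{T \times d_{model}}$ projects each channel's time-series vector to the model dimension, and the transpose swaps only the time and channel axes.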
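A minimal sketch of the inverted embedding, assuming a single `nn.Linear(T, d_model)` with bias followed by dropout; the class name `InvertedEmbedding` and its argument names are hypothetical. When timestamp marks with $M$ features are supplied, they are transposed and concatenated along the token axis, so the output holds $C + M$ tokens.

```python
from typing import Optional

import torch
import torch.nn as nn


class InvertedEmbedding(nn.Module):
    """Hypothetical sketch: each variable's full time series becomes one token."""

    def __init__(self, seq_len: int, d_model: int, dropout: float = 0.1):
        super().__init__()
        self.value_embedding = nn.Linear(seq_len, d_model)  # W_val (T x d_model) plus bias b
        self.dropout = nn.Dropout(dropout)

    def forward(self, x: torch.Tensor, x_mark: Optional[torch.Tensor] = None) -> torch.Tensor:
        x = x.permute(0, 2, 1)  # (B, T, C) -> (B, C, T)
        if x_mark is not None:
            # Timestamp features (B, T, M) become M extra tokens: (B, C + M, T)
            x = torch.cat([x, x_mark.permute(0, 2, 1)], dim=1)
        return self.dropout(self.value_embedding(x))  # (B, C [+ M], d_model)


emb = InvertedEmbedding(seq_len=96, d_model=16)
print(emb(torch.randn(4, 96, 7)).shape)  # torch.Size([4, 7, 16])
```

The number of tokens depends on the number of variables, not on the lookback length, so attention cost grows with $C$ rather than $T$.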

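Sketch of the patch-based embedding forward pass (PyTorch):

```python
import torch

# EnEmbedding forward pass as a standalone function. Assumed components:
#   linear_value: nn.Linear(patch_len, d_model); positional_encoding: returns (1, N_patches, d_model)
#   global_token: learnable tensor of shape (1, C, 1, d_model); dropout: e.g. nn.Dropout(p)
def en_embedding(x, patch_len, linear_value, positional_encoding, global_token, dropout):
    B, _, C = x.shape                                           # x: (B, T, C)
    x = x.permute(0, 2, 1)                                      # (B, C, T)
    x = x.unfold(dimension=-1, size=patch_len, step=patch_len)  # (B, C, N_patches, P)
    n_patches = x.shape[2]
    x = x.reshape(B * C, n_patches, patch_len)                  # (B*C, N_patches, P)
    x = linear_value(x) + positional_encoding(x)                # (B*C, N_patches, d_model)
    x = x.reshape(B, C, n_patches, -1)                          # (B, C, N_patches, d_model)
    glb = global_token.expand(B, -1, -1, -1)                    # (B, C, 1, d_model)
    x = torch.cat([x, glb], dim=2)                              # append learnable token
    x = x.reshape(B * C, n_patches + 1, -1)                     # (B*C, N_patches+1, d_model)
    return dropout(x)
```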

Related Pages

Implemented By
