Principle:Sktime Pytorch forecasting Categorical Variable Embedding

Knowledge Sources	pytorch-forecasting
Domains	Time_Series, Forecasting, Deep_Learning, Embedding, Feature_Engineering
Last Updated	2026-02-08 09:00 GMT

Overview

Embedding network for categorical variables that maps each discrete variable through a separate learned embedding table and augments the input with positional and temporal indicator variables, producing a dense feature tensor for downstream forecasting models.

Description

Categorical Variable Embedding transforms discrete categorical inputs into dense, continuous vector representations suitable for neural network processing. Each categorical variable is assigned its own nn.Embedding layer with a vocabulary size matching the cardinality of that variable and a shared output dimension (d_model).

A distinctive feature of this module is automatic positional augmentation. Three additional positional categorical variables are generated on the fly and concatenated with the user-provided categorical variables before embedding:

1. pos_seq (Sequence Position): An integer index from 0 to seq_len - 1 assigned to each time step, encoding absolute position within the full sequence (past + future).

2. pos_fut (Future Position): A counter that is 0 for all past time steps and counts from 1 to lag for future steps, encoding relative position within the forecast horizon.

3. is_fut (Future Indicator): A binary flag (0 for past, 1 for future) that explicitly marks whether each time step belongs to the historical context or the prediction horizon.
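
As a concrete illustration of the three definitions above, the following minimal sketch computes the indicator values in plain Python for a hypothetical window with seq_len = 6 and lag = 2. The function name positional_variables is illustrative only and is not part of pytorch-forecasting.

```python
def positional_variables(seq_len, lag):
    """Return (pos_seq, pos_fut, is_fut) as plain lists of length seq_len."""
    if not 0 <= lag <= seq_len:
        raise ValueError("lag must lie between 0 and seq_len")
    n_past = seq_len - lag
    pos_seq = list(range(seq_len))                    # 0 .. seq_len - 1
    pos_fut = [0] * n_past + list(range(1, lag + 1))  # 0 in the past, 1 .. lag in the future
    is_fut = [0] * n_past + [1] * lag                 # 0 = past, 1 = future
    return pos_seq, pos_fut, is_fut


pos_seq, pos_fut, is_fut = positional_variables(seq_len=6, lag=2)
print(pos_seq)  # [0, 1, 2, 3, 4, 5]
print(pos_fut)  # [0, 0, 0, 0, 1, 2]
print(is_fut)   # [0, 0, 0, 0, 1, 1]
```

The last two time steps form the forecast horizon, so they are the only positions where pos_fut and is_fut are non-zero.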

Each variable (both user-provided and positional) is independently embedded through its own embedding layer, then the embeddings are stacked along a new dimension, yielding a 4D tensor of shape (batch, seq_len, num_vars + 3, d_model). This structure allows downstream layers to attend separately over the variable dimension and the time dimension.

The module also handles the edge case where there are no user-provided categorical variables. In that case the input is an integer batch size rather than a tensor, and only the three positional variables are embedded, giving an output of shape (batch, seq_len, 3, d_model).
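
The behaviour described in this section, including the integer batch-size edge case, can be sketched as a small PyTorch module. This is an illustrative reimplementation under stated assumptions, not the library's code: the class name CatEmbeddingSketch, its constructor signature, and the positional vocabulary sizes seq_len, lag + 1, and 2 (derived from the value ranges listed above) are all assumptions.

```python
import torch
from torch import nn


class CatEmbeddingSketch(nn.Module):
    """Hypothetical sketch: one embedding table per categorical variable."""

    def __init__(self, seq_len, lag, d_model, emb_dims):
        super().__init__()
        # User-provided vocabularies followed by pos_seq, pos_fut, is_fut.
        vocab_sizes = list(emb_dims) + [seq_len, lag + 1, 2]
        self.tables = nn.ModuleList([nn.Embedding(v, d_model) for v in vocab_sizes])
        n_past = seq_len - lag
        pos_seq = torch.arange(seq_len)
        pos_fut = torch.cat([torch.zeros(n_past, dtype=torch.long), torch.arange(1, lag + 1)])
        is_fut = (pos_fut > 0).long()
        # Shape (1, seq_len, 3); a buffer moves with the module across devices.
        pos = torch.stack([pos_seq, pos_fut, is_fut], dim=-1).unsqueeze(0)
        self.register_buffer("pos", pos)

    def forward(self, x):
        # x is a LongTensor (batch, seq_len, num_vars) or an int batch size.
        batch = x if isinstance(x, int) else x.shape[0]
        pos = self.pos.repeat(batch, 1, 1)
        cat_vars = pos if isinstance(x, int) else torch.cat([x, pos], dim=-1)
        embedded = [table(cat_vars[:, :, i]) for i, table in enumerate(self.tables)]
        return torch.stack(embedded, dim=2)  # (batch, seq_len, num_vars + 3, d_model)


# Two user-provided variables with vocabulary sizes 4 and 10.
module = CatEmbeddingSketch(seq_len=6, lag=2, d_model=8, emb_dims=[4, 10])
x = torch.stack([torch.randint(0, 4, (3, 6)), torch.randint(0, 10, (3, 6))], dim=-1)
print(module(x).shape)  # torch.Size([3, 6, 5, 8])

# No user-provided variables: pass the batch size as an int.
no_cats = CatEmbeddingSketch(seq_len=6, lag=2, d_model=8, emb_dims=[])
print(no_cats(3).shape)  # torch.Size([3, 6, 3, 8])
```

In this sketch the integer-input path only works when the module was built with an empty emb_dims list, since every embedding table needs a matching column of indices.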

Usage

Use embedding_cat_variables in architectures such as the DSIPTS models that require rich categorical feature representations. Provide emb_dims as a list of vocabulary sizes, one per user-defined categorical variable. The seq_len and lag parameters define the total sequence length and the forecast horizon; by the definitions above, pos_seq takes seq_len distinct values, pos_fut takes lag + 1, and is_fut takes 2, which fixes the vocabulary sizes of the three auto-generated positional variables. In subsequent layers the output tensor can be summed over the variable dimension, concatenated along the feature dimension, or attended over.
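
Two of the downstream options mentioned above, summing and concatenating, can be sketched on a stand-in tensor with the documented output shape. The random tensor below is an assumption used only to show the resulting shapes; it is not produced by the library module.

```python
import torch

batch, seq_len, num_vars, d_model = 3, 6, 2, 8
# Stand-in for the module output: (batch, seq_len, num_vars + 3, d_model).
embedded = torch.randn(batch, seq_len, num_vars + 3, d_model)

# Sum over the variable dimension: one d_model vector per time step.
summed = embedded.sum(dim=2)

# Concatenate the per-variable embeddings along the feature dimension.
concatenated = embedded.flatten(start_dim=2)

print(summed.shape)        # torch.Size([3, 6, 8])
print(concatenated.shape)  # torch.Size([3, 6, 40])
```

Summation keeps the feature width at d_model regardless of how many variables there are, while concatenation preserves each variable's embedding at the cost of a width that grows with num_vars + 3.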

Theoretical Basis

Embedding Lookup:

For a categorical variable $c$ with vocabulary size $V_{c}$:

$e_{c} = \text{Embedding}(c) \in \mathbb{R}^{d_{\text{model}}}$

Where the embedding table $W_{c} \in \mathbb{R}^{V_{c} \times d_{\text{model}}}$ is learned during training.
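
Equivalently, the lookup selects the row of $W_{c}$ indexed by the value of $c$. As a standard restatement (not specific to this library), this is a matrix product with a one-hot vector, where $\text{onehot}(c)$ is a notation introduced here for illustration:

$e_{c} = W_{c}^{\top} \, \text{onehot}(c), \quad \text{onehot}(c) \in \{0, 1\}^{V_{c}}$

This view shows why the table needs exactly $V_{c}$ rows: an index outside $\{0, \ldots, V_{c} - 1\}$ has no corresponding row.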

Positional Augmentation Variables:

$\text{pos\_seq}_t = t, \quad t \in \{0, 1, \ldots, T-1\}$

$\text{pos\_fut}_t = \begin{cases} 0 & \text{if } t < T - H \\ t - (T - H) + 1 & \text{if } t \geq T - H \end{cases}$

$\text{is\_fut}_t = \begin{cases} 0 & \text{if } t < T - H \\ 1 & \text{if } t \geq T - H \end{cases}$

Where $T$ is the total sequence length and $H$ is the forecast horizon (lag).

Combined Output:

Given $M$ user-defined categorical variables plus the 3 positional variables, and a batch of size $B$:

$E = \text{Stack}(e_1, e_2, \ldots, e_M, e_{\text{pos\_seq}}, e_{\text{pos\_fut}}, e_{\text{is\_fut}}) \in \mathbb{R}^{B \times T \times (M+3) \times d_{\text{model}}}$

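Sketch (PyTorch):

# Categorical variable embedding (illustrative sketch; embedding_layers is passed in explicitly)
import torch

def embed_categorical(x, seq_len, lag, embedding_layers):
    # x: LongTensor (batch, seq_len, num_vars), or an int batch size when
    # there are no user-provided categorical variables.
    batch = x if isinstance(x, int) else x.shape[0]
    n_past = seq_len - lag
    pos_seq = torch.arange(seq_len)
    pos_fut = torch.cat([torch.zeros(n_past, dtype=torch.long), torch.arange(1, lag + 1)])
    is_fut = torch.cat([torch.zeros(n_past, dtype=torch.long), torch.ones(lag, dtype=torch.long)])

    # Broadcast the positional variables over the batch: (batch, seq_len, 3).
    pos = torch.stack([pos_seq, pos_fut, is_fut], dim=-1).unsqueeze(0).repeat(batch, 1, 1)
    cat_vars = pos if isinstance(x, int) else torch.cat([x, pos.to(x.device)], dim=-1)

    # One embedding layer per variable: user-provided tables first, then the 3 positional ones.
    embeddings = [layer(cat_vars[:, :, i]) for i, layer in enumerate(embedding_layers)]
    return torch.stack(embeddings, dim=2)  # (batch, seq_len, num_vars + 3, d_model)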

Related Pages

Implemented By

Implementation:Sktime_Pytorch_forecasting_Embedding_Cat_Variables
