Principle:Lucidrains X transformers Multi Stream Input Fusion

Knowledge Sources	BERT CogView GLM 130B
Domains	Deep_Learning, Multi_Modal, Model_Architecture
Last Updated	2026-02-08 18:00 GMT

Overview

A technique that feeds multiple named token input streams through one shared transformer by summing their embeddings at each position, enabling multi-modal or multi-type sequence processing.

Description

Multi-Stream Input Fusion is an architecture pattern in which multiple parallel token inputs, each with its own vocabulary and embedding table, are combined into a single representation by summing their embeddings. BERT uses the same approach to combine token embeddings with segment embeddings and position embeddings. Because the sum is taken per position, all streams must share the same sequence length and embedding dimension. Each input stream contributes additively to the final embedding, which is then processed by a single shared stack of attention layers. The output can be projected back to a separate logit space for each input stream. The pattern generalizes to any number of named input types, enabling flexible multi-modal or multi-annotation architectures.

Usage

Use this principle when designing transformer architectures that need to process multiple types of input tokens simultaneously at each position, such as token + type IDs (BERT-style), text + image patch tokens, or any multi-annotation scenario where each position has multiple categorical attributes.
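As a rough illustration of the BERT-style case, the sketch below builds two position-aligned streams for a sentence pair. The function name `build_streams`, the stream names `token` and `segment`, and the ID values are all hypothetical and chosen only for this example.

```python
def build_streams(sentence_a, sentence_b):
    """Return aligned 'token' and 'segment' ID lists for a sentence pair."""
    token_ids = list(sentence_a) + list(sentence_b)
    # Segment 0 marks positions from the first sentence, 1 from the second.
    segment_ids = [0] * len(sentence_a) + [1] * len(sentence_b)
    streams = {"token": token_ids, "segment": segment_ids}

    # Every stream describes the same positions, so the lengths must agree.
    if len({len(ids) for ids in streams.values()}) != 1:
        raise ValueError("all streams must have the same sequence length")
    return streams


# Hypothetical token IDs for two short sentences.
streams = build_streams([101, 7, 8], [9, 102])
```

Each position now carries one categorical value per stream, which is the input shape this principle assumes.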

Theoretical Basis

The combined embedding at each position:

$\mathbf{e}_{i} = \sum_{s \in \text{streams}} \mathrm{Embed}_{s}(x_{i}^{s}) + \mathrm{PosEmbed}(i)$
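To make the formula concrete, here is a small worked example in plain Python. The embedding tables, their values, and the stream names `token` and `segment` are made up for illustration; real tables are learned parameters with far more dimensions.

```python
# Toy embedding tables (hypothetical values), embedding dimension 2.
EMBED = {
    "token": [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]],  # vocabulary size 3
    "segment": [[0.5, 0.5], [-0.5, 0.5]],           # vocabulary size 2
}
POS_EMBED = [[0.0, 0.25], [0.0, 0.5], [0.0, 0.75]]   # max sequence length 3


def combined_embedding(named_inputs):
    """Compute e_i = sum_s Embed_s(x_i^s) + PosEmbed(i) for every position i."""
    lengths = {len(ids) for ids in named_inputs.values()}
    if len(lengths) != 1:
        raise ValueError("all streams must have the same sequence length")
    seq_len = lengths.pop()
    dim = len(POS_EMBED[0])

    out = []
    for i in range(seq_len):
        e_i = list(POS_EMBED[i])  # start from the positional term
        for name, ids in named_inputs.items():
            row = EMBED[name][ids[i]]  # look up this stream's embedding
            for d in range(dim):
                e_i[d] += row[d]
        out.append(e_i)
    return out


inputs = {"token": [2, 0, 1], "segment": [0, 0, 1]}
result = combined_embedding(inputs)
```

For position 0 the terms are (2, 2) from the token table, (0.5, 0.5) from the segment table, and (0, 0.25) from the positional table, giving (2.5, 2.75).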

Pseudo-code Logic:

# Abstract algorithm (NOT real implementation)
combined_embedding = 0
for name, token_ids in named_inputs.items():
    combined_embedding += embedding_table[name](token_ids)

combined_embedding += positional_embedding
hidden = transformer(combined_embedding)

# Separate output heads
logits = {name: output_head[name](hidden) for name in named_inputs}
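The abstract algorithm above could be realized along the following lines in PyTorch. This is a minimal sketch under stated assumptions, not the x-transformers implementation: the class name `MultiStreamEncoder`, its constructor parameters, and the use of `nn.TransformerEncoder` as the shared backbone are all choices made for this example.

```python
import torch
from torch import nn


class MultiStreamEncoder(nn.Module):
    """Hypothetical module: sums per-stream embeddings, runs one shared encoder."""

    def __init__(self, num_tokens, dim=32, max_seq_len=64, depth=2, heads=4):
        super().__init__()
        # One embedding table and one output head per named stream.
        self.embeds = nn.ModuleDict(
            {name: nn.Embedding(vocab, dim) for name, vocab in num_tokens.items()}
        )
        self.heads = nn.ModuleDict(
            {name: nn.Linear(dim, vocab) for name, vocab in num_tokens.items()}
        )
        self.pos_embed = nn.Embedding(max_seq_len, dim)
        layer = nn.TransformerEncoderLayer(
            dim, heads, dim_feedforward=4 * dim, batch_first=True
        )
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)

    def forward(self, named_inputs):
        # named_inputs maps stream name -> LongTensor of shape (batch, seq_len).
        if set(named_inputs) != set(self.embeds):
            raise KeyError("inputs must supply exactly the configured streams")
        # Additive fusion: every stream contributes to the same embedding.
        summed = sum(self.embeds[name](ids) for name, ids in named_inputs.items())
        positions = torch.arange(summed.shape[1], device=summed.device)
        hidden = self.encoder(summed + self.pos_embed(positions))
        # Separate logit space for each stream.
        return {name: head(hidden) for name, head in self.heads.items()}


model = MultiStreamEncoder({"token": 100, "segment": 2})
batch = {
    "token": torch.randint(0, 100, (2, 10)),
    "segment": torch.randint(0, 2, (2, 10)),
}
logits = model(batch)
```

Each entry of `logits` has shape (batch, seq_len, vocabulary size of that stream), so a per-stream loss can be computed independently while the encoder weights stay shared.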

Related Pages

Implementation:Lucidrains_X_transformers_MultiInputTransformerWrapper
