Principle:FlagOpen FlagEmbedding Text Embedding Encoding

Field	Value
sources	Paper: BGE Embeddings https://arxiv.org/abs/2309.07597, Paper: BGE M3 https://arxiv.org/abs/2402.03216
domains	NLP, Information_Retrieval
last_updated	2026-02-09 00:00 GMT

Overview

Text embedding encoding converts text strings into fixed-dimensional dense vector representations using pre-trained Transformer models, enabling semantic similarity computation.

Description

Text embedding encoding transforms natural language into continuous vector spaces where semantically similar texts are close together. Different encoding methods exist:

Query encoding with task-specific instructions -- prefixes queries with a retrieval instruction to align the embedding with the search task.
Corpus/passage encoding without instructions -- encodes documents directly without instruction prefixing.
General encoding -- a unified method that optionally applies an instruction string to any input.
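
The three methods above differ only in whether an instruction is prepended before the text reaches the model. The following is a minimal sketch of that input preparation, not the library's implementation: the function names prepare_inputs, prepare_queries, and prepare_corpus are hypothetical, and the instruction string is an example value that depends on the model in use.

```python
from typing import List, Optional

# Example instruction string; the right value depends on the model (assumption).
QUERY_INSTRUCTION = "Represent this sentence for searching relevant passages: "


def prepare_inputs(texts: List[str], instruction: Optional[str] = None) -> List[str]:
    """General path: prepend the instruction only when one is given."""
    if instruction is None:
        return list(texts)
    return [instruction + text for text in texts]


def prepare_queries(queries: List[str]) -> List[str]:
    """Query path: always apply the retrieval instruction."""
    return prepare_inputs(queries, instruction=QUERY_INSTRUCTION)


def prepare_corpus(passages: List[str]) -> List[str]:
    """Corpus path: passages are encoded as-is, with no instruction."""
    return prepare_inputs(passages, instruction=None)


queries = prepare_queries(["what is a dual encoder?"])
passages = prepare_corpus(["A dual encoder embeds queries and passages separately."])
```

Routing both specialized paths through one general function keeps the only difference between them, the instruction, in a single place.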

Encoding can be parallelized across multiple GPUs to raise throughput. BGE-M3 models produce three types of output:

Dense vectors -- fixed-dimensional continuous representations
Sparse lexical weights -- term-level importance scores for hybrid retrieval
ColBERT multi-vector representations -- token-level embeddings for late interaction
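
Each output type is typically scored in its own way. The sketch below illustrates one common formulation using toy values in pure Python: an inner product for dense vectors, a sum of weight products over shared terms for sparse lexical weights, and a MaxSim average for ColBERT vectors. The function names and the input data are illustrative assumptions, not model output or library API.

```python
from typing import Dict, List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def dense_score(query_vec: Vector, passage_vec: Vector) -> float:
    """Inner product of two dense vectors."""
    return dot(query_vec, passage_vec)


def sparse_score(query_weights: Dict[str, float], passage_weights: Dict[str, float]) -> float:
    """Sum of weight products over terms present in both texts."""
    shared_terms = query_weights.keys() & passage_weights.keys()
    return sum(query_weights[t] * passage_weights[t] for t in shared_terms)


def colbert_score(query_vecs: List[Vector], passage_vecs: List[Vector]) -> float:
    """Late interaction: for each query token take its best-matching
    passage token, then average over query tokens."""
    best_matches = [max(dot(q, p) for p in passage_vecs) for q in query_vecs]
    return sum(best_matches) / len(query_vecs)


# Toy inputs (made up for illustration).
d = dense_score([1.0, 0.0], [0.5, 0.5])
s = sparse_score({"embedding": 0.5, "model": 0.25}, {"embedding": 0.5, "text": 0.75})
c = colbert_score([[1.0, 0.0], [0.0, 1.0]], [[0.5, 0.5], [0.0, 1.0]])
```

A hybrid retriever can combine the three scores, for example as a weighted sum; the weights are a tuning choice.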

Usage

Use this technique when converting text to embeddings for retrieval, semantic search, clustering, or similarity computation.

Theoretical Basis

Dual-encoder architecture. Queries and passages are encoded independently, enabling pre-computation of corpus embeddings for efficient retrieval. The encode method applies the following pipeline:

Instruction prefixing (for queries) -- prepends a task-specific instruction to guide the model
Tokenization -- converts text to token IDs using the model's tokenizer
Forward pass through the Transformer -- produces contextual token representations
Pooling (CLS / mean / last_token) -- aggregates token representations into a single vector
Optional normalization -- L2-normalizes the output vector so that the inner product of two embeddings equals their cosine similarity
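
The last two pipeline steps can be sketched in pure Python on toy token vectors. This is an illustration under stated assumptions rather than the library's code: the function names pool and l2_normalize are hypothetical, padding is assumed to be on the right, and the attention mask marks real tokens with 1 and padding with 0.

```python
import math
from typing import List

Vector = List[float]


def pool(token_vecs: List[Vector], attention_mask: List[int], method: str = "cls") -> Vector:
    """Aggregate per-token vectors into a single vector."""
    if method == "cls":
        # The first token's representation stands for the whole sequence.
        return list(token_vecs[0])
    if method == "mean":
        # Average over real tokens only; padding positions are skipped.
        kept = [v for v, m in zip(token_vecs, attention_mask) if m == 1]
        dim = len(token_vecs[0])
        return [sum(v[i] for v in kept) / len(kept) for i in range(dim)]
    if method == "last_token":
        # With right padding, the last real token sits at index sum(mask) - 1.
        return list(token_vecs[sum(attention_mask) - 1])
    raise ValueError(f"unknown pooling method: {method}")


def l2_normalize(vec: Vector) -> Vector:
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


# Three token positions, the last one is padding (toy values).
token_vecs = [[3.0, 4.0], [1.0, 0.0], [9.0, 9.0]]
attention_mask = [1, 1, 0]

cls_vec = pool(token_vecs, attention_mask, "cls")
mean_vec = pool(token_vecs, attention_mask, "mean")
last_vec = pool(token_vecs, attention_mask, "last_token")
unit_vec = l2_normalize(cls_vec)
```

After normalization every embedding has unit length, so a plain dot product between two embeddings gives their cosine similarity.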

Multi-GPU encoding uses process pools that distribute batches across devices for parallel encoding, improving throughput for large-scale workloads.
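
The sharding idea can be sketched as follows. To keep the example runnable without GPUs, it substitutes a thread pool for the process pool and a toy encoder for the model; the device strings are plain labels, and every name here (toy_encode, split_into_shards, encode_multi_device) is hypothetical.

```python
import math
from concurrent.futures import ThreadPoolExecutor
from typing import List

Vector = List[float]


def toy_encode(texts: List[str], device: str) -> List[Vector]:
    """Stand-in for a model forward pass on one device (ignores the device)."""
    return [[float(len(t)), float(t.count(" ") + 1)] for t in texts]


def split_into_shards(items: List[str], num_shards: int) -> List[List[str]]:
    """Split items into at most num_shards contiguous chunks."""
    size = math.ceil(len(items) / num_shards)
    return [items[i:i + size] for i in range(0, len(items), size)]


def encode_multi_device(texts: List[str], devices: List[str]) -> List[Vector]:
    """Encode one shard per device in parallel and keep the input order."""
    if not texts:
        return []
    shards = split_into_shards(texts, len(devices))
    with ThreadPoolExecutor(max_workers=len(devices)) as pool:
        # map() yields results in submission order, so shards line up again.
        shard_results = list(pool.map(toy_encode, shards, devices))
    return [vec for shard in shard_results for vec in shard]


texts = ["a b", "cc", "d e f", "gg hh", "i"]
embeddings = encode_multi_device(texts, ["cuda:0", "cuda:1"])
```

Because shards are contiguous and results are collected in order, the returned embeddings align one-to-one with the input texts.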
