Principle:FlagOpen FlagEmbedding Text Embedding Encoding
| Field | Value |
|---|---|
| sources | Paper: BGE Embeddings https://arxiv.org/abs/2309.07597, Paper: BGE M3 https://arxiv.org/abs/2402.03216 |
| domains | NLP, Information_Retrieval |
| last_updated | 2026-02-09 00:00 GMT |
Overview
A technique that converts text strings into fixed-dimensional dense vector representations using pre-trained Transformer models, enabling semantic similarity computation.
Description
Text embedding encoding maps natural-language text into a continuous vector space in which semantically similar texts lie close together. Three encoding methods are distinguished:
- Query encoding with task-specific instructions -- prefixes queries with a retrieval instruction to align the embedding with the search task.
- Corpus/passage encoding without instructions -- encodes documents directly without instruction prefixing.
- General encoding -- a unified method that optionally applies an instruction string to any input.
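The three methods differ only in which string reaches the tokenizer. Below is a minimal sketch. The function names `prepare_inputs`, `prepare_queries`, and `prepare_corpus` are hypothetical and are not FlagEmbedding's API. The instruction string is the English query instruction published for the BGE v1.5 models, and is treated here as an assumption: other checkpoints and languages use different instructions.

```python
from typing import List, Optional, Sequence

# Assumption: English query instruction used by the BGE v1.5 models.
# Other models or languages may use a different string; check the model card.
QUERY_INSTRUCTION = "Represent this sentence for searching relevant passages: "


def prepare_inputs(texts: Sequence[str], instruction: Optional[str] = None) -> List[str]:
    """General encoding (hypothetical helper): optionally prepend an instruction to every input."""
    if not instruction:
        return list(texts)
    return [instruction + text for text in texts]


def prepare_queries(queries: Sequence[str], instruction: str = QUERY_INSTRUCTION) -> List[str]:
    """Query encoding (hypothetical helper): queries receive the retrieval instruction."""
    return prepare_inputs(queries, instruction)


def prepare_corpus(passages: Sequence[str]) -> List[str]:
    """Corpus encoding (hypothetical helper): passages are passed through unchanged."""
    return prepare_inputs(passages, None)


print(prepare_queries(["what is BGE?"]))
print(prepare_corpus(["BGE is a family of embedding models."]))
```

Because only queries carry the instruction, the corpus side can be encoded once and reused with any instruction chosen later for queries.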
Encoding can be parallelized across multiple GPUs to increase throughput.
BGE-M3 models produce three types of output:
- Dense vectors -- fixed-dimensional continuous representations
- Sparse lexical weights -- term-level importance scores for hybrid retrieval
- ColBERT multi-vector representations -- token-level embeddings for late interaction
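Each output type yields its own relevance score, and the three can be combined for hybrid retrieval. The sketch below follows the scoring described in the BGE M3 paper: an inner product for dense vectors, a sum of weight products over terms shared by query and passage for sparse weights, and for ColBERT vectors the best passage-token match per query token averaged over query tokens. The function names and the default weights `(0.4, 0.2, 0.4)` are illustrative assumptions, and token weights are keyed by token strings rather than token IDs for readability.

```python
from typing import Dict, Sequence, Tuple

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def dense_score(q: Vector, p: Vector) -> float:
    """Dense relevance: inner product of the (normalized) query and passage vectors."""
    return dot(q, p)


def lexical_score(q_weights: Dict[str, float], p_weights: Dict[str, float]) -> float:
    """Sparse relevance: sum of weight products over terms present in both texts."""
    return sum(w * p_weights[t] for t, w in q_weights.items() if t in p_weights)


def colbert_score(q_vecs: Sequence[Vector], p_vecs: Sequence[Vector]) -> float:
    """Late interaction: best-matching passage token per query token, averaged over query tokens."""
    if not q_vecs or not p_vecs:
        return 0.0
    return sum(max(dot(qv, pv) for pv in p_vecs) for qv in q_vecs) / len(q_vecs)


def hybrid_score(dense: float, sparse: float, colbert: float,
                 weights: Tuple[float, float, float] = (0.4, 0.2, 0.4)) -> float:
    """Weighted sum of the three scores (default weights are illustrative only)."""
    w_dense, w_sparse, w_colbert = weights
    return w_dense * dense + w_sparse * sparse + w_colbert * colbert


d = dense_score([0.6, 0.8], [0.8, 0.6])
s = lexical_score({"bge": 0.5, "m3": 0.3}, {"bge": 0.4, "model": 0.2})
c = colbert_score([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.5, 0.5]])
print(d, s, c, hybrid_score(d, s, c))
```

The sparse score only rewards exact term overlap, while the dense and ColBERT scores capture semantic matches, which is why a weighted combination can outperform any single mode.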
Usage
Use this technique when converting text to embeddings for retrieval, semantic search, clustering, or similarity computation.
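For retrieval and semantic search, the typical pattern is to embed the corpus once and then rank it against each incoming query embedding. The sketch below assumes embeddings are already available as plain lists of floats (toy 2-D vectors stand in for model outputs). `cosine` and `top_k` are hypothetical helper names.

```python
import heapq
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


def top_k(query_emb: Vector, corpus_embs: Sequence[Vector], k: int) -> List[Tuple[int, float]]:
    """Return the k (corpus index, score) pairs with the highest cosine similarity."""
    scored = ((i, cosine(query_emb, p)) for i, p in enumerate(corpus_embs))
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])


# Corpus embeddings are computed once, offline (toy vectors here).
corpus_embs = [[0.0, 1.0], [1.0, 1.0], [2.0, 0.0]]
# At query time only the query is embedded, then scored against the stored corpus.
print([i for i, _ in top_k([1.0, 0.0], corpus_embs, k=2)])  # [2, 1]
```

A linear scan like this is fine for small corpora; large collections usually store the corpus vectors in an approximate nearest-neighbor index instead.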
Theoretical Basis
Dual-encoder architecture: queries and passages are encoded independently, so corpus embeddings can be pre-computed once and reused for efficient retrieval. The encode method applies the following pipeline:
- Instruction prefixing (for queries) -- prepends a task-specific instruction to guide the model
- Tokenization -- converts text to token IDs using the model's tokenizer
- Forward pass through the Transformer -- produces contextual token representations
- Pooling (CLS / mean / last_token) -- aggregates token representations into a single vector
- Optional normalization -- L2-normalizes the output vector so that the inner product of two embeddings equals their cosine similarity
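The last two pipeline steps operate on the Transformer's token-level hidden states. The sketch below shows the three pooling options and optional L2 normalization on plain Python lists, assuming a 0/1 attention mask (1 marks a real token) and, for CLS pooling, that position 0 holds the [CLS] token, i.e. right padding. `pool_tokens` and `embed_from_hidden_states` are hypothetical names, not FlagEmbedding functions.

```python
import math
from typing import List, Sequence

Vector = List[float]


def pool_tokens(token_embs: Sequence[Sequence[float]], attention_mask: Sequence[int],
                method: str = "cls") -> Vector:
    """Aggregate per-token hidden states into one vector."""
    if not any(attention_mask):
        raise ValueError("attention_mask contains no real tokens")
    if method == "cls":
        # Assumes right padding, so position 0 is the [CLS] token.
        return list(token_embs[0])
    if method == "mean":
        # Average only over real (non-padding) tokens.
        kept = [emb for emb, m in zip(token_embs, attention_mask) if m]
        dim = len(kept[0])
        return [sum(emb[d] for emb in kept) / len(kept) for d in range(dim)]
    if method == "last_token":
        # Last non-padding position; correct for both left and right padding.
        last = max(i for i, m in enumerate(attention_mask) if m)
        return list(token_embs[last])
    raise ValueError(f"unknown pooling method: {method}")


def l2_normalize(v: Sequence[float]) -> Vector:
    """Scale v to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def embed_from_hidden_states(token_embs: Sequence[Sequence[float]], attention_mask: Sequence[int],
                             method: str = "cls", normalize: bool = True) -> Vector:
    """Pooling followed by optional L2 normalization."""
    vec = pool_tokens(token_embs, attention_mask, method)
    return l2_normalize(vec) if normalize else vec


hidden = [[1.0, 0.0], [3.0, 4.0], [9.0, 9.0]]  # 3 positions, the last one is padding
mask = [1, 1, 0]
print(pool_tokens(hidden, mask, "mean"))  # [2.0, 2.0]
print(embed_from_hidden_states(hidden, mask, "last_token"))
```

Masking matters for mean and last-token pooling: without it, the padding row `[9.0, 9.0]` would leak into the embedding.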
Multi-GPU encoding uses process pools that distribute batches across devices for parallel encoding, improving throughput for large-scale workloads.
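The multi-device pattern amounts to sharding the inputs, encoding each shard in its own worker, and concatenating the results in input order. The sketch below uses Python's `concurrent.futures`. The contiguous sharding scheme, the device strings, and the names `shard_texts`, `toy_encode`, and `encode_multi_device` are assumptions for illustration. `toy_encode` is a hypothetical stand-in that returns text lengths rather than running a real model.

```python
import concurrent.futures
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def shard_texts(texts: Sequence[str], devices: Sequence[str]) -> List[Tuple[str, List[str]]]:
    """Split inputs into contiguous, order-preserving chunks, at most one per device."""
    if not devices:
        raise ValueError("need at least one device")
    size = -(-len(texts) // len(devices))  # ceiling division
    shards = []
    for i, device in enumerate(devices):
        chunk = list(texts[i * size:(i + 1) * size])
        if chunk:  # skip devices that would receive no work
            shards.append((device, chunk))
    return shards


def toy_encode(device: str, texts: List[str]) -> List[Vector]:
    """Hypothetical stand-in for loading a model on `device` and encoding `texts`."""
    return [[float(len(t))] for t in texts]


def encode_multi_device(
    texts: Sequence[str],
    devices: Sequence[str],
    encode_fn: Callable[[str, List[str]], List[Vector]] = toy_encode,
    executor_cls: Callable[..., concurrent.futures.Executor] = concurrent.futures.ProcessPoolExecutor,
) -> List[Vector]:
    """Encode each shard in its own worker and concatenate results in input order."""
    shards = shard_texts(texts, devices)
    if not shards:
        return []
    results: List[Vector] = []
    with executor_cls(max_workers=len(shards)) as pool:
        futures = [pool.submit(encode_fn, device, chunk) for device, chunk in shards]
        for future in futures:  # collect in submission order, not completion order
            results.extend(future.result())
    return results
```

With the default `ProcessPoolExecutor`, `encode_fn` must be a module-level function so it can be pickled, and scripts that rely on the `spawn` or `forkserver` start method should make the call under an `if __name__ == "__main__":` guard. In a real deployment each worker would typically load the model once per device and reuse it across batches, which this sketch omits.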