Principle:FlagOpen FlagEmbedding Multi Retrieval Training
| Knowledge Sources | |
|---|---|
| Domains | Machine Learning, Information Retrieval, Multi-Granularity Embedding |
| Last Updated | 2026-02-09 00:00 GMT |
Overview
Multi-granularity retrieval training jointly optimizes dense embeddings, learned sparse term weights, and ColBERT-style multi-vector token interactions in a single model, so that one encoder supports semantic, lexical, and token-level matching.
Description
This principle describes a training approach that jointly optimizes three complementary retrieval methods within a single model. Dense retrieval captures semantic similarity by comparing one fixed-dimensional embedding per text. Sparse (lexical) retrieval scores exact term overlap, weighting each shared term by a learned importance weight. ColBERT-style multi-vector retrieval keeps one embedding per token and compares query and document at the token level. The unified architecture, exemplified by BGE-M3, shares the underlying transformer encoder across all three methods, so the model can draw on the strengths of each paradigm. Training the three objectives together is intended to improve robustness across retrieval scenarios that range from semantic search to keyword matching.
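To make the contrast between the dense and sparse signals concrete, the following is a minimal sketch on hand-written toy values. The function names (`dense_score`, `sparse_score`), the use of cosine similarity for the dense score, and all numbers are assumptions for illustration, not the FlagEmbedding API or real model output.

```python
import math

def dense_score(q_vec, d_vec):
    """Cosine similarity between two fixed-size embeddings."""
    dot = sum(a * b for a, b in zip(q_vec, d_vec))
    q_norm = math.sqrt(sum(a * a for a in q_vec))
    d_norm = math.sqrt(sum(b * b for b in d_vec))
    return dot / (q_norm * d_norm)

def sparse_score(q_weights, d_weights):
    """Sum of weight products over terms present in both query and document."""
    return sum(w * d_weights[t] for t, w in q_weights.items() if t in d_weights)

# Hypothetical toy inputs, not produced by any real model.
q_vec, d_vec = [0.6, 0.8], [0.8, 0.6]
q_weights = {"neural": 0.9, "retrieval": 0.7}
d_weights = {"retrieval": 0.5, "search": 0.4}

print(round(dense_score(q_vec, d_vec), 2))          # 0.96
print(round(sparse_score(q_weights, d_weights), 2))  # 0.35
```

Only the shared term "retrieval" contributes to the sparse score, whereas the dense score depends on the whole embedding and can be high even when no terms overlap.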
Usage
Use this principle when:
- Building versatile retrieval systems that handle both semantic and lexical queries
- Training embedders for multilingual or cross-lingual retrieval
- Developing models that need to balance precision (lexical) and recall (semantic)
- Creating universal embedding models for production search systems
Theoretical Basis
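The training objective combines three contrastive (InfoNCE-style) losses, one per retrieval mode. Each is computed over the same candidate set {d_i}, which contains the positive document d+ together with the negatives:
- Dense Loss: L_dense = -log(exp(sim(q, d+)) / Σ_i exp(sim(q, d_i)))
- Sparse Loss: L_sparse = -log(exp(lex(q, d+)) / Σ_i exp(lex(q, d_i))), with lex(q, d) = Σ_t w_q(t) * w_d(t) summed over the terms t that occur in both q and d
- ColBERT Loss: L_colbert = -log(exp(MaxSim(Q, D+)) / Σ_i exp(MaxSim(Q, D_i)))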
Where MaxSim(Q, D) = Σ_{q∈Q} max_{d∈D} (q · d): for each query token embedding q, take its maximum dot product over all document token embeddings d, then sum these maxima over the query tokens.
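The MaxSim definition can be sketched in a few lines of plain Python. The function name `max_sim` and the 2-dimensional token embeddings are hypothetical and chosen so the result can be checked by hand. Some implementations also divide the sum by the number of query tokens, which this sketch does not do.

```python
def max_sim(query_tokens, doc_tokens):
    """Sum, over query tokens, of the best dot product with any document token."""
    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))
    return sum(max(dot(q, d) for d in doc_tokens) for q in query_tokens)

# Hypothetical token embeddings (one row per token).
Q = [[1.0, 0.0], [0.0, 1.0]]
D = [[0.5, 0.5], [1.0, 0.0], [0.0, 0.25]]

# First query token: best match is [1.0, 0.0] with dot product 1.0.
# Second query token: best match is [0.5, 0.5] with dot product 0.5.
print(max_sim(Q, D))  # 1.5
```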
Total loss: L = α*L_dense + β*L_sparse + γ*L_colbert, where α, β, and γ are scalar weights that set the relative contribution of each objective.
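As a rough illustration of how the three terms combine, the sketch below computes each loss from a list of candidate scores and then forms the weighted sum. The helper names (`info_nce`, `total_loss`), the convention that the positive sits at index 0, and the default weights of 1.0 are assumptions made for this example. The source does not specify weight values.

```python
import math

def info_nce(scores, positive_index=0):
    """-log of the softmax probability assigned to the positive candidate."""
    m = max(scores)  # subtract the max before exponentiating, for stability
    log_denominator = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_denominator - scores[positive_index]

def total_loss(dense, sparse, colbert, alpha=1.0, beta=1.0, gamma=1.0):
    """Weighted sum of the three contrastive losses (weights are placeholders)."""
    return (alpha * info_nce(dense)
            + beta * info_nce(sparse)
            + gamma * info_nce(colbert))

# Hypothetical scores for one query against [positive, negative, negative].
dense_scores = [2.0, 0.5, 0.1]
sparse_scores = [0.35, 0.0, 0.1]
colbert_scores = [1.5, 0.9, 0.4]

loss = total_loss(dense_scores, sparse_scores, colbert_scores)
print(loss > 0)  # True: each term is a negative log-probability
```

Each term shrinks toward zero as the positive's score pulls ahead of the negatives under that scoring function, so the weights control which retrieval mode dominates the gradient.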
The model shares one transformer encoder and applies a specialized output head for each retrieval type, so a single forward pass of the encoder yields the dense, sparse, and multi-vector representations.
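One way to picture the shared-encoder design is the sketch below, which takes per-token hidden states (standing in for encoder output) and derives all three representations from them. The head shapes are assumptions for illustration: the dense vector is the normalized first-token state, the sparse weight is a ReLU over a linear map to a scalar, and the multi-vector output is a normalized per-token linear projection. The names `apply_heads`, `w_sparse`, and `W_colbert` are hypothetical and do not refer to FlagEmbedding code.

```python
import math

def l2_normalize(v):
    norm = math.sqrt(sum(x * x for x in v)) or 1.0  # avoid division by zero
    return [x / norm for x in v]

def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def apply_heads(token_ids, hidden_states, w_sparse, W_colbert):
    """Derive dense, sparse, and multi-vector outputs from shared hidden states."""
    # Dense head (assumed): normalized hidden state of the first token.
    dense = l2_normalize(hidden_states[0])
    # Sparse head (assumed): ReLU(linear) per token; keep the max for repeated ids.
    sparse = {}
    for tid, h in zip(token_ids, hidden_states):
        weight = max(0.0, sum(a * b for a, b in zip(w_sparse, h)))
        sparse[tid] = max(sparse.get(tid, 0.0), weight)
    # Multi-vector head (assumed): normalized linear projection of every token.
    colbert = [l2_normalize(matvec(W_colbert, h)) for h in hidden_states]
    return dense, sparse, colbert

# Hypothetical encoder output: three tokens with 2-dimensional hidden states.
token_ids = [0, 7, 7]
hidden_states = [[3.0, 4.0], [1.0, 0.0], [0.0, 2.0]]
dense, sparse, colbert = apply_heads(
    token_ids, hidden_states,
    w_sparse=[1.0, -1.0],
    W_colbert=[[1.0, 0.0], [0.0, 1.0]],
)
print(sparse)  # {0: 0.0, 7: 1.0}
```

Because the heads are small relative to the encoder, the cost of producing all three representations is dominated by the single encoder pass.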