
Principle:FlagOpen FlagEmbedding Multi Retrieval Training

From Leeroopedia


Knowledge Sources
Domains: Machine Learning, Information Retrieval, Multi-Granularity Embedding
Last Updated: 2026-02-09 00:00 GMT

Overview

Multi-granularity retrieval training that combines dense embeddings, sparse lexical representations, and ColBERT-style token interactions in a single model, so that one encoder supports semantic, lexical, and token-level matching.

Description

This principle describes a training approach that jointly optimizes three complementary retrieval methods within a single model. Dense retrieval captures semantic similarity through fixed-dimensional embeddings; sparse (lexical) retrieval scores exact term overlap using learned per-term importance weights; and ColBERT multi-vector representations enable fine-grained token-level interactions between query and document. The unified architecture, exemplified by BGE-M3, lets the three methods share one transformer encoder while each contributes its own strengths. Training them together as a multi-task objective improves robustness across diverse retrieval scenarios, from semantic search to keyword matching.
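The dense and sparse matching described above can be sketched in a few lines of plain Python. This is an illustrative sketch, not the FlagEmbedding implementation: it assumes cosine similarity for the dense score and represents each text's sparse output as a hypothetical dictionary from term to learned weight. The function names and toy values are made up for illustration.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length; zero vectors are returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return list(vec) if norm == 0 else [x / norm for x in vec]


def dense_score(query_vec, doc_vec):
    """Semantic match: cosine similarity of two fixed-size embeddings."""
    q, d = l2_normalize(query_vec), l2_normalize(doc_vec)
    return sum(a * b for a, b in zip(q, d))


def sparse_score(query_weights, doc_weights):
    """Lexical match: sum of weight products over terms present in both texts.

    Each argument maps a term to its (hypothetically learned) importance weight.
    """
    shared = query_weights.keys() & doc_weights.keys()
    return sum(query_weights[t] * doc_weights[t] for t in shared)


# Toy inputs; the vectors and weights are invented for illustration.
print(dense_score([3.0, 4.0], [4.0, 3.0]))  # cosine of the two toy vectors, about 0.96
q_terms = {"retrieval": 0.8, "dense": 0.3}
d_terms = {"retrieval": 0.5, "sparse": 0.9}
print(sparse_score(q_terms, d_terms))  # only "retrieval" overlaps: 0.8 * 0.5
```

The sparse score ignores every term that appears in only one of the two texts, which is what makes it a lexical (exact-overlap) signal, while the dense score compares whole-text embeddings regardless of shared vocabulary.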

Usage

Use this principle when:

  • Building versatile retrieval systems that handle both semantic and lexical queries
  • Training embedders for multilingual or cross-lingual retrieval
  • Developing models that need to balance precision (lexical) and recall (semantic)
  • Creating universal embedding models for production search systems

Theoretical Basis

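The training objective combines three contrastive (InfoNCE-style) losses, one per retrieval method. For a query q, its positive document d+, and a candidate set {d_i} containing d+ and the negatives:

  1. Dense Loss: L_dense = -log(exp(sim(q, d+)) / Σ_i exp(sim(q, d_i)))
  2. Sparse Loss: L_sparse = -log(exp(s_lex(q, d+)) / Σ_i exp(s_lex(q, d_i))), with s_lex(q, d) = Σ_{t ∈ q ∩ d} w_q(t) * w_d(t)
  3. ColBERT Loss: L_colbert = -log(exp(MaxSim(Q, D+)) / Σ_i exp(MaxSim(Q, D_i)))

Here sim(q, d) is the similarity between the dense query and document embeddings (typically a dot product or cosine), w_q(t) and w_d(t) are the learned importance weights of term t in the query and document, and Q and D are the sets of query and document token embeddings. MaxSim(Q, D) = Σ_{q_j ∈ Q} max_{d_k ∈ D} (q_j · d_k): for each query token embedding, take its highest dot product with any document token embedding, then sum over all query tokens.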
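The MaxSim definition above translates directly into code. The sketch below assumes token embeddings are plain lists of floats and uses the unnormalized sum exactly as written; some implementations, including the one described in the BGE-M3 paper, divide by the number of query tokens so that scores are comparable across query lengths, a variant not shown here.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def maxsim(query_tokens, doc_tokens):
    """Late-interaction score: for each query token embedding, take its best
    dot product against any document token embedding, then sum over query tokens."""
    return sum(max(dot(q, d) for d in doc_tokens) for q in query_tokens)


Q = [[1.0, 0.0], [0.0, 1.0]]  # two query token embeddings
D = [[1.0, 0.0], [0.5, 0.5]]  # two document token embeddings
print(maxsim(Q, D))  # 1.0 (first query token) + 0.5 (second query token) = 1.5
```

Because each query token picks its own best match independently, a document scores well when every query token finds some closely related token in it, not merely when the averaged embeddings are close.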

Total loss: L = α*L_dense + β*L_sparse + γ*L_colbert, where α, β, and γ are weighting coefficients that set each method's contribution to training.
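Given per-candidate scores from each method, the three losses and their weighted sum can be computed as follows. This is a minimal sketch: the positive document is assumed to sit at index 0 of each score list, the `temperature` parameter is an added assumption (common in contrastive training, with the default of 1.0 reproducing the formulas as written), and the scores and `beta=0.3` in the usage line are invented purely for illustration.

```python
import math


def info_nce(scores, temperature=1.0):
    """Contrastive loss for one query.

    scores[0] is the score of the positive document d+; scores[1:] are negatives.
    Returns -log(exp(s+) / sum_i exp(s_i)), computed stably via log-sum-exp.
    """
    logits = [s / temperature for s in scores]
    m = max(logits)  # subtract the max before exponentiating to avoid overflow
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[0]


def total_loss(dense_scores, sparse_scores, colbert_scores,
               alpha=1.0, beta=1.0, gamma=1.0):
    """L = alpha * L_dense + beta * L_sparse + gamma * L_colbert."""
    return (alpha * info_nce(dense_scores)
            + beta * info_nce(sparse_scores)
            + gamma * info_nce(colbert_scores))


# Hypothetical scores for one query: [positive, negative_1, negative_2].
loss = total_loss([0.9, 0.2, 0.1], [1.5, 0.3, 0.0], [1.2, 0.8, 0.4], beta=0.3)
print(round(loss, 4))
```

Each loss shrinks as the positive's score pulls ahead of the negatives, so minimizing the weighted sum pushes all three scoring heads to rank d+ first.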

The model shares one transformer encoder across all three methods and adds a specialized output head for each retrieval type, so a single forward pass yields dense, sparse, and multi-vector representations.
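The shared-encoder design can be sketched as one function that turns a single set of encoder hidden states into all three representations. This loosely follows the published BGE-M3 design (dense vector = normalized [CLS] state; sparse weight = ReLU of a linear map per token, keeping the maximum weight when a term repeats; ColBERT vectors = normalized linear projections of token states). The encoder itself is not modeled, the function name `multi_head_outputs` and all weights are hypothetical, and excluding [CLS] from the sparse and ColBERT outputs is an assumption of this sketch. Real implementations run these heads as batched tensor operations.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length, guarding against zero vectors."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def matvec(weight, v):
    """Multiply an (out_dim x in_dim) matrix, stored as a list of rows, by v."""
    return [sum(w * x for w, x in zip(row, v)) for row in weight]


def multi_head_outputs(hidden_states, token_ids, sparse_w, colbert_w):
    """Produce dense, sparse, and ColBERT outputs from one encoder pass.

    hidden_states: per-token vectors from the shared encoder; index 0 is [CLS].
    token_ids: vocabulary ids aligned with hidden_states.
    sparse_w: hypothetical weight vector mapping a hidden state to a term weight.
    colbert_w: hypothetical projection matrix for multi-vector embeddings.
    """
    # Dense head: the normalized [CLS] hidden state.
    dense = l2_normalize(hidden_states[0])

    # Sparse head: ReLU(w . h) per token; a repeated term keeps its max weight.
    sparse = {}
    for tid, h in zip(token_ids[1:], hidden_states[1:]):
        weight = max(0.0, sum(w * x for w, x in zip(sparse_w, h)))
        if weight > 0:
            sparse[tid] = max(sparse.get(tid, 0.0), weight)

    # ColBERT head: project, then normalize, each non-[CLS] token state.
    colbert = [l2_normalize(matvec(colbert_w, h)) for h in hidden_states[1:]]
    return dense, sparse, colbert


# Toy encoder output: [CLS] plus two tokens that share vocabulary id 7.
states = [[1.0, 0.0], [0.0, 2.0], [3.0, 4.0]]
dense, sparse, colbert = multi_head_outputs(states, [0, 7, 7], [0.5, 0.5],
                                            [[1.0, 0.0], [0.0, 1.0]])
print(sparse)  # {7: 3.5}: the repeated term keeps its larger weight
```

Since all three heads read the same hidden states, gradients from every loss term update the shared encoder, which is what lets the methods reinforce one another during training.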
