
Principle:FlagOpen FlagEmbedding Contrastive Embedding Training

From Leeroopedia


Knowledge Sources
Domains: Machine Learning, Information Retrieval, Contrastive Learning, Text Embeddings
Last Updated: 2026-02-09 00:00 GMT

Overview

Bi-encoder contrastive learning for text embeddings: queries and documents are encoded independently (often by a single shared encoder) and trained with in-batch negatives and mined hard negatives to produce discriminative representations.

Description

This principle forms the foundation of the BGE (BAAI General Embedding) family of models. It employs a dual-encoder architecture where queries and documents are independently encoded into a shared embedding space. Training uses contrastive learning with InfoNCE loss, treating other examples in the batch as negatives. Hard negative mining identifies challenging negative examples that are semantically similar but incorrect, forcing the model to learn fine-grained distinctions. The approach scales efficiently to large datasets through in-batch sampling and supports various retrieval tasks (web search, question answering, semantic similarity) through multi-task training. The resulting embeddings can be compared with simple cosine similarity at inference time, enabling fast retrieval with approximate nearest neighbor search.
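
The inference-time comparison described above can be sketched in a few lines. This is a minimal illustration, not the FlagEmbedding implementation: the function names `normalize` and `cosine_top_k` and the toy 3-dimensional vectors are assumptions made for the example, and the exhaustive scan stands in for the approximate nearest neighbor index a real system would likely use.

```python
import math


def normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def cosine_top_k(query_vec, doc_vecs, k=2):
    """Return the k (doc_index, cosine_score) pairs with the highest scores.

    Exhaustive scan for clarity; hypothetical helper, not a library API.
    """
    q = normalize(query_vec)
    scored = []
    for idx, doc in enumerate(doc_vecs):
        d = normalize(doc)
        scored.append((idx, sum(a * b for a, b in zip(q, d))))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy "embeddings", made up for illustration only.
docs = [[1.0, 0.0, 0.0], [0.7, 0.7, 0.0], [0.0, 0.0, 1.0]]
query = [0.9, 0.1, 0.0]
top = cosine_top_k(query, docs, k=2)
print(top)  # document 0 ranks first, document 1 second
```

Because documents are encoded independently of the query, their normalized vectors can be computed once and stored, leaving only the query encoding and the similarity search for request time.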

Usage

Use this principle when:

  • Training general-purpose text embedding models
  • Building bi-encoder retrieval systems for semantic search
  • Creating embeddings for document clustering or similarity tasks
  • Developing foundational retrieval models that can be fine-tuned for specific domains

Theoretical Basis

The contrastive training framework consists of:

  1. Dual Encoders:
    • Query encoder: q = f_q(Q) where q ∈ R^n
    • Document encoder: d = f_d(D) where d ∈ R^n
    • n is the embedding dimension, shared by both encoders
    • Often f_q = f_d (shared encoder) for efficiency
  2. InfoNCE Loss:
    • Similarity score: s(q, d) = (q · d) / (||q|| ||d||)
    • Loss: L = -log(exp(s(q, d+)/τ) / (exp(s(q, d+)/τ) + Σ_i exp(s(q, d_i^-)/τ)))
    • Here d+ is the positive document, d_i^- are the negatives (in-batch and hard), and τ is the temperature
  3. Hard Negative Mining:
    • Retrieve challenging negatives: d^- = argmax_{d∈corpus, d≠d+} s(q, d)
    • Mix with random negatives for balanced training
    • Sharpens the model's ability to separate near-miss documents from true positives
  4. Training Strategy:
    • Use large batch sizes (hundreds to thousands of examples) for diverse negatives
    • Gradient accumulation for memory efficiency
    • Warm-up learning rate schedule
  5. Evaluation: Measure recall@k, MRR, and nDCG on retrieval benchmarks such as MS MARCO and BEIR
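
The InfoNCE loss and hard negative mining steps listed above can be made concrete with a small pure-Python sketch. It follows the formulas as written here and is not taken from the FlagEmbedding code base. The function names, the default temperature of 0.05, and the toy 2-dimensional vectors are all assumptions chosen for illustration, and a real training loop would compute this on batched tensors with automatic differentiation.

```python
import math


def cosine(u, v):
    """s(q, d) = (q . d) / (||q|| ||d||)."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def info_nce(query, positive, negatives, temperature=0.05):
    """-log(exp(s+/tau) / (exp(s+/tau) + sum_i exp(s_i-/tau))) for one query."""
    logits = [cosine(query, positive) / temperature]
    logits += [cosine(query, neg) / temperature for neg in negatives]
    # Log-sum-exp with the maximum subtracted for numerical stability.
    m = max(logits)
    log_denominator = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_denominator - logits[0]


def in_batch_loss(queries, positives, hard_negatives=None, temperature=0.05):
    """Mean InfoNCE over a batch; other queries' positives act as negatives."""
    hard_negatives = hard_negatives or [[] for _ in queries]
    total = 0.0
    for i, q in enumerate(queries):
        negs = [p for j, p in enumerate(positives) if j != i]
        negs += list(hard_negatives[i])
        total += info_nce(q, positives[i], negs, temperature)
    return total / len(queries)


def mine_hard_negatives(query, corpus, positive_index, k=1):
    """Indices of the k corpus items most similar to the query, positive excluded."""
    candidates = [i for i in range(len(corpus)) if i != positive_index]
    candidates.sort(key=lambda i: cosine(query, corpus[i]), reverse=True)
    return candidates[:k]


# Toy data, made up for illustration only.
queries = [[1.0, 0.0], [0.0, 1.0]]
positives = [[0.9, 0.1], [0.1, 0.9]]
corpus = positives + [[0.8, 0.3]]  # the last item is a near-miss for query 0

hard_idx = mine_hard_negatives(queries[0], corpus, positive_index=0, k=1)
loss_plain = in_batch_loss(queries, positives)
loss_hard = in_batch_loss(queries, positives, [[corpus[i] for i in hard_idx], []])
loss_swapped = in_batch_loss(queries, positives[::-1])  # deliberately mismatched pairs
```

On this toy batch the correctly paired loss is close to zero, adding the mined near-miss raises it, and swapping the positives raises it much further. That ordering is the signal that pulls matching pairs together and pushes similar but incorrect documents apart.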

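The evaluation metrics named above can be computed per query as in the following sketch. It assumes binary relevance judgments and a ranked list without duplicate document IDs. The function names and the example IDs are hypothetical, and benchmark toolkits may differ in details such as graded relevance or tie handling.

```python
import math


def recall_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the relevant documents that appear in the top-k results."""
    hits = len(set(ranked_ids[:k]) & set(relevant_ids))
    return hits / len(relevant_ids)


def reciprocal_rank(ranked_ids, relevant_ids):
    """1 / rank of the first relevant document, or 0.0 if none is retrieved.

    MRR is the mean of this value over all evaluation queries.
    """
    relevant = set(relevant_ids)
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


def ndcg_at_k(ranked_ids, relevant_ids, k):
    """nDCG@k with binary relevance (gain 1 if relevant, 0 otherwise)."""
    relevant = set(relevant_ids)
    dcg = sum(
        1.0 / math.log2(rank + 1)
        for rank, doc_id in enumerate(ranked_ids[:k], start=1)
        if doc_id in relevant
    )
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg > 0 else 0.0


# Hypothetical ranking for a single query.
ranked = ["d3", "d1", "d7", "d2"]
relevant = ["d1", "d2"]

r2 = recall_at_k(ranked, relevant, k=2)
rr = reciprocal_rank(ranked, relevant)
ndcg4 = ndcg_at_k(ranked, relevant, k=4)
```

In this example one of the two relevant documents is in the top 2, and the first relevant document sits at rank 2, so both recall@2 and the reciprocal rank are 0.5.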