Principle:FlagOpen FlagEmbedding Distributed Embedder Training

From Leeroopedia
Revision as of 17:53, 16 February 2026 by Admin (talk | contribs) (Auto-imported from principles/FlagOpen_FlagEmbedding_Distributed_Embedder_Training.md)



Overview

A distributed training pipeline that fine-tunes BGE embedding models using contrastive learning with DeepSpeed, supporting encoder-only and decoder-only architectures with optional LoRA.

Description

The training pipeline uses torchrun for multi-GPU distributed training with DeepSpeed ZeRO optimization. Four training modules exist:

Encoder-only base: full fine-tuning of encoder-only models.
Encoder-only M3: unified dense + sparse + ColBERT loss.
Decoder-only base: LoRA fine-tuning of decoder-only models.
Decoder-only ICL: fine-tuning with in-context learning examples.
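As a rough illustration of how one of the four modules might be launched with torchrun, the sketch below builds the command line in Python. The module paths, the script argument names (`model_name_or_path`, `train_data`, `output_dir`, `deepspeed`), the model name, and the file paths are all assumptions made for illustration and should be checked against the installed FlagEmbedding version; only `torchrun`, `--nproc_per_node`, and `-m` are standard torchrun usage.

```python
import shlex

# Assumed module paths for the four training modules; verify against your install.
TRAINING_MODULES = {
    "encoder_only_base": "FlagEmbedding.finetune.embedder.encoder_only.base",
    "encoder_only_m3": "FlagEmbedding.finetune.embedder.encoder_only.m3",
    "decoder_only_base": "FlagEmbedding.finetune.embedder.decoder_only.base",
    "decoder_only_icl": "FlagEmbedding.finetune.embedder.decoder_only.icl",
}


def build_launch_command(variant, num_gpus, script_args):
    """Return the torchrun argv list for one training module."""
    module = TRAINING_MODULES[variant]
    command = ["torchrun", "--nproc_per_node", str(num_gpus), "-m", module]
    # Script arguments are passed through unchanged as --name value pairs.
    for name, value in script_args.items():
        command += [f"--{name}", str(value)]
    return command


# Hypothetical argument names and paths, shown only to illustrate the shape.
example = build_launch_command(
    "encoder_only_base",
    2,
    {
        "model_name_or_path": "BAAI/bge-base-en-v1.5",
        "train_data": "./train.jsonl",
        "output_dir": "./output",
        "deepspeed": "./ds_config.json",
    },
)
print(shlex.join(example))
```

The list form can be passed to `subprocess.run` directly, which avoids shell quoting issues; `shlex.join` is used here only to display the command.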

The runner orchestrates model and tokenizer loading, dataset creation, data collation (with sub-batching), and HuggingFace Trainer execution.
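One plausible reading of "sub-batching" in the collation step is that a batch is split into smaller chunks, each padded only to its own longest sequence, so that a single long example does not inflate the padding of the whole batch. The sketch below illustrates that idea in plain Python. The function name, the `pad_id` default, and the output dictionary keys are hypothetical, and this is not FlagEmbedding's actual collator.

```python
def collate_in_sub_batches(token_id_lists, sub_batch_size, pad_id=0):
    """Split tokenized sequences into sub-batches and pad each one separately."""
    if sub_batch_size <= 0:
        raise ValueError("sub_batch_size must be positive")
    sub_batches = []
    for start in range(0, len(token_id_lists), sub_batch_size):
        chunk = token_id_lists[start:start + sub_batch_size]
        # Pad only to the longest sequence inside this chunk.
        width = max(len(ids) for ids in chunk)
        input_ids = [ids + [pad_id] * (width - len(ids)) for ids in chunk]
        attention_mask = [[1] * len(ids) + [0] * (width - len(ids)) for ids in chunk]
        sub_batches.append({"input_ids": input_ids, "attention_mask": attention_mask})
    return sub_batches


batch = [[5, 6, 7], [8], [9, 10], [11, 12, 13, 14, 15]]
sub_batches = collate_in_sub_batches(batch, sub_batch_size=2)
```

In this example the first sub-batch is padded to length 3 and the second to length 5, whereas padding the full batch at once would pad every sequence to length 5.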

Usage

Use this pipeline when fine-tuning a BGE embedding model on custom data across multiple GPUs.

Theoretical Basis

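Contrastive loss (InfoNCE):

L = -log(exp(sim(q, p+)/τ) / Σ_i exp(sim(q, p_i)/τ))

Here q is the query embedding, p+ is its positive passage, the sum runs over all candidate passages p_i (the positive plus the negatives), sim is a similarity function such as cosine similarity, and τ is the temperature.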
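The loss above can be sketched for a single query in plain Python. This is a minimal illustration that assumes cosine similarity as sim and that the candidate set is the positive plus an explicit list of negatives; the function names and the toy vectors are hypothetical, and a real training loop would compute this over batched tensors.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(query, positive, negatives, temperature):
    """InfoNCE loss for one query; the positive is candidate index 0."""
    candidates = [positive] + list(negatives)
    logits = [cosine(query, p) / temperature for p in candidates]
    # Log-sum-exp with the max subtracted for numerical stability.
    peak = max(logits)
    log_denominator = peak + math.log(sum(math.exp(x - peak) for x in logits))
    # -log(exp(logit_pos) / sum_i exp(logit_i)) = log_denominator - logit_pos
    return log_denominator - logits[0]


query = [1.0, 0.0]
aligned = info_nce(query, positive=[1.0, 0.0], negatives=[[0.0, 1.0]], temperature=1.0)
misaligned = info_nce(query, positive=[0.0, 1.0], negatives=[[1.0, 0.0]], temperature=1.0)
```

The loss is small when the positive is the most similar candidate and grows when a negative scores higher; lowering the temperature sharpens that contrast.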

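DeepSpeed ZeRO partitions training state across GPUs to reduce the per-GPU memory footprint: stage 1 partitions optimizer states, stage 2 also partitions gradients, and stage 3 also partitions model parameters.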
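To give a feel for the memory savings, the sketch below estimates per-GPU model-state memory under the accounting commonly used for mixed-precision Adam: 2 bytes per parameter for half-precision weights, 2 for half-precision gradients, and 12 for optimizer state (full-precision weights, momentum, and variance). These byte counts are assumptions, the function name is hypothetical, and the estimate ignores activations, temporary buffers, and fragmentation.

```python
def zero_memory_per_gpu(num_params, num_gpus, stage):
    """Approximate model-state bytes per GPU for ZeRO stages 0 to 3."""
    if stage not in (0, 1, 2, 3):
        raise ValueError("stage must be 0, 1, 2, or 3")
    params = 2 * num_params      # half-precision weights
    grads = 2 * num_params       # half-precision gradients
    optimizer = 12 * num_params  # full-precision weights + Adam momentum and variance
    if stage >= 1:
        optimizer /= num_gpus    # stage 1: partition optimizer states
    if stage >= 2:
        grads /= num_gpus        # stage 2: also partition gradients
    if stage >= 3:
        params /= num_gpus       # stage 3: also partition parameters
    return params + grads + optimizer


# Example: a 1-billion-parameter model on 8 GPUs, reported in GB (1e9 bytes).
estimates_gb = {
    stage: zero_memory_per_gpu(1_000_000_000, 8, stage) / 1e9
    for stage in range(4)
}
```

Under these assumptions the estimate drops from 16 GB per GPU without partitioning to 5.5 GB at stage 1, 3.75 GB at stage 2, and 2 GB at stage 3.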

LoRA adds low-rank adapters to frozen model weights, enabling parameter-efficient fine-tuning.
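The low-rank update can be sketched as follows: the frozen weight W (d_out × d_in) is left untouched, and the output gains a scaled term B·A·x, where A (r × d_in) and B (d_out × r) are the only trainable matrices. The pure-Python code below is illustrative only; the function names and the alpha/rank scaling convention are assumptions, and a real setup would use a library such as PEFT on tensors.

```python
def matvec(matrix, vector):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def lora_forward(weight, lora_a, lora_b, x, alpha, rank):
    """Compute W x + (alpha / rank) * B (A x); W is frozen, A and B are trainable."""
    base = matvec(weight, x)
    update = matvec(lora_b, matvec(lora_a, x))
    scale = alpha / rank
    return [b + scale * u for b, u in zip(base, update)]


def lora_trainable_params(d_out, d_in, rank):
    """Adapter parameter count: A has rank * d_in entries, B has d_out * rank."""
    return rank * (d_in + d_out)


identity = [[1.0, 0.0], [0.0, 1.0]]
a = [[1.0, 1.0]]                      # rank 1, d_in 2
b_zero = [[0.0], [0.0]]               # B starts at zero, so the adapter is a no-op
b_trained = [[1.0], [0.0]]
unchanged = lora_forward(identity, a, b_zero, [1.0, 2.0], alpha=2, rank=1)
adapted = lora_forward(identity, a, b_trained, [1.0, 2.0], alpha=2, rank=1)

full = 4096 * 4096
adapter = lora_trainable_params(4096, 4096, rank=16)
```

For a 4096 × 4096 weight matrix with rank 16, the adapter has 131,072 trainable parameters against 16,777,216 in the full matrix, which is under 1% of the original.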
