Principle:FlagOpen FlagEmbedding Distributed Embedder Training
Overview
A distributed training pipeline that fine-tunes BGE embedding models using contrastive learning with DeepSpeed, supporting encoder-only and decoder-only architectures with optional LoRA.
Description
The training pipeline uses torchrun for multi-GPU distributed training with DeepSpeed ZeRO optimization. Four training modules exist:
- Encoder-only base: full fine-tuning of encoder-only models.
- Encoder-only M3: unified dense + sparse + ColBERT loss.
- Decoder-only base: LoRA fine-tuning of decoder-only models.
- Decoder-only ICL: fine-tuning with in-context learning examples.
The runner orchestrates four steps in order: model and tokenizer loading, dataset creation, data collation (with sub-batching), and HuggingFace Trainer execution.
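The sub-batching step in data collation can be pictured as cutting one large batch into smaller chunks that are encoded one at a time, so that peak memory depends on the chunk size rather than the full batch. The sketch below illustrates only that idea; the helper name `split_into_sub_batches` and the `sub_batch_size` parameter are hypothetical and are not taken from the FlagEmbedding code.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def split_into_sub_batches(items: Sequence[T], sub_batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive chunks of at most `sub_batch_size` items.

    Hypothetical helper: it only illustrates the chunking idea and is not
    the collator used by the actual pipeline.
    """
    if sub_batch_size <= 0:
        raise ValueError("sub_batch_size must be positive")
    for start in range(0, len(items), sub_batch_size):
        # The last chunk may be shorter than sub_batch_size.
        yield list(items[start:start + sub_batch_size])


# Illustrative input: five placeholder query strings.
queries = [f"query {i}" for i in range(5)]
sub_batches = list(split_into_sub_batches(queries, sub_batch_size=2))
```

Each chunk would then be tokenized and passed through the model separately, with the resulting embeddings concatenated before the loss is computed.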
Usage
Use this pipeline when fine-tuning a BGE embedding model on custom data across multiple GPUs.
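A launch command for one of the four training modules might be assembled as in the sketch below. The module paths in `TRAINING_MODULES` are assumptions about the package layout and should be checked against the installed FlagEmbedding version. The helper `build_launch_command` is hypothetical. `--deepspeed` and `--output_dir` are standard HuggingFace `TrainingArguments` options; model and data arguments are deliberately left out because their exact names are not given here.

```python
import shlex
from typing import Dict, List

# Assumed module paths for the four training modules (verify before use).
TRAINING_MODULES: Dict[str, str] = {
    "encoder_only_base": "FlagEmbedding.finetune.embedder.encoder_only.base",
    "encoder_only_m3": "FlagEmbedding.finetune.embedder.encoder_only.m3",
    "decoder_only_base": "FlagEmbedding.finetune.embedder.decoder_only.base",
    "decoder_only_icl": "FlagEmbedding.finetune.embedder.decoder_only.icl",
}


def build_launch_command(
    variant: str,
    nproc_per_node: int,
    deepspeed_config: str,
    output_dir: str,
) -> List[str]:
    """Build a torchrun argument list for the chosen training module.

    Hypothetical helper: only the launcher flags and two HuggingFace
    TrainingArguments options are included.
    """
    module = TRAINING_MODULES[variant]
    return [
        "torchrun",
        "--nproc_per_node", str(nproc_per_node),  # one process per GPU
        "-m", module,                             # run the module as a script
        "--deepspeed", deepspeed_config,          # path to a DeepSpeed JSON config
        "--output_dir", output_dir,
    ]


# Placeholder paths; replace with real ones.
command = build_launch_command("encoder_only_base", 2, "ds_config.json", "./output")
print(shlex.join(command))
```

Building the command as a list rather than a single string keeps it safe to pass to `subprocess.run` without shell quoting issues.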
Theoretical Basis
Contrastive loss (InfoNCE):
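L = -log(exp(sim(q, p+)/τ) / Σ_i exp(sim(q, p_i)/τ))
Here q is the query embedding, p+ is the positive passage embedding, the sum runs over all candidate passages p_i (the positive and the negatives), sim is a similarity function such as cosine similarity, and τ is the temperature.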
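The loss can be worked through numerically for a single query. The sketch below is a minimal pure-Python version that assumes cosine similarity for sim and assumes the sum in the denominator includes the positive passage; the function names and the toy vectors are illustrative, not taken from the training code.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def info_nce_loss(
    query: Sequence[float],
    passages: Sequence[Sequence[float]],
    positive_index: int,
    temperature: float,
) -> float:
    """InfoNCE loss for one query against a list of candidate passages."""
    logits = [cosine_similarity(query, p) / temperature for p in passages]
    # Log-sum-exp with the maximum subtracted for numerical stability.
    max_logit = max(logits)
    log_denominator = max_logit + math.log(
        sum(math.exp(logit - max_logit) for logit in logits)
    )
    # -log(exp(l_pos) / sum_i exp(l_i)) = log(sum_i exp(l_i)) - l_pos
    return log_denominator - logits[positive_index]


# Toy example: the positive is aligned with the query, the negative is orthogonal.
query = [1.0, 0.0]
passages = [[1.0, 0.0], [0.0, 1.0]]
loss_good = info_nce_loss(query, passages, positive_index=0, temperature=1.0)
loss_bad = info_nce_loss(query, passages, positive_index=1, temperature=1.0)
```

With these vectors the logits are 1 and 0, so `loss_good` equals log(1 + e^-1) and is smaller than `loss_bad`, which treats the orthogonal passage as the positive. Lowering the temperature sharpens the distribution and widens that gap.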
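DeepSpeed ZeRO partitions training state across GPUs to reduce the per-GPU memory footprint: stage 1 partitions optimizer states, stage 2 additionally partitions gradients, and stage 3 additionally partitions model parameters.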
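The memory effect of ZeRO can be estimated with the accounting used in the ZeRO paper for mixed-precision Adam: 2 bytes per parameter for half-precision weights, 2 bytes for half-precision gradients, and 12 bytes for full-precision optimizer state. Stage 1 shards the optimizer state, stage 2 also shards gradients, and stage 3 also shards the weights. The sketch below applies that model; it ignores activations, buffers, and fragmentation, so it is a rough lower bound rather than a prediction for any particular run. The function name is hypothetical.

```python
def zero_memory_bytes_per_gpu(num_params: float, num_gpus: int, stage: int) -> float:
    """Estimate model-state memory per GPU under ZeRO stage 0 to 3.

    Assumes mixed-precision Adam: 2 bytes (weights) + 2 bytes (gradients)
    + 12 bytes (fp32 weights, momentum, variance) per parameter.
    """
    if stage not in (0, 1, 2, 3):
        raise ValueError("stage must be 0, 1, 2, or 3")
    params = 2.0 * num_params
    grads = 2.0 * num_params
    optimizer = 12.0 * num_params
    if stage >= 1:
        optimizer /= num_gpus  # stage 1: shard optimizer states
    if stage >= 2:
        grads /= num_gpus      # stage 2: also shard gradients
    if stage >= 3:
        params /= num_gpus     # stage 3: also shard parameters
    return params + grads + optimizer


# Worked example from the ZeRO paper: 7.5B parameters on 64 GPUs.
estimates_gb = {
    stage: zero_memory_bytes_per_gpu(7.5e9, 64, stage) / 1e9
    for stage in range(4)
}
```

For this configuration the estimate falls from 120 GB per GPU without partitioning to about 31.4 GB, 16.6 GB, and 1.9 GB for stages 1, 2, and 3.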
LoRA adds low-rank adapters to frozen model weights, enabling parameter-efficient fine-tuning.
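The LoRA update can be written as y = W x + (α / r) · B (A x), where W is the frozen weight matrix, A (r × d_in) and B (d_out × r) are the trainable low-rank factors, and α is a scaling hyperparameter. The sketch below is a toy pure-Python forward pass with hypothetical function names; a real run would typically rely on an adapter library rather than hand-written matrix code.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(matrix: Matrix, vector: Vector) -> Vector:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def lora_forward(x: Vector, frozen_weight: Matrix, lora_a: Matrix,
                 lora_b: Matrix, alpha: float) -> Vector:
    """Compute W x + (alpha / r) * B (A x), where r is the number of rows of A."""
    rank = len(lora_a)
    scale = alpha / rank
    base = matvec(frozen_weight, x)              # frozen path
    update = matvec(lora_b, matvec(lora_a, x))   # low-rank trainable path
    return [b + scale * u for b, u in zip(base, update)]


def trainable_fraction(d_out: int, d_in: int, rank: int) -> float:
    """Adapter parameters divided by the parameters of the frozen matrix."""
    return rank * (d_in + d_out) / (d_out * d_in)


x = [1.0, 2.0, 3.0]
W = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]  # frozen 2x3 weight
A = [[0.5, 0.5, 0.5]]                   # rank-1 down-projection (1x3)
B_zero = [[0.0], [0.0]]                 # B is commonly zero-initialized
B_trained = [[1.0], [0.0]]              # illustrative value after training

out_initial = lora_forward(x, W, A, B_zero, alpha=2.0)
out_trained = lora_forward(x, W, A, B_trained, alpha=2.0)
fraction = trainable_fraction(4096, 4096, 16)
```

With B at zero the adapter contributes nothing, so `out_initial` equals the frozen model's output. For a 4096 × 4096 matrix at rank 16, the adapter holds under 1% as many parameters as the frozen matrix.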