Principle:FlagOpen FlagEmbedding Distributed Embedder Training

Overview

A distributed training pipeline that fine-tunes BGE embedding models using contrastive learning with DeepSpeed, supporting encoder-only and decoder-only architectures with optional LoRA.

Description

The training pipeline uses torchrun for multi-GPU distributed training with DeepSpeed ZeRO optimization. Four training modules exist:

Encoder-only base: Full fine-tuning of encoder-only models.
Encoder-only M3: Unified dense+sparse+ColBERT loss.
Decoder-only base: LoRA fine-tuning of decoder-only models.
Decoder-only ICL: Fine-tuning with in-context learning examples.

The runner orchestrates model and tokenizer loading, dataset creation, data collation (with sub-batching), and Hugging Face Trainer execution.
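The sub-batching step mentioned above can be pictured as splitting one collated batch into smaller chunks, so that each chunk can be encoded separately and peak memory stays bounded. The following is a minimal sketch of that idea only; the function name `split_into_sub_batches` and its parameters are hypothetical and are not taken from the FlagEmbedding collator.

```python
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def split_into_sub_batches(items: Sequence[T], sub_batch_size: Optional[int]) -> List[List[T]]:
    """Split a batch into consecutive chunks of at most sub_batch_size items.

    Hypothetical helper for illustration. A missing or non-positive size
    means "no sub-batching": the whole batch is returned as one chunk.
    """
    if sub_batch_size is None or sub_batch_size <= 0:
        return [list(items)]
    return [
        list(items[start:start + sub_batch_size])
        for start in range(0, len(items), sub_batch_size)
    ]


if __name__ == "__main__":
    queries = ["q1", "q2", "q3", "q4", "q5"]
    for chunk in split_into_sub_batches(queries, 2):
        print(chunk)
```

The last chunk may be shorter than the others, so downstream code should not assume a fixed chunk length.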

Usage

Use this pipeline when fine-tuning a BGE embedding model on custom data with multi-GPU distributed training.

Theoretical Basis

Contrastive loss (InfoNCE):

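L = -log(exp(sim(q, p+)/τ) / Σ_i exp(sim(q, p_i)/τ))

Here q is the query embedding, p+ is the positive passage, p_i ranges over the positive and all negative passages, sim is the similarity function, and τ is the temperature.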
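To make the loss concrete, here is a minimal pure-Python sketch for a single query. It assumes cosine similarity for sim and a temperature of 0.02; both are illustrative choices, not values confirmed by this page, and the function name `info_nce_loss` is hypothetical. A real training run would compute this over batched tensors.

```python
import math
from typing import List, Sequence


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def normalize(v: Sequence[float]) -> List[float]:
    # Assumes v is not the zero vector.
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def info_nce_loss(query, positive, negatives, temperature=0.02):
    """InfoNCE loss for one query with one positive and a list of negatives.

    Similarity is cosine similarity (an assumption for this sketch).
    The positive is placed at index 0 of the candidate list.
    """
    q = normalize(query)
    candidates = [normalize(positive)] + [normalize(n) for n in negatives]
    logits = [dot(q, c) / temperature for c in candidates]
    # Log-sum-exp with max subtraction for numerical stability.
    m = max(logits)
    log_denominator = m + math.log(sum(math.exp(z - m) for z in logits))
    # -log(exp(l_pos) / sum_i exp(l_i)) = logsumexp(l) - l_pos
    return log_denominator - logits[0]


if __name__ == "__main__":
    q = [1.0, 0.0]
    print(info_nce_loss(q, [0.9, 0.1], [[0.0, 1.0], [-1.0, 0.0]]))
```

The loss is close to zero when the positive is much more similar to the query than every negative, and it grows as a negative overtakes the positive.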

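DeepSpeed ZeRO reduces the per-GPU memory footprint by partitioning training state across GPUs instead of replicating it: optimizer states at stage 1, gradients as well at stage 2, and model parameters as well at stage 3.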
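The memory saving can be estimated with simple arithmetic. The sketch below follows the accounting commonly used for mixed-precision Adam: 2 bytes per parameter for fp16 weights, 2 bytes for fp16 gradients, and 12 bytes for optimizer state (fp32 weight copy, momentum, and variance). ZeRO stage 1 shards the optimizer state, stage 2 additionally shards gradients, and stage 3 additionally shards parameters. The function name `zero_bytes_per_gpu` is hypothetical, and the estimate ignores activations, buffers, and fragmentation.

```python
def zero_bytes_per_gpu(num_params: int, num_gpus: int, stage: int) -> float:
    """Approximate model-state bytes held by each GPU under a ZeRO stage.

    stage 0 means plain data parallelism (everything replicated).
    Assumes mixed-precision training with Adam.
    """
    params = 2.0 * num_params   # fp16 weights
    grads = 2.0 * num_params    # fp16 gradients
    optim = 12.0 * num_params   # fp32 weights + Adam momentum + Adam variance
    if stage >= 1:
        optim /= num_gpus
    if stage >= 2:
        grads /= num_gpus
    if stage >= 3:
        params /= num_gpus
    return params + grads + optim


if __name__ == "__main__":
    n = 560_000_000  # illustrative parameter count, not a specific model
    for stage in range(4):
        gib = zero_bytes_per_gpu(n, num_gpus=8, stage=stage) / 2**30
        print(f"stage {stage}: {gib:.2f} GiB per GPU")
```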

LoRA adds low-rank adapters to frozen model weights, enabling parameter-efficient fine-tuning.
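As an illustration of the low-rank update, the sketch below computes the effective weight W + (alpha / r) * B A for small matrices stored as nested lists, and counts the trainable parameters. It is a toy example under stated assumptions: the names `lora_effective_weight` and `lora_trainable_params` are hypothetical, and practical training would rely on an adapter library rather than hand-written matrix code.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    # Plain nested-list matrix product: (n x k) @ (k x m) -> (n x m).
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_effective_weight(w: Matrix, lora_a: Matrix, lora_b: Matrix, alpha: float) -> Matrix:
    """Return W + (alpha / r) * B @ A.

    W (d_out x d_in) stays frozen; only A (r x d_in) and B (d_out x r) train.
    """
    r = len(lora_a)                   # rank is the number of rows in A
    delta = matmul(lora_b, lora_a)    # (d_out x r) @ (r x d_in)
    scale = alpha / r
    return [
        [w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]


def lora_trainable_params(d_out: int, d_in: int, r: int) -> int:
    # A has r * d_in entries and B has d_out * r entries.
    return r * (d_out + d_in)


if __name__ == "__main__":
    full = 4096 * 4096
    adapter = lora_trainable_params(4096, 4096, r=8)
    print(f"full: {full}, LoRA: {adapter}, ratio: {adapter / full:.4%}")
```

Initializing B to zeros leaves the effective weight equal to W at the start of training, which is why the adapter does not disturb the pretrained model before any updates.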
