

Implementation:FlagOpen FlagEmbedding BGE M3 Run

From Leeroopedia
Revision as of 14:58, 16 February 2026 by Admin (Auto-imported from implementations/FlagOpen_FlagEmbedding_BGE_M3_Run.md)


Knowledge Sources
Domains: Embedding Training, Multi-Vector Retrieval, Cross-Device Training
Last Updated: 2026-02-09 00:00 GMT

Overview

Training script for the BGE-M3 model, with unified fine-tuning and cross-device negative sampling.

Description

This is the main training script for the BGE-M3 (Multi-Functionality, Multi-Linguality, Multi-Granularity) embedding model. It implements distributed contrastive training with the following features: cross-device negatives, where passages held on other devices serve as additional negatives; same-task batching, for a consistent training signal; unified fine-tuning of dense, sparse (lexical), and multi-vector (ColBERT) representations; self-distillation from a teacher model; and a dynamic data refresh at the end of each epoch. The script uses a custom BiTrainer with support for sub-batch processing, integrates with the HuggingFace Transformers training infrastructure, and can optionally freeze position embeddings for length extrapolation.
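The effect of cross-device negatives on a contrastive loss can be illustrated with a small pure-Python sketch. This is not the script's code: the function names (`gather_passages`, `info_nce_loss`) and the toy vectors are assumptions made for illustration, and a real run would operate on tensors and use a distributed all-gather rather than list concatenation.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def gather_passages(per_device_passages):
    """Simulate an all-gather: concatenate every device's passage embeddings."""
    return [p for device in per_device_passages for p in device]

def info_nce_loss(query, passages, positive_index, temperature=0.02):
    """Cross-entropy over similarity scores divided by the temperature."""
    logits = [dot(query, p) / temperature for p in passages]
    max_logit = max(logits)  # subtract the max for numerical stability
    log_sum_exp = max_logit + math.log(
        sum(math.exp(l - max_logit) for l in logits)
    )
    return log_sum_exp - logits[positive_index]

# Two simulated devices, each holding two unit-length passage embeddings.
device_0 = [[1.0, 0.0], [0.0, 1.0]]
device_1 = [[0.96, 0.28], [-1.0, 0.0]]  # first entry is a hard negative
query = [1.0, 0.0]                      # its positive is device_0[0]

# Loss when only the local device's passages are candidates.
local_loss = info_nce_loss(query, device_0, positive_index=0)

# Loss when passages from all devices are pooled as candidates.
global_pool = gather_passages([device_0, device_1])
global_loss = info_nce_loss(query, global_pool, positive_index=0)

print(local_loss, global_loss)
```

In this toy setup the hard negative lives on the second device, so it only contributes to the loss once the passage pools are shared.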

Usage

Use this script to train the BGE-M3 model from scratch or fine-tune it on new data, to build multi-representation retrieval systems (dense + sparse + multi-vector), or to run distributed training experiments with cross-device negative sampling. It is designed for large-scale embedding model training.
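To make the three representations concrete, the following sketch scores one query against one passage in each mode using common formulations: cosine similarity for dense vectors, summed weight products over shared tokens for sparse weights, and a ColBERT-style late interaction for token vectors. All function names, the toy inputs, and the equal combination weights are assumptions for illustration, not the script's actual implementation.

```python
import math

def normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def dense_score(q_vec, p_vec):
    """Cosine similarity between one query vector and one passage vector."""
    return sum(a * b for a, b in zip(normalize(q_vec), normalize(p_vec)))

def sparse_score(q_weights, p_weights):
    """Lexical match: sum of weight products over tokens present in both."""
    return sum(w * p_weights[tok] for tok, w in q_weights.items() if tok in p_weights)

def multi_vector_score(q_token_vecs, p_token_vecs):
    """Late interaction: for each query token take the best-matching passage
    token, then average over the query tokens."""
    best = [
        max(sum(a * b for a, b in zip(q, p)) for p in p_token_vecs)
        for q in q_token_vecs
    ]
    return sum(best) / len(best)

def combined_score(dense, sparse, multi, weights=(1.0, 1.0, 1.0)):
    """Weighted average of the three scores; the weights are illustrative only."""
    w_d, w_s, w_m = weights
    return (w_d * dense + w_s * sparse + w_m * multi) / (w_d + w_s + w_m)

d = dense_score([3.0, 4.0], [3.0, 4.0])
s = sparse_score({"bge": 0.5, "m3": 0.4}, {"m3": 0.5, "model": 0.9})
m = multi_vector_score([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.6, 0.8]])
print(combined_score(d, s, m))
```

Keeping each score in its own function mirrors the idea that the modes can be trained and used either separately or together.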

Code Reference

Source Location

Signature

from transformers import TrainerCallback, TrainerControl, TrainerState, TrainingArguments

def main():
    """Main training function for the BGE-M3 model."""

class TrainerCallbackForDataRefresh(TrainerCallback):
    def __init__(self, train_dataset):
        pass

    def on_epoch_end(self, args: TrainingArguments, state: TrainerState, control: TrainerControl, **kwargs):
        """Refresh the dataset at the end of each epoch."""

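The signature above only names the callback hook. A minimal sketch of how an end-of-epoch data refresh could behave is shown below. It deliberately avoids importing transformers so it runs standalone: `DataRefreshCallback` is a plain stand-in for a `TrainerCallback` subclass, and `RefreshableDataset` with its `refresh_epoch` method is a hypothetical dataset invented for this example.

```python
import random

class RefreshableDataset:
    """Hypothetical dataset that reshuffles its example order on request."""

    def __init__(self, examples, seed=0):
        self.examples = list(examples)
        self.seed = seed
        self.epoch = 0
        self.order = list(range(len(self.examples)))

    def refresh_epoch(self):
        # Derive a new deterministic shuffle from the seed and epoch counter.
        self.epoch += 1
        rng = random.Random(self.seed + self.epoch)
        self.order = list(range(len(self.examples)))
        rng.shuffle(self.order)

    def __len__(self):
        return len(self.examples)

    def __getitem__(self, i):
        return self.examples[self.order[i]]


class DataRefreshCallback:
    """Stand-in for a TrainerCallback subclass; only on_epoch_end is modelled."""

    def __init__(self, train_dataset):
        self.train_dataset = train_dataset

    def on_epoch_end(self, args=None, state=None, control=None, **kwargs):
        self.train_dataset.refresh_epoch()


dataset = RefreshableDataset(["a", "b", "c", "d"])
callback = DataRefreshCallback(dataset)
for _ in range(3):  # pretend to train for three epochs
    _ = [dataset[i] for i in range(len(dataset))]
    callback.on_epoch_end()
```

Holding a reference to the dataset inside the callback lets the training loop stay unaware of how the data is regrouped between epochs.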
Import

# Run as script
# python run.py --model_name_or_path BAAI/bge-m3 --output_dir ./output

I/O Contract

Inputs

Name                     Type    Required   Description
model_name_or_path       str     Yes        Pre-trained model name or path
train_data               str     Yes        Training data path (file or directory)
output_dir               str     Yes        Directory to save model checkpoints
negatives_cross_device   bool    No         Enable cross-device negative sampling
unified_finetuning       bool    No         Enable unified multi-representation training
use_self_distill         bool    No         Enable self-distillation
temperature              float   No         Temperature for contrastive loss
query_max_len            int     No         Maximum query length
passage_max_len          int     No         Maximum passage length
fix_position_embedding   bool    No         Freeze position embeddings
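As a way to read the table, the inputs can be modelled as a typed argument container. The sketch below uses only the standard library (`argparse` and `dataclasses`), not the parser the script itself uses. The class name `RunArguments` and all default values are assumptions: the numeric defaults are placeholders borrowed from the usage examples on this page, not confirmed defaults of the script.

```python
import argparse
from dataclasses import dataclass

@dataclass
class RunArguments:
    # Required inputs
    model_name_or_path: str
    train_data: str
    output_dir: str
    # Optional inputs (defaults here are placeholders, not the script's own)
    negatives_cross_device: bool = False
    unified_finetuning: bool = False
    use_self_distill: bool = False
    temperature: float = 0.02
    query_max_len: int = 512
    passage_max_len: int = 512
    fix_position_embedding: bool = False

def parse_args(argv):
    parser = argparse.ArgumentParser(description="Sketch of the documented inputs")
    for name in ("model_name_or_path", "train_data", "output_dir"):
        parser.add_argument(f"--{name}", type=str, required=True)
    for name in ("negatives_cross_device", "unified_finetuning",
                 "use_self_distill", "fix_position_embedding"):
        parser.add_argument(f"--{name}", action="store_true")
    parser.add_argument("--temperature", type=float, default=0.02)
    parser.add_argument("--query_max_len", type=int, default=512)
    parser.add_argument("--passage_max_len", type=int, default=512)
    return RunArguments(**vars(parser.parse_args(argv)))

args = parse_args([
    "--model_name_or_path", "BAAI/bge-m3",
    "--train_data", "./train_data",
    "--output_dir", "./output",
    "--negatives_cross_device",
    "--unified_finetuning",
])
print(args)
```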

Outputs

Name            Type        Description
trained_model   Model       Fine-tuned BGE-M3 model saved to output_dir
tokenizer       Tokenizer   Saved tokenizer
training_logs   Logs        Training metrics and checkpoints

Usage Examples

# Example 1: Basic training command
# python run.py \
#   --model_name_or_path BAAI/bge-m3 \
#   --train_data ./train_data \
#   --output_dir ./output_m3 \
#   --per_device_train_batch_size 32 \
#   --learning_rate 1e-5 \
#   --num_train_epochs 3 \
#   --query_max_len 512 \
#   --passage_max_len 512 \
#   --temperature 0.02 \
#   --negatives_cross_device \
#   --unified_finetuning

# Example 2: Advanced training with self-distillation
# python run.py \
#   --model_name_or_path BAAI/bge-m3 \
#   --train_data ./train_data \
#   --output_dir ./output_m3_distill \
#   --negatives_cross_device \
#   --unified_finetuning \
#   --use_self_distill \
#   --self_distill_start_step 1000 \
#   --colbert_dim 1024 \
#   --temperature 0.02 \
#   --fix_position_embedding \
#   --gradient_checkpointing

# Example 3: Multi-GPU distributed training
# torchrun --nproc_per_node 4 run.py \
#   --model_name_or_path BAAI/bge-m3 \
#   --train_data ./train_data \
#   --output_dir ./output_m3_dist \
#   --per_device_train_batch_size 64 \
#   --negatives_cross_device \
#   --unified_finetuning \
#   --fp16 \
#   --dataloader_num_workers 8
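For a sense of scale in Example 3, the arithmetic below estimates how many passages each query is scored against in one step, with and without cross-device sharing. The helper `candidate_pool_size` and its `passages_per_query` parameter are hypothetical. The sketch assumes one passage per query unless stated otherwise, which may not match the script's actual grouping of positives and negatives.

```python
def candidate_pool_size(num_devices, per_device_batch_size,
                        passages_per_query=1, cross_device=True):
    """Number of passages each query is scored against in one step."""
    devices = num_devices if cross_device else 1
    return devices * per_device_batch_size * passages_per_query

# Example 3: 4 processes, 64 queries per device.
with_sharing = candidate_pool_size(4, 64)
without_sharing = candidate_pool_size(4, 64, cross_device=False)
print(with_sharing, without_sharing)
```

Under these assumptions, sharing passages across the four processes quadruples the candidate pool without increasing the per-device batch size.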
