Workflow:FlagOpen FlagEmbedding Reranker Finetuning
| Knowledge Sources | |
|---|---|
| Domains | Text_Reranking, Fine_Tuning, Information_Retrieval |
| Last Updated | 2026-02-09 21:30 GMT |
Overview
End-to-end process for fine-tuning a BGE reranker model on custom data, from data preparation through hard negative mining, optional knowledge distillation, and distributed training.
Description
This workflow covers the complete pipeline for adapting BGE reranker models to domain-specific relevance ranking tasks. It supports three model families: encoder-only cross-encoders (bge-reranker-base/large), LLM-based instruction-following rerankers (bge-reranker-v2-gemma), and layerwise depth-adaptive rerankers (bge-reranker-v2-minicpm-layerwise). Training uses a cross-encoder architecture where query and passage are concatenated and processed jointly. The pipeline reuses the same data preparation scripts as embedder finetuning.
Usage
Execute this workflow when you have labeled relevance data for query-passage pairs and need to train a more accurate reranker for your domain. Common scenarios include improving the second-stage ranking in a retrieve-then-rerank pipeline, particularly when domain-specific terminology or document structures differ from the pre-training distribution.
Execution Steps
Step 1: Install FlagEmbedding with Finetune Dependencies
Install the FlagEmbedding package with the finetune extras, which include DeepSpeed and flash-attention.
Key considerations:
- Use pip install -U FlagEmbedding[finetune]
- Multi-GPU training requires the NCCL backend and the torchrun launcher
Step 2: Prepare Training Data
Format training data as JSONL files where each line contains a query, positive passages, and negative passages. The reranker concatenates query and passage with a separator for joint encoding. An optional prompt field is appended after the passage.
Data format: each line is a JSON object of the form {"query": str, "pos": List[str], "neg": List[str]}, optionally extended with pos_scores, neg_scores, and prompt fields.
Key considerations:
- The input format is the same as for embedder finetuning
- pos_scores and neg_scores are needed only for knowledge distillation
- The prompt field is appended as: query [SEP] passage [SEP] prompt
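The record layout above can be checked before training with a small script. The following is a minimal sketch using only the Python standard library; the validate_record helper and the sample record are hypothetical, and the rules that pos is non-empty and that score lists match their passage lists in length are assumptions rather than documented FlagEmbedding requirements.

```python
import json


def validate_record(line: str) -> dict:
    """Parse one JSONL line and check the fields described in Step 2."""
    rec = json.loads(line)
    if not isinstance(rec.get("query"), str):
        raise ValueError("query must be a string")
    for key in ("pos", "neg"):
        val = rec.get(key)
        if not isinstance(val, list) or not all(isinstance(p, str) for p in val):
            raise ValueError(f"{key} must be a list of strings")
    # Assumption: every training example needs at least one positive passage.
    if not rec["pos"]:
        raise ValueError("pos must contain at least one passage")
    # Score fields matter only for knowledge distillation. Assumption: when
    # present, they hold one score per passage in the matching list.
    for texts, scores in (("pos", "pos_scores"), ("neg", "neg_scores")):
        if scores in rec and len(rec[scores]) != len(rec[texts]):
            raise ValueError(f"{scores} must have one score per {texts} passage")
    return rec


# Hypothetical sample record, serialized the way one JSONL line would be.
example = {
    "query": "what is a cross-encoder?",
    "pos": ["A cross-encoder scores a query and passage jointly."],
    "neg": ["Bi-encoders embed queries and passages separately."],
}
record = validate_record(json.dumps(example))
```

Running the check over every line of the training file before launching a job catches malformed records early, which is cheaper than a failure partway through distributed training.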
Step 3: Mine Hard Negatives
Use the hn_mine.py script to mine hard negatives with an existing embedder. The negatives are retrieved passages that are similar to the query but not relevant, providing strong training signal for the reranker.
Key considerations:
- Uses the same hn_mine.py script as embedder finetuning
- Harder negatives (lower range_for_sampling values, which sample from nearer the top of the retrieved ranking) are particularly beneficial for rerankers
- The mined negatives are added to the neg field in the output JSONL
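The idea behind sampling from a rank range can be sketched as follows. This is not the code in hn_mine.py: sample_hard_negatives is a hypothetical helper, the ranked list stands in for the output of an embedder search, and the negative_number and seed parameters are assumptions made for illustration.

```python
import random


def sample_hard_negatives(ranked_passages, positives, range_for_sampling,
                          negative_number, seed=0):
    """Sample negatives from a slice of a retrieval ranking.

    ranked_passages: passages ordered from most to least similar to the query.
    range_for_sampling: (start, end) rank window; lower values mean harder negatives.
    """
    start, end = range_for_sampling
    pos_set = set(positives)
    # Drop known positives so they are never used as negatives.
    candidates = [p for p in ranked_passages[start:end] if p not in pos_set]
    rng = random.Random(seed)
    k = min(negative_number, len(candidates))
    return rng.sample(candidates, k)


# Toy ranking: "doc0" is the most similar passage, "doc9" the least.
ranking = [f"doc{i}" for i in range(10)]
negatives = sample_hard_negatives(ranking, positives=["doc0", "doc3"],
                                  range_for_sampling=(2, 8), negative_number=3)
```

Shifting the window toward the top of the ranking yields passages that look more like the positives, which is what makes them hard.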
Step 4: Generate Teacher Scores (Optional)
Use add_reranker_score.py with a stronger teacher reranker to generate soft relevance labels for knowledge distillation training.
Key considerations:
- A larger or more capable reranker serves as the teacher
- Distillation scores enable softer training signal beyond binary relevance
- Set knowledge_distillation=True during training to utilize the scores
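To illustrate what a softer training signal means, the sketch below computes a cross-entropy between the teacher's score distribution and the student's over one group of passages. It is a generic distillation objective written in plain Python; whether FlagEmbedding uses exactly this form is an assumption, and the function names are hypothetical.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def log_softmax(xs):
    """Numerically stable log-softmax over a list of scores."""
    m = max(xs)
    log_z = m + math.log(sum(math.exp(x - m) for x in xs))
    return [x - log_z for x in xs]


def distillation_loss(student_scores, teacher_scores):
    """Cross-entropy of the student's distribution against the teacher's soft labels."""
    teacher_probs = softmax(teacher_scores)
    student_log_probs = log_softmax(student_scores)
    return -sum(t * s for t, s in zip(teacher_probs, student_log_probs))


# Teacher scores for one positive followed by two negatives (toy values).
teacher = [2.0, 0.0, -1.0]
loss_agree = distillation_loss([2.0, 0.0, -1.0], teacher)
loss_disagree = distillation_loss([-1.0, 0.0, 2.0], teacher)
```

The loss is smallest when the student orders and spaces the passages the way the teacher does, so a negative that the teacher rates as nearly relevant is penalized less than one it rates as clearly irrelevant.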
Step 5: Configure and Run Training
Set up training parameters and launch distributed training. Select the appropriate reranker module (encoder_only.base, decoder_only.base, or decoder_only.layerwise). Configure learning rate, batch size, sequence lengths, and DeepSpeed settings.
Key considerations:
- Launch with torchrun --nproc_per_node N -m FlagEmbedding.finetune.reranker.{type}
- Encoder-only rerankers use sequence classification loss (cross-entropy)
- Decoder-only rerankers use LoRA for parameter-efficient training
- Layerwise rerankers learn per-layer scoring heads for depth-adaptive inference
- Use DeepSpeed ZeRO Stage 0 for encoder-only rerankers and Stage 1 for LLM-based models
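The cross-entropy objective for encoder-only rerankers can be sketched in a few lines. The example assumes that each training group holds one positive passage followed by its negatives, and that the model emits one logit per query-passage pair; both the grouping convention and the group_cross_entropy name are assumptions for illustration, not the library's code.

```python
import math


def group_cross_entropy(logits, positive_index=0):
    """Cross-entropy over one group of query-passage logits.

    The positive passage is treated as the correct class among the group.
    """
    m = max(logits)
    # log of the softmax normalizer, computed stably
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[positive_index]


# One positive (index 0) and three negatives, with toy logits.
loss_good = group_cross_entropy([4.0, 0.5, -1.0, 0.0])   # positive ranked first
loss_bad = group_cross_entropy([0.0, 0.5, 4.0, -1.0])    # a negative ranked first
loss_uniform = group_cross_entropy([0.0, 0.0])           # no preference
```

Because the loss depends on the gap between the positive logit and the negatives in the same group, harder negatives produce larger gradients than easy ones.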
Step 6: Validate the Fine-tuned Reranker
Load the fine-tuned reranker using FlagAutoReranker.from_finetuned() and verify it produces reasonable relevance scores on held-out query-passage pairs. Compare score distributions and ranking quality before and after fine-tuning.
Key considerations:
- Set the model_class parameter to match the module type used for training
- For LoRA-trained models, use the merged model path
- Verify with normalize=True to check calibrated score distributions
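A before-and-after comparison can be scripted once scores are available. The sketch below works on plain lists of scores, so it is independent of how they were produced; the toy values are made up for illustration, the helper names are hypothetical, and treating normalize=True as a sigmoid over raw logits is an assumption.

```python
import math


def sigmoid(x):
    """Map a raw logit into (0, 1) without overflowing for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def reciprocal_rank(scores, relevant_index):
    """1 / rank of the relevant passage, where rank 1 is the highest score."""
    rank = 1 + sum(1 for s in scores if s > scores[relevant_index])
    return 1.0 / rank


def mean_reciprocal_rank(examples):
    """examples: list of (scores, relevant_index) pairs, one per held-out query."""
    return sum(reciprocal_rank(s, i) for s, i in examples) / len(examples)


# Toy raw scores per query; index 0 is the relevant passage in each case.
before = [([0.2, 1.1, -0.5], 0), ([2.0, 0.1], 0)]
after = [([1.5, 0.3, -1.0], 0), ([2.5, -0.4], 0)]

mrr_before = mean_reciprocal_rank(before)
mrr_after = mean_reciprocal_rank(after)
normalized_after = [[sigmoid(s) for s in scores] for scores, _ in after]
```

Mean reciprocal rank captures ranking quality, while the normalized scores show whether relevant and irrelevant passages separate cleanly around a usable threshold.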