Workflow:FlagOpen FlagEmbedding Reranker Finetuning

Knowledge Sources	FlagEmbedding BGE Documentation
Domains	Text_Reranking, Fine_Tuning, Information_Retrieval
Last Updated	2026-02-09 21:30 GMT

Overview

End-to-end process for fine-tuning a BGE reranker model on custom data, from data preparation through hard negative mining, optional knowledge distillation, and distributed training.

Description

This workflow covers the complete pipeline for adapting BGE reranker models to domain-specific relevance ranking tasks. It supports three model families: encoder-only cross-encoders (bge-reranker-base/large), LLM-based instruction-following rerankers (bge-reranker-v2-gemma), and layerwise depth-adaptive rerankers (bge-reranker-v2-minicpm-layerwise). Training uses a cross-encoder architecture where query and passage are concatenated and processed jointly. The pipeline reuses the same data preparation scripts as embedder finetuning.

Usage

Execute this workflow when you have labeled relevance data for query-passage pairs and need to train a more accurate reranker for your domain. Common scenarios include improving the second-stage ranking in a retrieve-then-rerank pipeline, particularly when domain-specific terminology or document structures differ from the pre-training distribution.

Execution Steps

Step 1: Install FlagEmbedding with Finetune Dependencies

Install the FlagEmbedding package with its finetune extras, which pull in the training dependencies such as DeepSpeed and flash-attention.

Key considerations:

Use pip install -U "FlagEmbedding[finetune]" (quote the brackets in shells such as zsh that treat them as glob patterns)
Multi-GPU training requires the NCCL backend and the torchrun launcher
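
Before moving on, it can help to confirm that the training dependencies are actually visible to the interpreter that will run the job. The sketch below is a minimal check, assuming the import names FlagEmbedding, torch, deepspeed, and flash_attn; the function name check_finetune_deps is illustrative and not part of FlagEmbedding.

```python
import importlib.util

# Import names assumed for the packages mentioned in this step;
# adjust the tuple if your environment uses different ones.
REQUIRED_MODULES = ("FlagEmbedding", "torch", "deepspeed", "flash_attn")


def check_finetune_deps(modules=REQUIRED_MODULES):
    """Return {module_name: bool} without importing the modules themselves."""
    return {name: importlib.util.find_spec(name) is not None for name in modules}


if __name__ == "__main__":
    for name, found in check_finetune_deps().items():
        print(f"{name:15s} {'ok' if found else 'MISSING'}")
```

A missing entry usually means the extras were installed into a different environment than the one used to launch training.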

Step 2: Prepare Training Data

Format training data as JSONL files where each line contains a query, positive passages, and negative passages. The reranker concatenates query and passage with a separator for joint encoding. An optional prompt field is appended after the passage.

Data format (one JSON object per line): {"query": str, "pos": List[str], "neg": List[str]}

Key considerations:

The input format is the same as for embedder finetuning
pos_scores and neg_scores are needed only for knowledge distillation
The prompt field is appended as: query [SEP] passage [SEP] prompt
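
The record format above can be checked before training with a small validator. This is a sketch under stated assumptions, not FlagEmbedding's own loader: the helper names find_format_errors and write_jsonl are hypothetical, and the rules that pos must be non-empty and that score lists align one-to-one with their passages are assumptions made for illustration. The optional prompt field is not checked.

```python
import json


def find_format_errors(example, require_scores=False):
    """Return a list of problems found in one training record (empty if none)."""
    errors = []
    query = example.get("query")
    if not isinstance(query, str) or not query.strip():
        errors.append("query must be a non-empty string")
    for key in ("pos", "neg"):
        passages = example.get(key)
        if not isinstance(passages, list) or not all(isinstance(p, str) for p in passages):
            errors.append(f"{key} must be a list of strings")
            continue
        # Assumption: every record needs at least one positive passage.
        if key == "pos" and not passages:
            errors.append("pos must contain at least one passage")
        # Assumption: distillation scores align one-to-one with the passages.
        if require_scores:
            scores = example.get(f"{key}_scores")
            if not isinstance(scores, list) or len(scores) != len(passages):
                errors.append(f"{key}_scores must have one score per {key} passage")
    return errors


def write_jsonl(examples, path, require_scores=False):
    """Validate each record, then write one JSON object per line."""
    with open(path, "w", encoding="utf-8") as f:
        for i, example in enumerate(examples):
            errors = find_format_errors(example, require_scores)
            if errors:
                raise ValueError(f"example {i}: {'; '.join(errors)}")
            f.write(json.dumps(example, ensure_ascii=False) + "\n")
```

Pass require_scores=True only for data that will be used with knowledge distillation, since the score fields are otherwise optional.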

Step 3: Mine Hard Negatives

Use the hn_mine.py script to mine hard negatives with an existing embedder. The negatives are retrieved passages that rank highly for the query but are not labeled relevant, which gives the reranker a strong training signal.

Key considerations:

Uses the same hn_mine.py script as embedder finetuning
Harder negatives (a range_for_sampling window closer to the top of the retrieval ranking, for example 2-100 rather than 60-300) are particularly beneficial for rerankers
The mined negatives are added to the neg field in the output JSONL
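
The effect of range_for_sampling can be illustrated with a small rank-window sampler. This sketch only shows the idea and is not the implementation in hn_mine.py: the function names, the negative_number parameter as used here, the fixed seed, and the use of Python slice semantics for the window bounds are all assumptions.

```python
import random


def parse_range(range_for_sampling):
    """Parse a 'start-end' string such as '2-200' into integer rank bounds."""
    start, end = (int(part) for part in range_for_sampling.split("-"))
    if not 0 <= start < end:
        raise ValueError("expected 0 <= start < end")
    return start, end


def sample_hard_negatives(ranked_passages, positives, range_for_sampling="2-200",
                          negative_number=15, seed=0):
    """Sample negatives from a rank window of retrieval results.

    ranked_passages: passages sorted by embedder similarity, best first.
    positives: passages labeled relevant; these are never returned.
    """
    start, end = parse_range(range_for_sampling)
    positive_set = set(positives)
    # Assumption: the window uses slice semantics, ranked_passages[start:end].
    window = [p for p in ranked_passages[start:end] if p not in positive_set]
    rng = random.Random(seed)
    return rng.sample(window, min(negative_number, len(window)))
```

A window that starts near the top of the ranking yields passages the embedder already finds very similar to the query, which is what makes them hard; a window further down the ranking yields easier negatives.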

Step 4: Generate Teacher Scores (Optional)

Use add_reranker_score.py with a stronger teacher reranker to generate soft relevance labels for knowledge distillation training.

Key considerations:

A larger or more capable reranker serves as the teacher
Teacher scores provide a graded training signal rather than binary relevance labels
Set knowledge_distillation=True during training to use the scores
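
To see how teacher scores soften the training signal, the sketch below compares a hard-label loss with one common form of distillation loss, the cross-entropy between the teacher's softmax distribution and the student's. Whether FlagEmbedding uses exactly this formulation is not confirmed here; the function names are illustrative and the code uses only the standard library so the arithmetic is easy to follow.

```python
import math


def softmax(scores):
    """Numerically stable softmax over one candidate group."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def log_softmax(scores):
    """Numerically stable log-softmax over one candidate group."""
    m = max(scores)
    log_total = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - log_total for s in scores]


def hard_label_loss(student_scores, positive_index=0):
    """Cross-entropy against a one-hot target on the positive passage."""
    return -log_softmax(student_scores)[positive_index]


def distillation_loss(student_scores, teacher_scores):
    """Cross-entropy against the teacher's soft distribution (assumed form)."""
    targets = softmax(teacher_scores)
    log_probs = log_softmax(student_scores)
    return -sum(t * lp for t, lp in zip(targets, log_probs))
```

When the teacher is extremely confident in the positive, the two losses nearly coincide; when the teacher assigns meaningful probability to some negatives, the student is no longer pushed to score them as completely irrelevant.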

Step 5: Configure and Run Training

Set up training parameters and launch distributed training. Select the appropriate reranker module (encoder_only.base, decoder_only.base, or decoder_only.layerwise). Configure learning rate, batch size, sequence lengths, and DeepSpeed settings.

Key considerations:

Launch with torchrun --nproc_per_node N -m FlagEmbedding.finetune.reranker.{type}
Encoder-only rerankers use sequence classification loss (cross-entropy)
Decoder-only rerankers use LoRA for parameter-efficient training
Layerwise rerankers learn per-layer scoring heads for depth-adaptive inference
Use DeepSpeed ZeRO Stage 0 for encoder-only models and Stage 1 for LLM-based models
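
The launch command can be assembled programmatically so that the module name and ZeRO stage stay consistent. The sketch below is a hedged illustration: the module suffixes and stage pairing come from this step, while the helper names are hypothetical and the flags --model_name_or_path, --train_data, --output_dir, and --deepspeed are assumed from typical FlagEmbedding and Hugging Face Trainer usage and should be verified against the installed version.

```python
import shlex

# Module suffix -> recommended DeepSpeed ZeRO stage, as described in this step.
# Treating the layerwise reranker as LLM-based (Stage 1) is an assumption.
RERANKER_TYPES = {
    "encoder_only.base": 0,
    "decoder_only.base": 1,
    "decoder_only.layerwise": 1,
}


def recommended_zero_stage(reranker_type):
    """Look up the ZeRO stage suggested for a reranker module type."""
    return RERANKER_TYPES[reranker_type]


def build_torchrun_command(reranker_type, num_gpus, model_name_or_path,
                           train_data, output_dir, deepspeed_config, extra_args=()):
    """Return the torchrun argument list for the chosen reranker module."""
    if reranker_type not in RERANKER_TYPES:
        raise ValueError(f"unknown reranker type: {reranker_type}")
    return [
        "torchrun", "--nproc_per_node", str(num_gpus),
        "-m", f"FlagEmbedding.finetune.reranker.{reranker_type}",
        "--model_name_or_path", model_name_or_path,
        "--train_data", train_data,
        "--output_dir", output_dir,
        "--deepspeed", deepspeed_config,  # path to a ZeRO config file you provide
        *extra_args,
    ]


if __name__ == "__main__":
    cmd = build_torchrun_command(
        "encoder_only.base", 2, "BAAI/bge-reranker-base",
        "train.jsonl", "output", "ds_stage0.json",  # hypothetical local paths
    )
    print(shlex.join(cmd))
```

Learning rate, batch size, and sequence-length options can be passed through extra_args once their exact flag names have been checked.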

Step 6: Validate the Fine-tuned Reranker

Load the fine-tuned reranker using FlagAutoReranker.from_finetuned() and verify it produces reasonable relevance scores on held-out query-passage pairs. Compare score distributions and ranking quality before and after fine-tuning.

Key considerations:

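Pass the model_class value that corresponds to the module type used for training (encoder-only base, decoder-only base, or decoder-only layerwise)
For LoRA-trained models, load from the path of the merged model rather than the adapter-only checkpoint
Score with normalize=True, which maps raw scores into the 0-1 range, to make score distributions easier to compare before and after fine-tuning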
