Implementation:FlagOpen FlagEmbedding RerankerRunner Run
| Source |
|---|
| FlagEmbedding/abc/finetune/reranker/AbsRunner.py:L135-143 (run method); full class at L24-144 |
Summary
The run() method orchestrates the full reranker fine-tuning pipeline. It is defined in AbsRerankerRunner and inherited by concrete runners.
Concrete runners:
- EncoderOnlyRerankerRunner -- FlagEmbedding/finetune/reranker/encoder_only/base/runner.py
- DecoderOnlyRerankerRunner -- FlagEmbedding/finetune/reranker/decoder_only/base/runner.py
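As a rough illustration of what "orchestrates" means here, the sketch below follows a common runner shape: ensure the output directory exists, train, then save. SketchRunner and StubTrainer are hypothetical stand-ins written for this page, not FlagEmbedding classes, and the exact steps of the real run() should be checked against the source lines cited above.

```python
import tempfile
from pathlib import Path


class StubTrainer:
    """Hypothetical stand-in for a Hugging Face style trainer."""

    def __init__(self, output_dir):
        self.output_dir = Path(output_dir)
        self.calls = []  # records the order in which methods were invoked

    def train(self, resume_from_checkpoint=None):
        self.calls.append(("train", resume_from_checkpoint))

    def save_model(self):
        self.calls.append(("save_model", None))
        (self.output_dir / "model.stub").write_text("weights placeholder")


class SketchRunner:
    """Hypothetical runner; assumes the trainer is built before run() is called."""

    def __init__(self, output_dir, resume_from_checkpoint=None):
        self.output_dir = Path(output_dir)
        self.resume_from_checkpoint = resume_from_checkpoint
        self.trainer = StubTrainer(self.output_dir)

    def run(self):
        # 1. Make sure the output directory exists.
        self.output_dir.mkdir(parents=True, exist_ok=True)
        # 2. Train, optionally resuming from a checkpoint.
        self.trainer.train(resume_from_checkpoint=self.resume_from_checkpoint)
        # 3. Persist the fine-tuned model.
        self.trainer.save_model()


with tempfile.TemporaryDirectory() as tmp:
    runner = SketchRunner(Path(tmp) / "fine_tuned_reranker")
    runner.run()
    call_order = [name for name, _ in runner.trainer.calls]
    saved = (runner.output_dir / "model.stub").exists()
```

In this shape a concrete runner only needs to decide how the model, dataset, and trainer are constructed; the run() sequence itself is inherited unchanged.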
CLI
torchrun --nproc_per_node 4 \
    -m FlagEmbedding.finetune.reranker.encoder_only.base \
    --model_name_or_path BAAI/bge-reranker-v2-m3 \
    --train_data ./train_data.jsonl \
    --output_dir ./fine_tuned_reranker \
    --learning_rate 5e-6 \
    --num_train_epochs 5 \
    --per_device_train_batch_size 16 \
    --deepspeed ds_stage0.json
Arguments
AbsRerankerModelArguments
| Parameter | Description |
|---|---|
| model_name_or_path | Path or name of the pretrained reranker model |
| model_type | Model architecture type: encoder or decoder |
AbsRerankerDataArguments
| Parameter | Default | Description |
|---|---|---|
| train_data | | Path to training JSONL file |
| train_group_size | 8 | Number of passages per query in a training group |
| query_max_len | 32 | Maximum token length for queries |
| passage_max_len | 128 | Maximum token length for passages |
| max_len | 512 | Maximum combined token length |
| knowledge_distillation | | Enable knowledge distillation mode |
| sep_token | '\n' | Separator token between query and passage |
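To make train_group_size concrete, here is a minimal sketch of how one training group might be assembled from a single JSONL record: one positive plus train_group_size - 1 negatives, each paired with the query. The record fields query, pos, and neg are an assumption based on the usual FlagEmbedding fine-tuning data layout, which this page does not spell out. build_group is a hypothetical helper, and its policy of repeating the negative pool when it is too small is likewise an assumption.

```python
import json
import math
import random


def build_group(record, train_group_size, rng):
    """Return [positive, negative, ...] with exactly train_group_size passages.

    Assumes record["pos"] and record["neg"] are non-empty lists of strings.
    """
    positive = rng.choice(record["pos"])
    needed = train_group_size - 1
    negatives = record["neg"]
    if len(negatives) < needed:
        # Repeat the pool so sampling without replacement can still fill the group.
        repeats = math.ceil(needed / len(negatives))
        negatives = negatives * repeats
    return [positive] + rng.sample(negatives, needed)


# One line of a hypothetical training file (field names are assumed).
line = json.dumps({
    "query": "what is a reranker?",
    "pos": ["A reranker scores query-passage pairs."],
    "neg": ["Paris is in France.", "Cats sleep a lot.", "Rust is a language."],
})

record = json.loads(line)
group = build_group(record, train_group_size=8, rng=random.Random(0))
# Each passage is paired with the query before tokenization.
pairs = [(record["query"], passage) for passage in group]
```

With the default of 8, every query contributes eight query-passage pairs per step, so the effective number of scored pairs per device is per_device_train_batch_size times train_group_size.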
AbsRerankerTrainingArguments (extends TrainingArguments)
| Parameter | Description |
|---|---|
| sub_batch_size | Sub-batch size for gradient accumulation within reranker scoring |
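The idea behind a sub-batch size can be sketched as chunked scoring: instead of pushing every query-passage pair through the model at once, the pairs are scored a few at a time and the scores are concatenated. The helper score_in_sub_batches and the toy scoring function below are hypothetical, and the sketch covers only the chunking of the forward pass. How FlagEmbedding accumulates gradients across sub-batches is not shown here and should be checked in the source.

```python
def score_in_sub_batches(pairs, score_fn, sub_batch_size=None):
    """Score pairs in chunks of at most sub_batch_size; None means one pass."""
    if not sub_batch_size or sub_batch_size >= len(pairs):
        return list(score_fn(pairs))
    scores = []
    for start in range(0, len(pairs), sub_batch_size):
        chunk = pairs[start:start + sub_batch_size]
        scores.extend(score_fn(chunk))
    return scores


seen_chunk_sizes = []


def toy_score_fn(chunk):
    """Toy scorer standing in for a model forward pass (not a real reranker)."""
    seen_chunk_sizes.append(len(chunk))
    return [float(len(query) + len(passage)) for query, passage in chunk]


pairs = [("q", "p" * n) for n in range(10)]
chunked = score_in_sub_batches(pairs, toy_score_fn, sub_batch_size=4)
```

The scores are the same as in a single pass; only the peak number of pairs held in one forward call changes, which is the point of the setting when memory is tight.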
I/O
Input:
- Training JSONL file
- Model checkpoint (pretrained or fine-tuned)
- DeepSpeed config JSON
Output:
- Fine-tuned reranker saved to output_dir
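The two file inputs listed above can be prepared with a few lines of Python. This is a sketch under assumptions: the record fields (query, pos, neg) follow the usual FlagEmbedding data layout rather than anything stated on this page, and the DeepSpeed dictionary is a minimal ZeRO stage 0 configuration using the "auto" placeholders that the Hugging Face Trainer integration fills in. It is not the ds_stage0.json shipped with the repository.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical training records; one JSON object per line in the output file.
records = [
    {
        "query": "what is a reranker?",
        "pos": ["A reranker scores query-passage pairs."],
        "neg": ["Paris is in France.", "Cats sleep a lot."],
    },
    {
        "query": "capital of France",
        "pos": ["Paris is the capital of France."],
        "neg": ["A reranker scores query-passage pairs."],
    },
]

# Assumed minimal DeepSpeed config: ZeRO disabled (stage 0), sizes left to the Trainer.
ds_config = {
    "zero_optimization": {"stage": 0},
    "train_micro_batch_size_per_gpu": "auto",
    "gradient_accumulation_steps": "auto",
}

with tempfile.TemporaryDirectory() as tmp:
    train_path = Path(tmp) / "train_data.jsonl"
    with train_path.open("w", encoding="utf-8") as f:
        for rec in records:
            f.write(json.dumps(rec, ensure_ascii=False) + "\n")

    ds_path = Path(tmp) / "ds_stage0.json"
    ds_path.write_text(json.dumps(ds_config, indent=2), encoding="utf-8")

    # Read both files back to confirm they round-trip cleanly.
    loaded = [json.loads(l) for l in train_path.read_text(encoding="utf-8").splitlines()]
    loaded_config = json.loads(ds_path.read_text(encoding="utf-8"))
```

Writing to real paths instead of a temporary directory yields files that can be passed to --train_data and --deepspeed in the CLI invocation.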