Implementation:FlagOpen FlagEmbedding RerankerRunner Run

From Leeroopedia
Revision as of 15:00, 16 February 2026 by Admin (auto-imported from implementations/FlagOpen_FlagEmbedding_RerankerRunner_Run.md)


Source
FlagEmbedding/abc/finetune/reranker/AbsRunner.py:L135-143 (run method), full class L24-144

Summary

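The run() method drives the reranker fine-tuning pipeline end to end. It trains the model that the runner has prepared and saves the fine-tuned reranker to output_dir. It is defined once in AbsRerankerRunner and inherited by the concrete runners, which provide the architecture-specific (encoder-only or decoder-only) setup.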

Concrete runners:

  • EncoderOnlyRerankerRunner -- FlagEmbedding/finetune/reranker/encoder_only/base/runner.py
  • DecoderOnlyRerankerRunner -- FlagEmbedding/finetune/reranker/decoder_only/base/runner.py
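An abstract base that owns run() and concrete subclasses that supply the model form a template-method arrangement. The following is a minimal sketch under that assumption. Every name except run() is hypothetical (SketchRerankerRunner, build_trainer, FakeTrainer), and the create-directory, train, save sequence follows the Output listed under I/O rather than FlagEmbedding's exact code.

```python
# Hypothetical sketch of the runner pattern: the base class fixes the pipeline in
# run(), subclasses decide how the trainer/model are built. Not FlagEmbedding code.
import tempfile
from abc import ABC, abstractmethod
from pathlib import Path


class SketchRerankerRunner(ABC):
    """Abstract base: owns the shared pipeline, subclasses supply the model."""

    def __init__(self, output_dir: str):
        self.output_dir = Path(output_dir)
        self.trainer = self.build_trainer()

    @abstractmethod
    def build_trainer(self):
        """Concrete runners decide how the model and trainer are built."""

    def run(self):
        # Shared pipeline: ensure output_dir exists, train, then save the model.
        self.output_dir.mkdir(parents=True, exist_ok=True)
        self.trainer.train()
        self.trainer.save_model(str(self.output_dir))


class FakeTrainer:
    """Stand-in for a Hugging Face-style trainer that only records its calls."""

    def __init__(self):
        self.calls = []

    def train(self):
        self.calls.append("train")

    def save_model(self, path):
        self.calls.append(("save_model", path))


class EncoderOnlySketchRunner(SketchRerankerRunner):
    def build_trainer(self):
        # A real encoder-only runner would load a cross-encoder model here.
        return FakeTrainer()


# Usage: run the pipeline against a throwaway directory.
demo_dir = Path(tempfile.mkdtemp()) / "fine_tuned_reranker"
demo_runner = EncoderOnlySketchRunner(str(demo_dir))
demo_runner.run()
print(demo_runner.trainer.calls)
```

Because run() lives only in the base class, adding another architecture means writing one more build_trainer(), not another pipeline.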

CLI

torchrun --nproc_per_node 4 \
    -m FlagEmbedding.finetune.reranker.encoder_only.base \
    --model_name_or_path BAAI/bge-reranker-v2-m3 \
    --train_data ./train_data.jsonl \
    --output_dir ./fine_tuned_reranker \
    --learning_rate 5e-6 \
    --num_train_epochs 5 \
    --per_device_train_batch_size 16 \
    --deepspeed ds_stage0.json
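The command above launches 4 data-parallel processes. This hedged arithmetic sketch shows how much work one optimizer step covers. It assumes each item counted by --per_device_train_batch_size is one query group of train_group_size passages (default 8, see the data arguments) and that gradient_accumulation_steps keeps the Hugging Face default of 1. The helper name effective_batch is hypothetical.

```python
# Hedged arithmetic sketch: assumes one batch item = one query group of
# train_group_size query-passage pairs.
def effective_batch(nproc_per_node: int,
                    per_device_train_batch_size: int,
                    train_group_size: int,
                    gradient_accumulation_steps: int = 1,
                    num_nodes: int = 1) -> dict:
    """Return per-step query and pair counts for a data-parallel run."""
    world_size = nproc_per_node * num_nodes
    # Queries consumed per optimizer step across all processes.
    queries_per_step = (world_size * per_device_train_batch_size
                        * gradient_accumulation_steps)
    return {
        "world_size": world_size,
        "queries_per_step": queries_per_step,
        # Every query contributes train_group_size scored pairs.
        "pairs_per_step": queries_per_step * train_group_size,
        # Pairs a single GPU scores in one forward pass.
        "pairs_per_device_forward": per_device_train_batch_size * train_group_size,
    }


# Usage with the values from the torchrun command and the default group size.
stats = effective_batch(nproc_per_node=4, per_device_train_batch_size=16,
                        train_group_size=8)
print(stats)
# {'world_size': 4, 'queries_per_step': 64, 'pairs_per_step': 512, 'pairs_per_device_forward': 128}
```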

Arguments

AbsRerankerModelArguments

Parameter            Description
model_name_or_path   Path or name of the pretrained reranker model
model_type           Model architecture type: encoder or decoder
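As a minimal sketch of how the two model arguments might be validated, the hypothetical dataclass below models only the fields in the table. Its runner_module() helper maps model_type to a launch module. That helper is an illustration, not part of FlagEmbedding, and the decoder path is inferred from the DecoderOnlyRerankerRunner file location listed above, not confirmed as a `-m` entry point.

```python
# Hypothetical stand-in for AbsRerankerModelArguments; runner_module() is an
# illustrative helper, not a FlagEmbedding API.
from dataclasses import dataclass

# Module paths derived from the concrete runner locations listed above
# (the decoder entry is an inference from its file path).
RUNNER_MODULES = {
    "encoder": "FlagEmbedding.finetune.reranker.encoder_only.base",
    "decoder": "FlagEmbedding.finetune.reranker.decoder_only.base",
}


@dataclass
class SketchModelArguments:
    model_name_or_path: str
    model_type: str  # "encoder" or "decoder"

    def __post_init__(self):
        # Fail early on an unsupported architecture type.
        if self.model_type not in RUNNER_MODULES:
            raise ValueError(
                f"model_type must be one of {sorted(RUNNER_MODULES)}, "
                f"got {self.model_type!r}")

    def runner_module(self) -> str:
        """Module that would be passed to `torchrun -m` for this architecture."""
        return RUNNER_MODULES[self.model_type]


args = SketchModelArguments("BAAI/bge-reranker-v2-m3", "encoder")
print(args.runner_module())
# FlagEmbedding.finetune.reranker.encoder_only.base
```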

AbsRerankerDataArguments

Parameter               Default  Description
train_data                       Path to the training JSONL file
train_group_size        8        Number of passages per query in a training group
query_max_len           32       Maximum token length for queries
passage_max_len         128      Maximum token length for passages
max_len                 512      Maximum token length of the combined query-passage input
knowledge_distillation           Enable knowledge distillation mode
sep_token               '\n'     Separator token between query and passage
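To show how these data arguments interact, here is a minimal sketch of building one training group. It copies the defaults from the table but rests on assumptions not taken from FlagEmbedding. Whitespace splitting stands in for a real tokenizer. The positive passage is placed first. Negatives are resampled with replacement when a query has fewer than train_group_size - 1 of them. SketchDataArguments, truncate_pair and build_group are hypothetical names.

```python
# Minimal sketch of one training group. Assumptions: whitespace "tokens" stand in
# for a tokenizer, the positive comes first, and negatives are resampled with
# replacement when too few exist. Names are hypothetical, not FlagEmbedding's.
import random
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class SketchDataArguments:
    train_data: str                # path to the training JSONL file
    train_group_size: int = 8      # passages per query (1 positive + 7 negatives here)
    query_max_len: int = 32
    passage_max_len: int = 128
    max_len: int = 512
    sep_token: str = "\n"


def truncate_pair(query: str, passage: str, args: SketchDataArguments) -> List[str]:
    """Truncate query and passage separately, join with sep_token, cap at max_len."""
    q = query.split()[: args.query_max_len]
    p = passage.split()[: args.passage_max_len]
    return (q + [args.sep_token] + p)[: args.max_len]


def build_group(example: Dict, args: SketchDataArguments,
                rng: random.Random) -> List[List[str]]:
    """Return train_group_size query-passage pairs; index 0 is the positive."""
    needed = args.train_group_size - 1
    negs = example["neg"]
    if len(negs) >= needed:
        chosen = rng.sample(negs, needed)
    else:
        # Too few negatives: sample with replacement (raises if negs is empty).
        chosen = [rng.choice(negs) for _ in range(needed)]
    passages = [rng.choice(example["pos"])] + chosen
    return [truncate_pair(example["query"], p, args) for p in passages]


demo_args = SketchDataArguments(train_data="train_data.jsonl", train_group_size=3,
                                query_max_len=2, passage_max_len=3)
demo_example = {"query": "what is a reranker",
                "pos": ["a cross encoder that scores pairs"],
                "neg": ["paris is in france", "tokens are units"]}
group = build_group(demo_example, demo_args, random.Random(0))
for pair in group:
    print(pair)
```

With the table's defaults, query_max_len + passage_max_len + 1 separator is well under max_len, so the final cap only matters when the per-part limits are raised.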

AbsRerankerTrainingArguments (extends TrainingArguments)

Parameter        Description
sub_batch_size   Sub-batch size for gradient accumulation within reranker scoring
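The sub-batching idea can be sketched as splitting the flattened query-passage pairs of a batch into chunks of sub_batch_size, scoring each chunk separately and concatenating the scores in order. This is a minimal illustration under that assumption. score_fn is a hypothetical stand-in for the reranker forward pass, treating a non-positive value as "no splitting" is also an assumption, and how FlagEmbedding couples the chunks to gradient computation is not shown.

```python
# Minimal sketch of sub-batched scoring; score_fn is a hypothetical stand-in for
# the reranker forward pass, not a FlagEmbedding API.
from typing import Callable, List, Sequence, Tuple

Pair = Tuple[str, str]


def score_in_sub_batches(pairs: Sequence[Pair],
                         score_fn: Callable[[Sequence[Pair]], List[float]],
                         sub_batch_size: int) -> List[float]:
    """Score pairs sub_batch_size at a time and concatenate results in order."""
    if sub_batch_size <= 0:
        # Assumption: a non-positive value disables splitting.
        return list(score_fn(pairs))
    scores: List[float] = []
    for start in range(0, len(pairs), sub_batch_size):
        scores.extend(score_fn(pairs[start:start + sub_batch_size]))
    return scores


# Usage: a toy scorer that records how many pairs each call received.
chunk_sizes: List[int] = []


def toy_score(chunk: Sequence[Pair]) -> List[float]:
    chunk_sizes.append(len(chunk))
    return [float(len(q) + len(p)) for q, p in chunk]


demo_pairs = [("q", "p" * i) for i in range(10)]
demo_scores = score_in_sub_batches(demo_pairs, toy_score, sub_batch_size=4)
print(chunk_sizes)  # [4, 4, 2]
```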

I/O

Input:

  • Training JSONL file
  • Model checkpoint (pretrained or fine-tuned)
  • DeepSpeed config JSON

Output:

  • Fine-tuned reranker saved to output_dir
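The two file inputs can be prepared as in the following sketch. It assumes the query/pos/neg record layout used in FlagEmbedding's fine-tuning examples (knowledge-distillation data additionally carries teacher scores, not shown). The DeepSpeed file is a minimal ZeRO stage-0 config whose "auto" values the Hugging Face Trainer's DeepSpeed integration fills in from the CLI arguments. The example records and helper names are hypothetical.

```python
# Sketch of the file inputs: a one-record training JSONL (query/pos/neg layout,
# assumed) and a minimal ZeRO stage-0 DeepSpeed config. Names are hypothetical.
import json
import tempfile
from pathlib import Path

EXAMPLES = [
    {"query": "what is a reranker?",
     "pos": ["A reranker scores query-passage pairs jointly."],
     "neg": ["Paris is the capital of France.",
             "Tokenizers split text into tokens."]},
]

# Stage 0 disables ZeRO partitioning; "auto" defers to the Trainer arguments.
DS_STAGE0 = {
    "zero_optimization": {"stage": 0},
    "train_batch_size": "auto",
    "train_micro_batch_size_per_gpu": "auto",
    "gradient_accumulation_steps": "auto",
}


def write_inputs(workdir: Path):
    """Write train_data.jsonl (one JSON object per line) and ds_stage0.json."""
    train_path = workdir / "train_data.jsonl"
    with train_path.open("w", encoding="utf-8") as f:
        for ex in EXAMPLES:
            f.write(json.dumps(ex, ensure_ascii=False) + "\n")
    ds_path = workdir / "ds_stage0.json"
    ds_path.write_text(json.dumps(DS_STAGE0, indent=2), encoding="utf-8")
    return train_path, ds_path


def load_and_check(train_path: Path):
    """Parse the JSONL file and fail on records missing required keys."""
    rows = []
    with train_path.open(encoding="utf-8") as f:
        for line_no, line in enumerate(f, 1):
            if not line.strip():
                continue
            row = json.loads(line)
            missing = {"query", "pos", "neg"} - set(row)
            if missing:
                raise ValueError(f"line {line_no}: missing {sorted(missing)}")
            rows.append(row)
    return rows


work = Path(tempfile.mkdtemp())
train_file, ds_file = write_inputs(work)
loaded = load_and_check(train_file)
print(len(loaded), ds_file.name)
```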
