Implementation:FlagOpen FlagEmbedding AbsEvalArgs Configuration

From Leeroopedia


Type: API Doc
Source: FlagEmbedding/abc/evaluation/arguments.py, L9-78 (AbsEvalArgs), L81-191 (AbsEvalModelArgs)
Import: from FlagEmbedding.abc.evaluation.arguments import AbsEvalArgs, AbsEvalModelArgs

AbsEvalArgs

Base dataclass for evaluation task arguments. Adapted from AIR-Bench evaluation utilities.

Signature

import os
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class AbsEvalArgs:
    eval_name: str = None
    dataset_dir: Optional[str] = None
    force_redownload: bool = False
    dataset_names: Optional[str] = None          # nargs="+"
    splits: str = "test"                          # nargs="+"
    corpus_embd_save_dir: str = None
    output_dir: str = "./search_results"
    search_top_k: int = 1000
    rerank_top_k: int = 100
    cache_path: str = None
    token: str = os.getenv('HF_TOKEN', None)
    overwrite: bool = False
    ignore_identical_ids: bool = False
    # ================ for evaluation ===============
    # List defaults go through default_factory; a bare list default raises ValueError in a dataclass.
    k_values: int = field(default_factory=lambda: [1, 3, 5, 10, 100, 1000])  # nargs="+"
    eval_output_method: str = "markdown"           # choices: ["json", "markdown"]
    eval_output_path: str = "./eval_results.md"
    eval_metrics: str = field(default_factory=lambda: ["ndcg_at_10", "recall_at_10"])  # nargs="+"

Fields

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| eval_name | str | None | The name of the evaluation task (e.g., msmarco, beir, miracl) |
| dataset_dir | Optional[str] | None | Path to local dataset directory or download location. For custom datasets, must contain corpus.jsonl, <split>_queries.jsonl, <split>_qrels.jsonl |
| force_redownload | bool | False | Whether to force redownload the dataset from remote |
| dataset_names | Optional[str] | None | Names of datasets to evaluate (nargs="+"). If None, all available datasets are evaluated |
| splits | str | "test" | Splits to evaluate (nargs="+") |
| corpus_embd_save_dir | str | None | Path to save/load corpus embeddings. If None, embeddings are not saved |
| output_dir | str | "./search_results" | Path to save search results |
| search_top_k | int | 1000 | Top-k for retrieval stage |
| rerank_top_k | int | 100 | Top-k for reranking stage |
| cache_path | str | None | Cache directory for loading datasets |
| token | str | HF_TOKEN env var | HuggingFace token for model/dataset access |
| overwrite | bool | False | Whether to overwrite existing evaluation results |
| ignore_identical_ids | bool | False | Whether to ignore identical query/doc IDs in search results |
| k_values | List[int] | [1, 3, 5, 10, 100, 1000] | k values for metric computation (nargs="+") |
| eval_output_method | str | "markdown" | Output format: "json" or "markdown" |
| eval_output_path | str | "./eval_results.md" | Path to save evaluation results |
| eval_metrics | List[str] | ["ndcg_at_10", "recall_at_10"] | Metrics to evaluate (nargs="+") |
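
The dataset_dir entry above names the files a custom dataset must provide. As a rough illustration only, the sketch below lists those files for a given set of splits and reports which ones are missing. The helper names expected_dataset_files and missing_dataset_files are hypothetical and not part of FlagEmbedding, and the sketch assumes the files sit directly inside dataset_dir.

```python
from pathlib import Path
from typing import List


def expected_dataset_files(dataset_dir: str, splits: List[str]) -> List[Path]:
    """List the files a custom dataset directory is expected to contain.

    Hypothetical helper: assumes a flat layout inside dataset_dir.
    """
    root = Path(dataset_dir)
    files = [root / "corpus.jsonl"]  # one corpus file shared by all splits
    for split in splits:
        files.append(root / f"{split}_queries.jsonl")  # queries for this split
        files.append(root / f"{split}_qrels.jsonl")    # relevance judgments for this split
    return files


def missing_dataset_files(dataset_dir: str, splits: List[str]) -> List[Path]:
    """Return the expected files that do not exist on disk."""
    return [p for p in expected_dataset_files(dataset_dir, splits) if not p.is_file()]
```

Running missing_dataset_files("./data/custom", ["test"]) before launching an evaluation is one way to catch a misnamed file early.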

AbsEvalModelArgs

Base dataclass for model arguments during evaluation.

Signature

@dataclass
class AbsEvalModelArgs:
    # ================ embedder config ===============
    embedder_name_or_path: str                                  # required
    embedder_model_class: Optional[str] = None                  # choices: encoder-only-base, encoder-only-m3, decoder-only-base, decoder-only-icl
    normalize_embeddings: bool = True
    pooling_method: str = "cls"
    use_fp16: bool = True
    devices: Optional[str] = None                               # nargs="+"
    query_instruction_for_retrieval: Optional[str] = None
    query_instruction_format_for_retrieval: str = "{}{}"
    examples_for_task: Optional[str] = None
    examples_instruction_format: str = "{}{}"
    trust_remote_code: bool = False
    # ================ reranker config ===============
    reranker_name_or_path: Optional[str] = None
    reranker_model_class: Optional[str] = None                  # choices: encoder-only-base, decoder-only-base, decoder-only-layerwise, decoder-only-lightweight
    reranker_peft_path: Optional[str] = None
    use_bf16: bool = False
    query_instruction_for_rerank: Optional[str] = None
    query_instruction_format_for_rerank: str = "{}{}"
    passage_instruction_for_rerank: Optional[str] = None
    passage_instruction_format_for_rerank: str = "{}{}"
    cache_dir: str = None
    # ================ for inference ===============
    embedder_batch_size: int = 3000
    reranker_batch_size: int = 3000
    embedder_query_max_length: int = 512
    embedder_passage_max_length: int = 512
    reranker_query_max_length: Optional[int] = None
    reranker_max_length: int = 512
    normalize: bool = False
    prompt: Optional[str] = None
    cutoff_layers: List[int] = None
    compress_ratio: int = 1
    compress_layers: Optional[int] = None                       # nargs="+"

Fields

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| embedder_name_or_path | str | required | The embedder model name or path |
| embedder_model_class | Optional[str] | None | Model class: encoder-only-base, encoder-only-m3, decoder-only-base, decoder-only-icl |
| normalize_embeddings | bool | True | Whether to normalize the embeddings |
| pooling_method | str | "cls" | Pooling method for the embedder |
| use_fp16 | bool | True | Whether to use FP16 for inference |
| devices | Optional[str] | None | Devices for inference (nargs="+") |
| query_instruction_for_retrieval | Optional[str] | None | Instruction prepended to queries during retrieval |
| query_instruction_format_for_retrieval | str | "{}{}" | Format template for query instruction |
| examples_for_task | Optional[str] | None | In-context examples for ICL models |
| examples_instruction_format | str | "{}{}" | Format template for examples |
| trust_remote_code | bool | False | Whether to trust remote code |
| reranker_name_or_path | Optional[str] | None | The reranker model name or path |
| reranker_model_class | Optional[str] | None | Model class: encoder-only-base, decoder-only-base, decoder-only-layerwise, decoder-only-lightweight |
| reranker_peft_path | Optional[str] | None | Path to PEFT adapter for reranker |
| use_bf16 | bool | False | Whether to use BF16 for inference |
| query_instruction_for_rerank | Optional[str] | None | Instruction prepended to queries during reranking |
| query_instruction_format_for_rerank | str | "{}{}" | Format template for rerank query instruction |
| passage_instruction_for_rerank | Optional[str] | None | Instruction prepended to passages during reranking |
| passage_instruction_format_for_rerank | str | "{}{}" | Format template for rerank passage instruction |
| cache_dir | str | None | Cache directory for models |
| embedder_batch_size | int | 3000 | Batch size for embedder inference |
| reranker_batch_size | int | 3000 | Batch size for reranker inference |
| embedder_query_max_length | int | 512 | Max token length for queries |
| embedder_passage_max_length | int | 512 | Max token length for passages |
| reranker_query_max_length | Optional[int] | None | Max token length for reranker queries |
| reranker_max_length | int | 512 | Max token length for reranking |
| normalize | bool | False | Whether to normalize reranking scores |
| prompt | Optional[str] | None | Prompt for the reranker |
| cutoff_layers | List[int] | None | Output layers for layerwise/lightweight reranker |
| compress_ratio | int | 1 | Compress ratio for lightweight reranker |
| compress_layers | Optional[int] | None | Compress layers for lightweight reranker (nargs="+") |

Post-initialization

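The __post_init__ method of AbsEvalModelArgs replaces escaped newline sequences (the two characters backslash and n, written "\\n" in Python source) with actual newline characters in the four instruction format fields: query_instruction_format_for_retrieval, examples_instruction_format, query_instruction_format_for_rerank, and passage_instruction_format_for_rerank. This lets a format string passed on the command line, where a newline usually arrives as a literal \n, produce multi-line prompts.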
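
The replacement can be illustrated with a small stand-in dataclass. This is a minimal sketch, not the library's code: InstructionFormats is a hypothetical class that carries only the four format fields, and the final step assumes the template is filled with str.format(instruction, text), which is an assumption about how the two "{}" placeholders are used.

```python
from dataclasses import dataclass

# The four format fields named in the post-initialization description.
FORMAT_FIELDS = (
    "query_instruction_format_for_retrieval",
    "examples_instruction_format",
    "query_instruction_format_for_rerank",
    "passage_instruction_format_for_rerank",
)


@dataclass
class InstructionFormats:
    """Hypothetical stand-in for the format fields of AbsEvalModelArgs."""
    query_instruction_format_for_retrieval: str = "{}{}"
    examples_instruction_format: str = "{}{}"
    query_instruction_format_for_rerank: str = "{}{}"
    passage_instruction_format_for_rerank: str = "{}{}"

    def __post_init__(self):
        # Turn the two-character sequence backslash + n into a real newline.
        for name in FORMAT_FIELDS:
            value = getattr(self, name)
            setattr(self, name, value.replace("\\n", "\n"))


# "\\n" here is what a shell typically delivers for a typed \n.
formats = InstructionFormats(
    query_instruction_format_for_retrieval="Instruct: {}\\nQuery: {}"
)

# Assumption: the template takes the instruction first, then the text.
prompt = formats.query_instruction_format_for_retrieval.format(
    "Retrieve relevant passages.", "what is bm25?"
)
print(prompt)
```

With the default "{}{}" template the instruction is simply concatenated in front of the text, so the replacement only matters for custom templates.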

Input / Output

Input: Command-line arguments parsed via HuggingFace's HfArgumentParser. Both dataclasses are typically parsed together:

from transformers import HfArgumentParser
from FlagEmbedding.abc.evaluation.arguments import AbsEvalArgs, AbsEvalModelArgs

parser = HfArgumentParser((AbsEvalArgs, AbsEvalModelArgs))
eval_args, model_args = parser.parse_args_into_dataclasses()

Output: Configured dataclass instances (AbsEvalArgs and AbsEvalModelArgs) ready to be passed to AbsEvalRunner.

Example

A typical shell script that runs the BEIR evaluation with an embedder and a reranker:

#!/bin/bash

python -m FlagEmbedding.evaluation.beir \
    --eval_name beir \
    --dataset_dir ./data/beir \
    --dataset_names arguana fiqa \
    --splits test \
    --output_dir ./search_results \
    --search_top_k 1000 \
    --rerank_top_k 100 \
    --k_values 1 3 5 10 100 1000 \
    --eval_output_method markdown \
    --eval_output_path ./eval_results.md \
    --eval_metrics ndcg_at_10 recall_at_10 \
    --embedder_name_or_path BAAI/bge-base-en-v1.5 \
    --embedder_model_class encoder-only-base \
    --normalize_embeddings True \
    --embedder_batch_size 256 \
    --embedder_query_max_length 512 \
    --embedder_passage_max_length 512 \
    --reranker_name_or_path BAAI/bge-reranker-v2-m3 \
    --reranker_model_class encoder-only-base \
    --reranker_batch_size 256 \
    --reranker_max_length 512 \
    --devices cuda:0 cuda:1 \
    --use_fp16 True
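
Several flags in the script above take more than one value (--dataset_names, --k_values, --eval_metrics, --devices), matching the fields marked nargs="+". The sketch below reproduces that behaviour with the standard-library argparse module only. It is an illustration under the assumption that the nargs="+" annotation is handled as in argparse; build_parser is a hypothetical helper and covers just a handful of the fields.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser covering a few multi-value evaluation flags."""
    parser = argparse.ArgumentParser()
    # nargs="+" collects one or more space-separated values into a list.
    parser.add_argument("--dataset_names", nargs="+", default=None)
    parser.add_argument("--k_values", nargs="+", type=int,
                        default=[1, 3, 5, 10, 100, 1000])
    parser.add_argument("--eval_metrics", nargs="+",
                        default=["ndcg_at_10", "recall_at_10"])
    parser.add_argument("--devices", nargs="+", default=None)
    return parser


argv = [
    "--dataset_names", "arguana", "fiqa",
    "--k_values", "1", "3", "5",
    "--devices", "cuda:0", "cuda:1",
]
args = build_parser().parse_args(argv)
print(args.dataset_names, args.k_values, args.eval_metrics, args.devices)
```

Flags that are omitted keep their defaults, so --eval_metrics stays at its two default metrics here.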
