Implementation:FlagOpen FlagEmbedding AbsEvalArgs Configuration
| Type | API Doc |
|---|---|
| Source | FlagEmbedding/abc/evaluation/arguments.py: L9-78 (AbsEvalArgs), L81-191 (AbsEvalModelArgs) |
| Import | from FlagEmbedding.abc.evaluation.arguments import AbsEvalArgs, AbsEvalModelArgs |
AbsEvalArgs
Base dataclass for evaluation task arguments. Adapted from AIR-Bench evaluation utilities.
Signature
@dataclass
class AbsEvalArgs:
    eval_name: str = None
    dataset_dir: Optional[str] = None
    force_redownload: bool = False
    dataset_names: Optional[str] = None  # nargs="+"
    splits: str = "test"  # nargs="+"
    corpus_embd_save_dir: str = None
    output_dir: str = "./search_results"
    search_top_k: int = 1000
    rerank_top_k: int = 100
    cache_path: str = None
    token: str = os.getenv('HF_TOKEN', None)
    overwrite: bool = False
    ignore_identical_ids: bool = False
    # ================ for evaluation ===============
    k_values: int = [1, 3, 5, 10, 100, 1000]  # nargs="+"
    eval_output_method: str = "markdown"  # choices: ["json", "markdown"]
    eval_output_path: str = "./eval_results.md"
    eval_metrics: str = ["ndcg_at_10", "recall_at_10"]  # nargs="+"
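The signature above is a condensed view: list-valued defaults such as k_values are written inline, but a Python dataclass rejects a bare list as a field default. The following is a minimal sketch of how such fields can be declared, assuming the list defaults are supplied through `field(default_factory=...)` and that `nargs` travels in the field metadata. The class `EvalArgsSketch` is a hypothetical stand-in that mirrors only a few fields, not the library's definition.

```python
import os
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class EvalArgsSketch:
    """Hypothetical stand-in mirroring a few AbsEvalArgs fields."""

    eval_name: Optional[str] = None
    search_top_k: int = 1000
    # A list default must go through default_factory; writing
    # `k_values: List[int] = [1, 3, ...]` raises ValueError at class creation.
    k_values: List[int] = field(
        default_factory=lambda: [1, 3, 5, 10, 100, 1000],
        metadata={"nargs": "+"},
    )
    eval_metrics: List[str] = field(
        default_factory=lambda: ["ndcg_at_10", "recall_at_10"],
        metadata={"nargs": "+"},
    )
    # Assumption: read HF_TOKEN when the instance is created.
    token: Optional[str] = field(default_factory=lambda: os.getenv("HF_TOKEN"))


a = EvalArgsSketch(eval_name="beir")
b = EvalArgsSketch(eval_name="miracl")
a.k_values.append(5000)  # b is unaffected: each instance gets its own list
```

Because each instance builds its own list, mutating `a.k_values` leaves `b.k_values` at the documented default.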
Fields
| Field | Type | Default | Description |
|---|---|---|---|
| eval_name | str | None | The name of the evaluation task (e.g., msmarco, beir, miracl) |
| dataset_dir | Optional[str] | None | Path to local dataset directory or download location. For custom datasets, must contain corpus.jsonl, <split>_queries.jsonl, <split>_qrels.jsonl |
| force_redownload | bool | False | Whether to force redownload the dataset from remote |
| dataset_names | Optional[str] | None | Names of datasets to evaluate (nargs="+"). If None, all available datasets are evaluated |
| splits | str | "test" | Splits to evaluate (nargs="+") |
| corpus_embd_save_dir | str | None | Path to save/load corpus embeddings. If None, embeddings are not saved |
| output_dir | str | "./search_results" | Path to save search results |
| search_top_k | int | 1000 | Top-k for retrieval stage |
| rerank_top_k | int | 100 | Top-k for reranking stage |
| cache_path | str | None | Cache directory for loading datasets |
| token | str | HF_TOKEN env var | HuggingFace token for model/dataset access |
| overwrite | bool | False | Whether to overwrite existing evaluation results |
| ignore_identical_ids | bool | False | Whether to ignore identical query/doc IDs in search results |
| k_values | List[int] | [1, 3, 5, 10, 100, 1000] | k values for metric computation (nargs="+") |
| eval_output_method | str | "markdown" | Output format: "json" or "markdown" |
| eval_output_path | str | "./eval_results.md" | Path to save evaluation results |
| eval_metrics | List[str] | ["ndcg_at_10", "recall_at_10"] | Metrics to evaluate (nargs="+") |
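The dataset_dir description names three kinds of files for a custom dataset. As an illustration, the hypothetical helper below (not part of FlagEmbedding) reports which of those files are missing for the requested splits. It assumes a flat layout in which every file sits directly inside dataset_dir.

```python
import tempfile
from pathlib import Path
from typing import List


def missing_dataset_files(dataset_dir: str, splits: List[str]) -> List[str]:
    """Return the expected file names that do not exist under dataset_dir."""
    expected = ["corpus.jsonl"]
    for split in splits:
        expected += [f"{split}_queries.jsonl", f"{split}_qrels.jsonl"]
    root = Path(dataset_dir)
    return [name for name in expected if not (root / name).is_file()]


# Demo on a temporary directory that lacks the qrels file.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "corpus.jsonl").touch()
    (Path(tmp) / "test_queries.jsonl").touch()
    missing = missing_dataset_files(tmp, ["test"])
    print(missing)  # ['test_qrels.jsonl']
```

Running a check like this before launching an evaluation can surface a misnamed split file early, instead of partway through a long embedding run.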
AbsEvalModelArgs
Base dataclass for model arguments during evaluation.
Signature
@dataclass
class AbsEvalModelArgs:
    # ================ embedder config ===============
    embedder_name_or_path: str  # required
    embedder_model_class: Optional[str] = None  # choices: encoder-only-base, encoder-only-m3, decoder-only-base, decoder-only-icl
    normalize_embeddings: bool = True
    pooling_method: str = "cls"
    use_fp16: bool = True
    devices: Optional[str] = None  # nargs="+"
    query_instruction_for_retrieval: Optional[str] = None
    query_instruction_format_for_retrieval: str = "{}{}"
    examples_for_task: Optional[str] = None
    examples_instruction_format: str = "{}{}"
    trust_remote_code: bool = False
    # ================ reranker config ===============
    reranker_name_or_path: Optional[str] = None
    reranker_model_class: Optional[str] = None  # choices: encoder-only-base, decoder-only-base, decoder-only-layerwise, decoder-only-lightweight
    reranker_peft_path: Optional[str] = None
    use_bf16: bool = False
    query_instruction_for_rerank: Optional[str] = None
    query_instruction_format_for_rerank: str = "{}{}"
    passage_instruction_for_rerank: Optional[str] = None
    passage_instruction_format_for_rerank: str = "{}{}"
    cache_dir: str = None
    # ================ for inference ===============
    embedder_batch_size: int = 3000
    reranker_batch_size: int = 3000
    embedder_query_max_length: int = 512
    embedder_passage_max_length: int = 512
    reranker_query_max_length: Optional[int] = None
    reranker_max_length: int = 512
    normalize: bool = False
    prompt: Optional[str] = None
    cutoff_layers: List[int] = None
    compress_ratio: int = 1
    compress_layers: Optional[int] = None  # nargs="+"
Fields
| Field | Type | Default | Description |
|---|---|---|---|
| embedder_name_or_path | str | required | The embedder model name or path |
| embedder_model_class | Optional[str] | None | Model class: encoder-only-base, encoder-only-m3, decoder-only-base, decoder-only-icl |
| normalize_embeddings | bool | True | Whether to normalize the embeddings |
| pooling_method | str | "cls" | Pooling method for the embedder |
| use_fp16 | bool | True | Whether to use FP16 for inference |
| devices | Optional[str] | None | Devices for inference (nargs="+") |
| query_instruction_for_retrieval | Optional[str] | None | Instruction prepended to queries during retrieval |
| query_instruction_format_for_retrieval | str | "{}{}" | Format template for query instruction |
| examples_for_task | Optional[str] | None | In-context examples for ICL models |
| examples_instruction_format | str | "{}{}" | Format template for examples |
| trust_remote_code | bool | False | Whether to trust remote code |
| reranker_name_or_path | Optional[str] | None | The reranker model name or path |
| reranker_model_class | Optional[str] | None | Model class: encoder-only-base, decoder-only-base, decoder-only-layerwise, decoder-only-lightweight |
| reranker_peft_path | Optional[str] | None | Path to PEFT adapter for reranker |
| use_bf16 | bool | False | Whether to use BF16 for inference |
| query_instruction_for_rerank | Optional[str] | None | Instruction prepended to queries during reranking |
| query_instruction_format_for_rerank | str | "{}{}" | Format template for rerank query instruction |
| passage_instruction_for_rerank | Optional[str] | None | Instruction prepended to passages during reranking |
| passage_instruction_format_for_rerank | str | "{}{}" | Format template for rerank passage instruction |
| cache_dir | str | None | Cache directory for models |
| embedder_batch_size | int | 3000 | Batch size for embedder inference |
| reranker_batch_size | int | 3000 | Batch size for reranker inference |
| embedder_query_max_length | int | 512 | Max token length for queries |
| embedder_passage_max_length | int | 512 | Max token length for passages |
| reranker_query_max_length | Optional[int] | None | Max token length for reranker queries |
| reranker_max_length | int | 512 | Max token length for reranking |
| normalize | bool | False | Whether to normalize reranking scores |
| prompt | Optional[str] | None | Prompt for the reranker |
| cutoff_layers | List[int] | None | Output layers for layerwise/lightweight reranker |
| compress_ratio | int | 1 | Compress ratio for lightweight reranker |
| compress_layers | Optional[int] | None | Compress layers for lightweight reranker (nargs="+") |
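The instruction format fields default to "{}{}", a template with two positional slots. Here is a minimal sketch of how such a template might be applied, assuming the first slot receives the instruction and the second the text, via `str.format`. The helper name `apply_instruction` and the sample strings are hypothetical.

```python
from typing import Optional


def apply_instruction(template: str, instruction: Optional[str], text: str) -> str:
    """Fill a two-slot template; return the text unchanged without an instruction."""
    if instruction is None:
        return text
    return template.format(instruction, text)


# Default template: plain concatenation of instruction and query.
default = apply_instruction(
    "{}{}",
    "Represent this sentence for searching relevant passages: ",
    "what is BM25?",
)

# A custom template that labels both parts and separates them with a newline.
custom = apply_instruction(
    "<instruct>{}\n<query>{}",
    "Retrieve relevant passages.",
    "what is BM25?",
)

print(default)
print(custom)
```

With the default template the instruction is simply prefixed to the query, so any separator (a space, a colon) has to be part of the instruction string itself.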
Post-initialization
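The __post_init__ method of AbsEvalModelArgs replaces each escaped newline, that is, the literal two-character sequence of a backslash followed by n (written "\\n" in Python source), with an actual newline character. It applies this to the four instruction format fields: query_instruction_format_for_retrieval, examples_instruction_format, query_instruction_format_for_rerank, and passage_instruction_format_for_rerank. This lets a format string passed on the command line contain line breaks.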
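The behavior described above can be sketched on a hypothetical stand-in dataclass with a single format field. `ModelArgsSketch` is illustrative only, and the sample template is an assumption.

```python
from dataclasses import dataclass


@dataclass
class ModelArgsSketch:
    """Hypothetical stand-in with one of the four format fields."""

    query_instruction_format_for_retrieval: str = "{}{}"

    def __post_init__(self):
        # "\\n" is backslash + n (two characters); "\n" is a real newline.
        if "\\n" in self.query_instruction_format_for_retrieval:
            self.query_instruction_format_for_retrieval = (
                self.query_instruction_format_for_retrieval.replace("\\n", "\n")
            )


# The raw string mimics a value typed on a command line.
args = ModelArgsSketch(
    query_instruction_format_for_retrieval=r"<instruct>{}\n<query>{}"
)
formatted = args.query_instruction_format_for_retrieval.format(
    "Find passages.", "what is BM25?"
)
print(formatted)
```

After construction the template holds a real newline, so the formatted query spans two lines.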
Input / Output
Input: Command-line arguments parsed via HuggingFace's HfArgumentParser. Both dataclasses are typically parsed together:
from transformers import HfArgumentParser
from FlagEmbedding.abc.evaluation.arguments import AbsEvalArgs, AbsEvalModelArgs
parser = HfArgumentParser((AbsEvalArgs, AbsEvalModelArgs))
eval_args, model_args = parser.parse_args_into_dataclasses()
Output: Configured dataclass instances (AbsEvalArgs and AbsEvalModelArgs) ready to be passed to AbsEvalRunner.
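To illustrate how the nargs="+" fields turn space-separated command-line values into lists, the sketch below uses the standard-library argparse directly. It assumes HfArgumentParser hands each field's type, nargs, and choices to argparse in an equivalent way. The parser here is a stand-in covering three fields, not the FlagEmbedding one.

```python
import argparse

# Stand-in parser: three arguments modeled on the documented fields.
parser = argparse.ArgumentParser()
parser.add_argument("--dataset_names", type=str, nargs="+", default=None)
parser.add_argument(
    "--k_values", type=int, nargs="+", default=[1, 3, 5, 10, 100, 1000]
)
parser.add_argument(
    "--eval_output_method",
    type=str,
    choices=["json", "markdown"],
    default="markdown",
)

# Passing an explicit list avoids reading sys.argv.
ns = parser.parse_args(
    ["--dataset_names", "arguana", "fiqa", "--k_values", "1", "10", "100"]
)
print(ns.dataset_names)       # ['arguana', 'fiqa']
print(ns.k_values)            # [1, 10, 100]
print(ns.eval_output_method)  # markdown
```

Each value after a nargs="+" flag is converted with the declared element type, which is why a field annotated as a scalar in the signature ends up holding a list.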
Example
A typical evaluation shell script calling the BEIR benchmark:
#!/bin/bash
python -m FlagEmbedding.evaluation.beir \
    --eval_name beir \
    --dataset_dir ./data/beir \
    --dataset_names arguana fiqa \
    --splits test \
    --output_dir ./search_results \
    --search_top_k 1000 \
    --rerank_top_k 100 \
    --k_values 1 3 5 10 100 1000 \
    --eval_output_method markdown \
    --eval_output_path ./eval_results.md \
    --eval_metrics ndcg_at_10 recall_at_10 \
    --embedder_name_or_path BAAI/bge-base-en-v1.5 \
    --embedder_model_class encoder-only-base \
    --normalize_embeddings True \
    --embedder_batch_size 256 \
    --embedder_query_max_length 512 \
    --embedder_passage_max_length 512 \
    --reranker_name_or_path BAAI/bge-reranker-v2-m3 \
    --reranker_model_class encoder-only-base \
    --reranker_batch_size 256 \
    --reranker_max_length 512 \
    --devices cuda:0 cuda:1 \
    --use_fp16 True