Implementation:FlagOpen FlagEmbedding LLM Embedder Retrieval Metrics
| Knowledge Sources | |
|---|---|
| Domains | Machine Learning, Information Retrieval, Evaluation Metrics |
| Last Updated | 2026-02-09 00:00 GMT |
Overview
Comprehensive metric computation and result processing utilities for evaluating retrieval model performance.
Description
The RetrievalMetric class provides a unified interface for computing standard information retrieval metrics: MRR (Mean Reciprocal Rank), Recall, and NDCG (Normalized Discounted Cumulative Gain), plus a specialized evaluation for NQ (Natural Questions). It also includes utilities that collate retrieval results into training data (negative sampling) or evaluation inputs (key collation), and that attach teacher scores for knowledge distillation workflows.
Key features include:
- Support for multiple cutoff values (@10, @100, etc.)
- Result saving and loading functionality
- Corpus-based result collation for downstream processing
- Teacher score integration for distillation
- Answer filtering for hard negative mining
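To make the metric names concrete, here is a minimal sketch of how MRR, Recall, and NDCG at a cutoff are conventionally defined for binary relevance. It is not the code in metrics.py: the function name `rank_metrics` and its arguments are hypothetical, and it assumes each ranked list contains no duplicate document indices.

```python
import math
from typing import Dict, Sequence, Set


def rank_metrics(
    preds: Sequence[Sequence[int]],
    positives: Sequence[Set[int]],
    cutoffs: Sequence[int] = (10,),
) -> Dict[str, float]:
    """Average MRR, Recall and NDCG (binary relevance) over all queries.

    Hypothetical helper for illustration; not part of RetrievalMetric.
    """
    totals = {f"{name}@{k}": 0.0 for name in ("mrr", "recall", "ndcg") for k in cutoffs}
    for ranked, pos in zip(preds, positives):
        for k in cutoffs:
            # 1-based ranks of the relevant documents that appear in the top k.
            hits = [rank for rank, doc in enumerate(ranked[:k], start=1) if doc in pos]
            # MRR: reciprocal rank of the first relevant document, 0 if none is in the top k.
            totals[f"mrr@{k}"] += 1.0 / hits[0] if hits else 0.0
            # Recall: fraction of all relevant documents that appear in the top k.
            totals[f"recall@{k}"] += len(hits) / len(pos) if pos else 0.0
            # NDCG: DCG of this ranking divided by the DCG of an ideal ranking.
            dcg = sum(1.0 / math.log2(rank + 1) for rank in hits)
            ideal = sum(1.0 / math.log2(r + 1) for r in range(1, min(len(pos), k) + 1))
            totals[f"ndcg@{k}"] += dcg / ideal if ideal else 0.0
    n = max(len(preds), 1)
    return {name: value / n for name, value in totals.items()}


metrics = rank_metrics(
    preds=[[10, 23, 45], [3, 8, 12]],
    positives=[{23}, {12, 99}],
    cutoffs=(1, 3),
)
print(metrics["recall@3"])  # 0.75
```

In this toy input the first query finds its only positive at rank 2 and the second finds one of its two positives at rank 3, so recall@3 averages (1.0 + 0.5) / 2.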
Usage
Use this module to evaluate retrieval model performance, generate training data from retrieval results, or prepare datasets for reranking models with teacher scores.
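The idea behind generating training data from retrieval results can be sketched as follows: retrieved documents that are not labelled positives become negatives, optionally skipping passages that contain a known answer string (likely false negatives), up to a fixed budget. This is an illustrative sketch only. The helper `collate_negatives_for_query`, the dict-based corpus, and the case-insensitive substring match are assumptions, not the behaviour of `collate_neg`; only the parameter names `max_neg_num` and `filter_answers` come from its signature.

```python
from typing import Dict, List, Sequence, Set


def collate_negatives_for_query(
    preds: Sequence[int],
    positives: Set[int],
    corpus: Dict[int, str],
    answers: Sequence[str] = (),
    max_neg_num: int = 100,
    filter_answers: bool = False,
) -> List[str]:
    """Turn one query's ranked document indices into a list of negative texts.

    Hypothetical helper for illustration; not part of RetrievalMetric.
    """
    negatives: List[str] = []
    for doc_id in preds:
        # Labelled positives can never serve as negatives.
        if doc_id in positives:
            continue
        text = corpus[doc_id]
        # Assumed filter: drop passages that mention any answer string.
        if filter_answers and any(a.lower() in text.lower() for a in answers):
            continue
        negatives.append(text)
        # Stop once the negative budget is reached.
        if len(negatives) == max_neg_num:
            break
    return negatives


corpus = {
    0: "Paris is the capital of France.",
    1: "Berlin is in Germany.",
    2: "The capital of France is Paris, on the Seine.",
    3: "Madrid is in Spain.",
}
print(collate_negatives_for_query(
    preds=[0, 2, 1, 3], positives={0}, corpus=corpus,
    answers=["Paris"], filter_answers=True,
))
# ['Berlin is in Germany.', 'Madrid is in Spain.']
```

Document 2 is unlabelled but contains the answer, which is exactly the case answer filtering is meant to catch.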
Code Reference
Source Location
- Repository: FlagOpen_FlagEmbedding
- File: research/llm_embedder/src/retrieval/metrics.py
Signature
class RetrievalMetric:
    @classmethod
    def get_metric_fn(cls, metric_names, **kwds)

    @staticmethod
    def mrr(eval_data=None, cutoffs=[10], **kwds)

    @staticmethod
    def recall(eval_data=None, cutoffs=[10], **kwds)

    @staticmethod
    def ndcg(eval_data=None, cutoffs=[10], **kwds)

    @staticmethod
    def nq(eval_data, corpus, cache_dir=None, **kwds)

    @staticmethod
    def collate_key(eval_data, save_name, corpus, output_dir=None,
                    save_to_output=False, **kwds)

    @staticmethod
    def collate_neg(eval_data, save_name, corpus, max_neg_num=100,
                    filter_answers=False, output_dir=None,
                    save_to_output=False, **kwds)

    @staticmethod
    def collate_score(eval_data, save_name, output_dir=None,
                      save_to_output=False, **kwds)
Import
from retrieval.metrics import RetrievalMetric
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| query_ids | List[int] | Yes | Query identifiers for each prediction |
| preds | List[List[int]] | Yes | Predicted document indices for each query |
| labels | Dict | No | Ground truth positive document indices |
| scores | List[List[float]] | No | Retrieval scores for each prediction |
| eval_data | str | No | Path to evaluation data file |
| corpus | Dataset | No | Corpus dataset for collating text |
| cutoffs | List[int] | No | Cutoff values for metric computation (default: [10]) |
Outputs
| Name | Type | Description |
|---|---|---|
| metrics | Dict[str, float] | Dictionary of computed metrics (e.g., "recall@10": 0.85) |
| result_file | str | Path to saved result file (for collation functions) |
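The calling convention used throughout this page is two-step: configuration (data path, cutoffs, corpus) is bound first, and the returned function is later called with query_ids and preds. A minimal sketch of that pattern using a closure is shown below. The names `make_metric_fn` and `hit_rate` are hypothetical and the toy metric is not one of the library's; the sketch only illustrates the shape of the inputs and of the returned metrics dictionary.

```python
from typing import Callable, Dict, List, Sequence


def make_metric_fn(metric_fns: Sequence[Callable[..., Dict[str, float]]], **config):
    """Bind configuration now; return a function that scores predictions later.

    Hypothetical stand-in for the configure-then-call pattern.
    """
    def compute_metrics(query_ids: List[int], preds: List[List[int]], **kwds) -> Dict[str, float]:
        merged: Dict[str, float] = {}
        for fn in metric_fns:
            # Each metric returns a dict such as {"hit@3": 0.5}; merge them all.
            merged.update(fn(query_ids=query_ids, preds=preds, **config, **kwds))
        return merged
    return compute_metrics


def hit_rate(query_ids, preds, labels, cutoffs=(10,), **kwds):
    """Toy metric: share of queries with at least one labelled positive in the top k."""
    out = {}
    for k in cutoffs:
        hits = sum(any(d in labels[q] for d in p[:k]) for q, p in zip(query_ids, preds))
        out[f"hit@{k}"] = hits / len(query_ids)
    return out


compute_metrics = make_metric_fn([hit_rate], labels={0: {23}, 1: {99}}, cutoffs=[1, 3])
print(compute_metrics(query_ids=[0, 1], preds=[[10, 23, 45], [3, 8, 12]]))
# {'hit@1': 0.0, 'hit@3': 0.5}
```

Binding the configuration up front lets the same callable be handed to an evaluation loop that only knows about query ids and predictions.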
Usage Examples
from retrieval.metrics import RetrievalMetric

# Build a function that computes several metrics at once
compute_metrics = RetrievalMetric.get_metric_fn(
    metric_names=["recall", "mrr", "ndcg"],
    eval_data="eval_data.json",
    cutoffs=[1, 5, 10, 100]
)

# Evaluate predictions
metrics = compute_metrics(
    query_ids=[0, 1, 2],
    preds=[[10, 23, 45], [3, 8, 12], [99, 1, 56]]
)
print(metrics)
# Example output (illustrative values): {"recall@10": 0.85, "mrr@10": 0.67, "ndcg@10": 0.72, ...}

# Collate retrieval results for training (negative mining)
collate_negatives = RetrievalMetric.collate_neg(
    eval_data="train_queries.json",
    save_name="mined_negatives",
    corpus=corpus_dataset,
    max_neg_num=100,
    filter_answers=True,
    output_dir="output/"
)

# Generate negatives from retrieval results
collate_negatives(
    query_ids=query_ids,
    preds=retrieval_results
)
# Saves to: output/train_queries.neg.mined_negatives.json

# Collate keys for reranking evaluation
collate_keys = RetrievalMetric.collate_key(
    eval_data="eval_queries.json",
    save_name="retrieved_docs",
    corpus=corpus_dataset,
    output_dir="output/"
)
collate_keys(query_ids=query_ids, preds=top_k_results)
# Saves to: output/eval_queries.key.retrieved_docs.json

# Add teacher scores for distillation
collate_scores = RetrievalMetric.collate_score(
    eval_data="train_with_candidates.json",
    save_name="teacher_scored",
    output_dir="output/"
)
collate_scores(
    query_ids=query_ids,
    preds=candidate_indices,
    scores=teacher_scores
)
# Saves to: output/train_with_candidates.scored.teacher_scored.json
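For the teacher-score step, the essential requirement is that each score stays aligned with the candidate it was computed for. The sketch below shows one way to pair them per query and fail early on a length mismatch. The function `attach_teacher_scores` and the record keys `candidates` and `teacher_scores` are assumptions for illustration and may not match the file format that `collate_score` writes.

```python
from typing import Dict, Sequence


def attach_teacher_scores(
    query_ids: Sequence[int],
    preds: Sequence[Sequence[int]],
    scores: Sequence[Sequence[float]],
) -> Dict[int, Dict[str, list]]:
    """Pair each query's candidate indices with the teacher score at the same position.

    Hypothetical helper for illustration; not part of RetrievalMetric.
    """
    records: Dict[int, Dict[str, list]] = {}
    for qid, cands, cand_scores in zip(query_ids, preds, scores):
        # A mismatch means scores and candidates are no longer aligned.
        if len(cands) != len(cand_scores):
            raise ValueError(
                f"query {qid}: {len(cands)} candidates but {len(cand_scores)} scores"
            )
        records[qid] = {"candidates": list(cands), "teacher_scores": list(cand_scores)}
    return records


records = attach_teacher_scores(
    query_ids=[7],
    preds=[[3, 8, 12]],
    scores=[[0.9, 0.4, 0.1]],
)
print(records[7]["teacher_scores"])  # [0.9, 0.4, 0.1]
```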