

Implementation:FlagOpen FlagEmbedding LLM Embedder Retrieval Metrics

From Leeroopedia
Revision as of 14:59, 16 February 2026 by Admin (auto-imported from implementations/FlagOpen_FlagEmbedding_LLM_Embedder_Retrieval_Metrics.md)


Knowledge Sources
Domains: Machine Learning, Information Retrieval, Evaluation Metrics
Last Updated: 2026-02-09 00:00 GMT

Overview

Metric computation and result-processing utilities for evaluating retrieval models.

Description

The RetrievalMetric class provides a unified interface for computing standard information retrieval metrics including MRR (Mean Reciprocal Rank), Recall, NDCG (Normalized Discounted Cumulative Gain), and specialized metrics like NQ (Natural Questions) evaluation. It also includes utility functions for collating retrieval results into training data (negative sampling) or evaluation formats (key collation), and for managing teacher scores in knowledge distillation workflows.
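The three ranking metrics can be illustrated with a short, self-contained sketch. It assumes binary relevance, labels given as a dict from query ID to a set of positive document indices, per-query averaging, and Recall@k taken as the share of a query's positives found in the top k. The function name ranking_metrics is hypothetical; this shows the standard definitions, not the FlagEmbedding source, and the library's handling of edge cases (for example queries without positives) may differ.

```python
import math
from typing import Dict, Sequence, Set


def ranking_metrics(
    query_ids: Sequence[int],
    preds: Sequence[Sequence[int]],
    labels: Dict[int, Set[int]],
    cutoffs: Sequence[int] = (10,),
) -> Dict[str, float]:
    """Average MRR@k, Recall@k and binary-relevance nDCG@k over all queries.

    Assumes every query has at least one positive in `labels`.
    """
    totals = {f"{name}@{k}": 0.0 for name in ("mrr", "recall", "ndcg") for k in cutoffs}
    for qid, ranked in zip(query_ids, preds):
        positives = labels[qid]
        for k in cutoffs:
            # 1-based ranks of the relevant documents within the top k.
            hits = [rank for rank, doc in enumerate(ranked[:k], start=1) if doc in positives]
            # MRR: reciprocal rank of the first relevant document, 0 if none is in the top k.
            totals[f"mrr@{k}"] += 1.0 / hits[0] if hits else 0.0
            # Recall: share of this query's positives that appear in the top k.
            totals[f"recall@{k}"] += len(hits) / len(positives)
            # nDCG: gains of 1 discounted by log2(rank + 1), divided by the ideal DCG.
            dcg = sum(1.0 / math.log2(rank + 1) for rank in hits)
            ideal_hits = min(len(positives), k)
            idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
            totals[f"ndcg@{k}"] += dcg / idcg
    return {name: total / len(preds) for name, total in totals.items()}


# Toy example with hypothetical labels: query 0's positive is ranked 2nd, query 1's is missed.
metrics = ranking_metrics(
    query_ids=[0, 1],
    preds=[[3, 7, 1], [5, 2, 9]],
    labels={0: {7}, 1: {4}},
    cutoffs=[1, 3],
)
print(metrics)
```

Here mrr@3 averages 1/2 (query 0) and 0 (query 1) to 0.25, while every @1 metric is 0 because neither positive is ranked first.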

Key features include:

  • Support for multiple cutoff values (@10, @100, etc.)
  • Saving and loading of results
  • Corpus-based result collation for downstream processing
  • Teacher score integration for distillation
  • Answer filtering for hard negative mining
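Answer filtering for hard-negative mining can be sketched as follows. Retrieved documents that are labeled positives are skipped and, when filter_answers is on, so are passages whose text contains a gold answer string, since such passages are likely unlabeled positives (false negatives). The function name mine_negatives and its positives/answers inputs are hypothetical, and case-insensitive substring matching is a simplification; the library's collate_neg may match answers differently and writes its output to a file rather than returning it.

```python
from typing import Dict, List, Sequence, Set


def mine_negatives(
    query_ids: Sequence[int],
    preds: Sequence[Sequence[int]],
    positives: Dict[int, Set[int]],
    corpus: Sequence[str],
    answers: Dict[int, List[str]],
    max_neg_num: int = 100,
    filter_answers: bool = False,
) -> Dict[int, List[int]]:
    """Keep up to `max_neg_num` retrieved non-positive documents per query, in rank order."""
    negatives: Dict[int, List[int]] = {}
    for qid, ranked in zip(query_ids, preds):
        gold = positives.get(qid, set())
        query_answers = [a.lower() for a in answers.get(qid, [])]
        kept: List[int] = []
        for doc in ranked:
            if len(kept) >= max_neg_num:
                break
            if doc in gold:
                continue  # a labeled positive is never used as a negative
            text = corpus[doc].lower()
            if filter_answers and any(a in text for a in query_answers):
                continue  # probably a false negative: the passage contains a gold answer
            kept.append(doc)
        negatives[qid] = kept
    return negatives


corpus = [
    "Paris is the capital of France.",
    "Lyon is a large French city.",
    "Paris hosts the Louvre museum.",
    "Berlin is the capital of Germany.",
]
hard_negs = mine_negatives(
    query_ids=[0],
    preds=[[0, 2, 1, 3]],
    positives={0: {0}},
    corpus=corpus,
    answers={0: ["Paris"]},
    max_neg_num=2,
    filter_answers=True,
)
print(hard_negs)  # {0: [1, 3]}
```

Without filtering, document 2 ("Paris hosts the Louvre museum.") would be kept as a negative even though it mentions the answer, which is exactly the kind of noisy negative the filter is meant to remove.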

Usage

Use this module to evaluate retrieval model performance, generate training data from retrieval results, or prepare datasets for reranking models with teacher scores.
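Preparing reranking data mostly means swapping retrieved indices for the passage text they point to, which is why collate_key takes a corpus. The following is a minimal sketch, assuming the corpus is indexable by document index and queries are available by ID. The function names and the query/key/key_index field names are hypothetical and do not describe the file format the library actually writes.

```python
import json
from typing import Dict, List, Sequence


def collate_retrieved_text(
    query_ids: Sequence[int],
    preds: Sequence[Sequence[int]],
    queries: Dict[int, str],
    corpus: Sequence[str],
) -> List[Dict]:
    """Replace retrieved document indices with their text, one record per query."""
    records = []
    for qid, ranked in zip(query_ids, preds):
        records.append({
            "query_id": qid,
            "query": queries[qid],
            "key_index": list(ranked),               # hypothetical field: retrieved indices
            "key": [corpus[doc] for doc in ranked],  # hypothetical field: retrieved passages
        })
    return records


def to_json_lines(records: List[Dict]) -> str:
    """Serialize records as one JSON object per line."""
    return "\n".join(json.dumps(record) for record in records)


corpus = ["Paris is the capital of France.", "Berlin is the capital of Germany."]
records = collate_retrieved_text(
    query_ids=[0],
    preds=[[1, 0]],
    queries={0: "What is the capital of France?"},
    corpus=corpus,
)
print(to_json_lines(records))
```

Keeping both the indices and the text lets a reranker score passages directly while its output can still be mapped back to corpus positions.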

Code Reference

Source Location

Signature

class RetrievalMetric:
    @classmethod
    def get_metric_fn(cls, metric_names, **kwds): ...

    @staticmethod
    def mrr(eval_data=None, cutoffs=[10], **kwds): ...

    @staticmethod
    def recall(eval_data=None, cutoffs=[10], **kwds): ...

    @staticmethod
    def ndcg(eval_data=None, cutoffs=[10], **kwds): ...

    @staticmethod
    def nq(eval_data, corpus, cache_dir=None, **kwds): ...

    @staticmethod
    def collate_key(eval_data, save_name, corpus, output_dir=None,
                    save_to_output=False, **kwds): ...

    @staticmethod
    def collate_neg(eval_data, save_name, corpus, max_neg_num=100,
                    filter_answers=False, output_dir=None,
                    save_to_output=False, **kwds): ...

    @staticmethod
    def collate_score(eval_data, save_name, output_dir=None,
                      save_to_output=False, **kwds): ...
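The signature and the usage examples point to a factory pattern. Each static method binds configuration such as eval_data and cutoffs and returns a function of query_ids and preds, and get_metric_fn combines several of them into one callable. The sketch below follows that pattern under stated assumptions: the name-to-factory REGISTRY stands in for however the class actually looks up its methods, save_preds is a hypothetical stand-in for a collation step, and skipping steps that return None when merging is an assumption about how collation functions coexist with metrics.

```python
from typing import Callable, Dict, List, Optional, Sequence

MetricFn = Callable[..., Optional[Dict[str, float]]]


def mrr(cutoffs: Sequence[int] = (10,), **kwds) -> MetricFn:
    """Factory: binds configuration now, returns a function that scores predictions later."""
    def compute(query_ids, preds, labels, **kwargs):
        result = {}
        for k in cutoffs:
            total = 0.0
            for qid, ranked in zip(query_ids, preds):
                for rank, doc in enumerate(ranked[:k], start=1):
                    if doc in labels[qid]:
                        total += 1.0 / rank
                        break
            result[f"mrr@{k}"] = total / len(preds)
        return result
    return compute


def save_preds(log: Optional[List] = None, **kwds) -> MetricFn:
    """Hypothetical collation-style step: records predictions and reports no metrics."""
    sink = log if log is not None else []
    def compute(query_ids, preds, **kwargs):
        sink.extend(zip(query_ids, preds))
        return None
    return compute


REGISTRY: Dict[str, Callable[..., MetricFn]] = {"mrr": mrr, "save_preds": save_preds}


def get_metric_fn(metric_names: Sequence[str], **kwds) -> Callable[..., Dict[str, float]]:
    """Build every named step with the shared kwds and merge their results."""
    steps = [REGISTRY[name](**kwds) for name in metric_names]
    def compute_metrics(*args, **kwargs) -> Dict[str, float]:
        merged: Dict[str, float] = {}
        for step in steps:
            out = step(*args, **kwargs)
            if out is not None:  # collation-style steps return None
                merged.update(out)
        return merged
    return compute_metrics


saved: List = []
compute_metrics = get_metric_fn(["mrr", "save_preds"], cutoffs=[1, 2], log=saved)
metrics = compute_metrics(query_ids=[0, 1], preds=[[5, 3], [4, 9]], labels={0: {3}, 1: {4}})
print(metrics)  # {'mrr@1': 0.5, 'mrr@2': 0.75}
```

Binding configuration up front leaves a single callable with a fixed call signature, which is convenient to hand to an evaluation hook that only knows about query IDs and predictions.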

Import

from retrieval.metrics import RetrievalMetric

I/O Contract

Inputs

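Name       Type               Required  Description
query_ids  List[int]          Yes       Query identifiers for each prediction
preds      List[List[int]]    Yes       Predicted document indices for each query, in ranked order
labels     Dict               No        Ground-truth positive document indices for each query
scores     List[List[float]]  No        Retrieval (or teacher) scores for each prediction
eval_data  str                No        Path to the evaluation data file
corpus     Dataset            No        Corpus dataset used to look up document text during collation
cutoffs    List[int]          No        Cutoff values for metric computation (default: [10])

eval_data, corpus and cutoffs are supplied when the metric or collation function is built (via get_metric_fn or a collate_* method); query_ids, preds, labels and scores are supplied when the returned function is called.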

Outputs

Name         Type              Description
metrics      Dict[str, float]  Computed metrics keyed by "<metric>@<cutoff>" (e.g., "recall@10": 0.85)
result_file  str               Path to the saved result file (collation functions)
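The "Saves to" comments in the usage examples all follow one pattern: the eval_data file stem, a field tag (neg, key or scored), the save_name, and the original extension, placed under the output directory. The sketch below reproduces those three paths. build_save_path is a hypothetical name, and falling back to the directory of eval_data when no output directory is given is an assumption.

```python
import os
from typing import Optional


def build_save_path(eval_data: str, field: str, save_name: str,
                    output_dir: Optional[str] = None) -> str:
    """Hypothetical helper: <dir>/<eval_data stem>.<field>.<save_name><extension>."""
    data_dir, filename = os.path.split(eval_data)
    stem, ext = os.path.splitext(filename)
    # Assumption: without output_dir, results land next to the evaluation data.
    target_dir = output_dir if output_dir is not None else data_dir
    return os.path.join(target_dir, f"{stem}.{field}.{save_name}{ext}")


print(build_save_path("train_queries.json", "neg", "mined_negatives", "output/"))
# output/train_queries.neg.mined_negatives.json
```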

Usage Examples

from retrieval.metrics import RetrievalMetric

# Compute multiple metrics
compute_metrics = RetrievalMetric.get_metric_fn(
    metric_names=["recall", "mrr", "ndcg"],
    eval_data="eval_data.json",
    cutoffs=[1, 5, 10, 100]
)

# Evaluate predictions
metrics = compute_metrics(
    query_ids=[0, 1, 2],
    preds=[[10, 23, 45], [3, 8, 12], [99, 1, 56]]
)
print(metrics)
# Example output (values are illustrative; there is one key per metric and cutoff):
# {"recall@10": 0.85, "mrr@10": 0.67, "ndcg@10": 0.72, ...}

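# Collate retrieval results for training (negative mining).
# corpus_dataset, query_ids, retrieval_results, top_k_results, candidate_indices
# and teacher_scores below are placeholders for your own corpus and retrieval outputs.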
collate_negatives = RetrievalMetric.collate_neg(
    eval_data="train_queries.json",
    save_name="mined_negatives",
    corpus=corpus_dataset,
    max_neg_num=100,
    filter_answers=True,
    output_dir="output/",
    save_to_output=True,
)

# Generate negatives from retrieval results
collate_negatives(
    query_ids=query_ids,
    preds=retrieval_results
)
# Saves to: output/train_queries.neg.mined_negatives.json

# Collate keys for reranking evaluation
collate_keys = RetrievalMetric.collate_key(
    eval_data="eval_queries.json",
    save_name="retrieved_docs",
    corpus=corpus_dataset,
    output_dir="output/",
    save_to_output=True,
)

collate_keys(query_ids=query_ids, preds=top_k_results)
# Saves to: output/eval_queries.key.retrieved_docs.json

# Add teacher scores for distillation
collate_scores = RetrievalMetric.collate_score(
    eval_data="train_with_candidates.json",
    save_name="teacher_scored",
    output_dir="output/",
    save_to_output=True,
)

collate_scores(
    query_ids=query_ids,
    preds=candidate_indices,
    scores=teacher_scores
)
# Saves to: output/train_with_candidates.scored.teacher_scored.json
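Attaching teacher scores for distillation amounts to pairing each candidate with the score a stronger model (the teacher) assigned it, then turning those scores into a target distribution for the student. The sketch below rests on several assumptions. attach_teacher_scores and teacher_distribution are hypothetical names. Sorting candidates by teacher score is a choice made here rather than documented behaviour. The temperature-scaled softmax is the common KL-distillation target, not necessarily what the library's training code uses.

```python
import math
from typing import Dict, List, Sequence


def attach_teacher_scores(
    query_ids: Sequence[int],
    preds: Sequence[Sequence[int]],
    scores: Sequence[Sequence[float]],
) -> Dict[int, Dict[str, List]]:
    """Pair each candidate with its teacher score, best-scored candidate first."""
    scored = {}
    for qid, candidates, teacher in zip(query_ids, preds, scores):
        if len(candidates) != len(teacher):
            raise ValueError(f"query {qid}: {len(candidates)} candidates but {len(teacher)} scores")
        pairs = sorted(zip(candidates, teacher), key=lambda pair: pair[1], reverse=True)
        scored[qid] = {
            "candidates": [doc for doc, _ in pairs],
            "teacher_scores": [score for _, score in pairs],
        }
    return scored


def teacher_distribution(teacher_scores: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Softmax over teacher scores: a soft target the student can be trained to match."""
    scaled = [s / temperature for s in teacher_scores]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


scored = attach_teacher_scores(query_ids=[7], preds=[[3, 1, 2]], scores=[[0.2, 0.9, 0.5]])
print(scored)  # {7: {'candidates': [1, 2, 3], 'teacher_scores': [0.9, 0.5, 0.2]}}
print(teacher_distribution(scored[7]["teacher_scores"], temperature=0.5))
```

A lower temperature sharpens the target toward the teacher's top candidate; a higher one spreads probability mass so the student also learns the teacher's ranking among weaker candidates.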
