
Implementation:ChenghaoMou Text dedup Evaluate Predictions

From Leeroopedia
Knowledge Sources
Domains Evaluation, Deduplication
Last Updated 2026-02-14 21:00 GMT

Overview

A concrete tool in the text-dedup benchmark suite for evaluating deduplication predictions against ground truth, using pairwise classification metrics (CORE) and the adjusted Rand index (NEWS-COPY).

Description

The evaluate_predictions function in benchmark_core.py computes pairwise precision, recall, macro F1, and accuracy for the CORE dataset by classifying each document's prediction via classify_prediction. Helper functions clusters_to_predictions_minhash and clusters_to_predictions_simhash convert algorithm-specific cluster mappings to the prediction format.
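One plausible reading of the per-document classification step is sketched below. This is a hypothetical reimplementation, not the repository's exact code: the rule shown here treats exact agreement on a non-empty duplicate set as a true positive and any wrong or spurious prediction as a false positive.

```python
def classify_prediction(duplicates: set, predictions: set) -> str:
    """Classify one document's predicted duplicate set against ground truth.

    Sketch only: a document with no true and no predicted duplicates is a
    true negative; exact agreement on a non-empty set is a true positive;
    predicting duplicates where none exist (or the wrong ones) is a false
    positive; missing all true duplicates is a false negative.
    """
    if not duplicates:
        return "TN" if not predictions else "FP"
    if not predictions:
        return "FN"
    return "TP" if predictions == duplicates else "FP"
```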

For NEWS-COPY, evaluate_clustering in benchmark_news.py uses sklearn.metrics.adjusted_rand_score to compare predicted cluster labels against ground truth.
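The adjusted Rand index compares two clusterings while being invariant to how cluster labels are numbered; a minimal standalone illustration (the label values below are made up, only sklearn.metrics.adjusted_rand_score comes from the source):

```python
from sklearn.metrics import adjusted_rand_score

# Ground-truth and predicted cluster labels for five articles.
# Label values are arbitrary; ARI only measures whether the same
# articles are grouped together.
truth = [0, 0, 1, 1, 2]
pred = [1, 1, 0, 0, 0]  # last article merged into the wrong cluster

ari = adjusted_rand_score(truth, pred)  # partial agreement: 0 < ARI < 1
```

A perfect clustering scores 1.0, and a random assignment scores close to 0.0, which makes ARI a convenient single-number summary for NEWS-COPY.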

Usage

Import these functions when running benchmark evaluation on CORE or NEWS-COPY datasets.

Code Reference

Source Location

  • Repository: text-dedup
  • File: benchmarks/benchmark_core.py (L52-115), benchmarks/utils.py (L66-171), benchmarks/benchmark_news.py (L40-57)

Signature

def evaluate_predictions(
    labels: dict[str, set[str]],
    predictions: dict[str, set[str]],
) -> dict:
    """Evaluate predictions against ground truth labels.

    Parameters
    ----------
    labels : dict[str, set[str]]
        Ground truth: mapping from ID to set of duplicate IDs.
    predictions : dict[str, set[str]]
        Predictions: mapping from ID to set of duplicate IDs.

    Returns
    -------
    dict
        Metrics: precision_duplicates, recall_duplicates,
        precision_non_duplicates, recall_non_duplicates,
        macro_f1, accuracy, class_distribution.
    """

def classify_prediction(duplicates: set, predictions: set) -> str:
    """Classify a prediction as TP, FP, TN, or FN."""

def clusters_to_predictions_minhash(
    cluster_mapping: dict[int, int],
    id_to_core_id: dict[int, str],
) -> dict[str, set[str]]:
    """Convert MinHash cluster mapping to pairwise predictions."""

def clusters_to_predictions_simhash(
    cluster_mapping: dict[int, int],
    id_to_core_id: dict[int, str],
) -> dict[str, set[str]]:
    """Convert SimHash cluster mapping to pairwise predictions."""

Import

from benchmarks.benchmark_core import evaluate_predictions, prepare_ground_truth
from benchmarks.utils import classify_prediction, clusters_to_predictions_minhash

I/O Contract

Inputs

Name Type Required Description
labels dict[str, set[str]] Yes Ground truth duplicate mappings
predictions dict[str, set[str]] Yes Algorithm predictions

Outputs

Name Type Description
metrics dict precision_duplicates, recall_duplicates, macro_f1, accuracy, etc.
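The returned metrics can all be derived from the per-document TP/FP/TN/FN counts. A minimal sketch of that aggregation, mirroring the output keys listed above (hypothetical helper, not the repository's code):

```python
def summarize(counts: dict[str, int]) -> dict[str, float]:
    """Derive pairwise metrics from TP/FP/TN/FN document counts (sketch)."""
    tp, fp, tn, fn = (counts.get(k, 0) for k in ("TP", "FP", "TN", "FN"))
    total = tp + fp + tn + fn

    # Precision/recall for the duplicate and non-duplicate classes.
    p_dup = tp / (tp + fp) if tp + fp else 0.0
    r_dup = tp / (tp + fn) if tp + fn else 0.0
    p_non = tn / (tn + fn) if tn + fn else 0.0
    r_non = tn / (tn + fp) if tn + fp else 0.0

    def f1(p: float, r: float) -> float:
        return 2 * p * r / (p + r) if p + r else 0.0

    return {
        "precision_duplicates": p_dup,
        "recall_duplicates": r_dup,
        "precision_non_duplicates": p_non,
        "recall_non_duplicates": r_non,
        "macro_f1": (f1(p_dup, r_dup) + f1(p_non, r_non)) / 2,
        "accuracy": (tp + tn) / total if total else 0.0,
    }
```

Macro F1 averages the F1 of both classes, so it stays informative even when non-duplicates heavily outnumber duplicates.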

Usage Examples

Evaluating CORE Benchmark

from benchmarks.benchmark_core import evaluate_predictions, prepare_ground_truth
from benchmarks.utils import clusters_to_predictions_minhash
import pickle

# Load ground truth
id_to_core_id, labels = prepare_ground_truth(dataset)

# Load cluster results from deduplication
with open("output/clusters.pickle", "rb") as f:
    cluster_mapping = pickle.load(f)

# Convert to predictions and evaluate
predictions = clusters_to_predictions_minhash(cluster_mapping, id_to_core_id)
metrics = evaluate_predictions(labels, predictions)

print(f"Precision (Duplicates): {metrics['precision_duplicates']:.4f}")
print(f"Recall (Duplicates): {metrics['recall_duplicates']:.4f}")
print(f"Macro F1: {metrics['macro_f1']:.4f}")

Related Pages

Implements Principle

Requires Environment
