Implementation: ChenghaoMou/text-dedup Evaluate Predictions
| Knowledge Sources | |
|---|---|
| Domains | Evaluation, Deduplication |
| Last Updated | 2026-02-14 21:00 GMT |
Overview
A concrete tool in the text-dedup benchmarks for evaluating deduplication predictions against ground truth, using pairwise classification metrics (CORE) and the adjusted Rand index (NEWS-COPY).
Description
The evaluate_predictions function in benchmark_core.py computes pairwise precision, recall, macro F1, and accuracy for the CORE dataset by classifying each document's prediction via classify_prediction. Helper functions clusters_to_predictions_minhash and clusters_to_predictions_simhash convert algorithm-specific cluster mappings to the prediction format.
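The shape of that conversion can be sketched generically. The repository ships MinHash- and SimHash-specific variants; the body below is an assumption based only on the input/output shapes described here, not the repository's actual code:

```python
def clusters_to_predictions(cluster_mapping, id_to_core_id):
    """Hypothetical sketch of the cluster-to-pairwise conversion.

    cluster_mapping: internal doc id -> cluster id
    id_to_core_id:   internal doc id -> CORE string id
    Returns: CORE id -> set of CORE ids predicted as its duplicates.
    """
    # Group documents by the cluster they were assigned to.
    clusters = {}
    for doc_id, cluster_id in cluster_mapping.items():
        clusters.setdefault(cluster_id, []).append(id_to_core_id[doc_id])

    # Every other member of a document's cluster becomes a predicted duplicate;
    # singleton clusters yield empty sets (predicted non-duplicates).
    predictions = {}
    for members in clusters.values():
        for core_id in members:
            predictions[core_id] = set(members) - {core_id}
    return predictions

preds = clusters_to_predictions({1: 0, 2: 0, 3: 5}, {1: "a", 2: "b", 3: "c"})
print(preds["a"])  # {'b'}
print(preds["c"])  # set()
```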
For NEWS-COPY, evaluate_clustering in benchmark_news.py uses sklearn.metrics.adjusted_rand_score to compare predicted cluster labels against ground truth.
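A minimal pure-Python sketch of what adjusted_rand_score computes (the benchmark itself calls sklearn's implementation; this is only an illustration of the metric):

```python
from math import comb

def adjusted_rand_index(truth, pred):
    """Illustrative reimplementation of sklearn.metrics.adjusted_rand_score.

    Compares two flat cluster-label lists: 1.0 means identical clusterings
    (up to label renaming), ~0.0 means chance-level agreement.
    """
    n = len(truth)
    # Contingency counts: items sharing each (truth, pred) label pair.
    contingency = {}
    for t, p in zip(truth, pred):
        contingency[(t, p)] = contingency.get((t, p), 0) + 1
    sum_ij = sum(comb(v, 2) for v in contingency.values())

    # Marginal pair counts within truth clusters and predicted clusters.
    row, col = {}, {}
    for (t, p), v in contingency.items():
        row[t] = row.get(t, 0) + v
        col[p] = col.get(p, 0) + v
    sum_a = sum(comb(v, 2) for v in row.values())
    sum_b = sum(comb(v, 2) for v in col.values())

    expected = sum_a * sum_b / comb(n, 2)
    max_index = (sum_a + sum_b) / 2
    return (sum_ij - expected) / (max_index - expected)

# Identical clusterings under different label names score 1.0.
print(adjusted_rand_index([0, 0, 1, 1], [7, 7, 3, 3]))  # 1.0
```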
Usage
Import these functions when running benchmark evaluation on the CORE or NEWS-COPY datasets.
Code Reference
Source Location
- Repository: text-dedup
- File: benchmarks/benchmark_core.py (L52-115), benchmarks/utils.py (L66-171), benchmarks/benchmark_news.py (L40-57)
Signature
def evaluate_predictions(
    labels: dict[str, set[str]],
    predictions: dict[str, set[str]],
) -> dict:
    """Evaluate predictions against ground truth labels.

    Parameters
    ----------
    labels : dict[str, set[str]]
        Ground truth: mapping from ID to set of duplicate IDs.
    predictions : dict[str, set[str]]
        Predictions: mapping from ID to set of duplicate IDs.

    Returns
    -------
    dict
        Metrics: precision_duplicates, recall_duplicates,
        precision_non_duplicates, recall_non_duplicates,
        macro_f1, accuracy, class_distribution.
    """

def classify_prediction(duplicates: set, predictions: set) -> str:
    """Classify a prediction as TP, FP, TN, or FN."""

def clusters_to_predictions_minhash(
    cluster_mapping: dict[int, int],
    id_to_core_id: dict[int, str],
) -> dict[str, set[str]]:
    """Convert MinHash cluster mapping to pairwise predictions."""

def clusters_to_predictions_simhash(
    cluster_mapping: dict[int, int],
    id_to_core_id: dict[int, str],
) -> dict[str, set[str]]:
    """Convert SimHash cluster mapping to pairwise predictions."""
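classify_prediction's body is not reproduced above. One plausible rule consistent with its docstring and the per-document classification described earlier (an assumption, not necessarily the repository's exact logic) is:

```python
def classify_prediction(duplicates: set, predictions: set) -> str:
    """Classify one document's prediction as TP, FP, TN, or FN.

    Hypothetical rule: a document with ground-truth duplicates is a
    positive example; one without is a negative example.
    """
    if duplicates:
        # Positive: correct only if every true duplicate was recovered.
        return "TP" if duplicates.issubset(predictions) else "FN"
    # Negative: any predicted duplicate is a false alarm.
    return "FP" if predictions else "TN"

print(classify_prediction({"a", "b"}, {"a", "b", "c"}))  # TP
print(classify_prediction(set(), set()))                 # TN
```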
Import
from benchmarks.benchmark_core import evaluate_predictions, prepare_ground_truth
from benchmarks.utils import classify_prediction, clusters_to_predictions_minhash
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| labels | dict[str, set[str]] | Yes | Ground truth duplicate mappings |
| predictions | dict[str, set[str]] | Yes | Algorithm predictions |
Outputs
| Name | Type | Description |
|---|---|---|
| metrics | dict | precision_duplicates, recall_duplicates, macro_f1, accuracy, etc. |
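The output fields follow the standard definitions over per-document TP/FP/TN/FN labels. A hypothetical summarize helper (not part of the repository; field names taken from the table above) shows how the metrics dict relates to the counts:

```python
from collections import Counter

def summarize(classes):
    """Illustrative mapping from TP/FP/TN/FN labels to the metrics dict."""
    c = Counter(classes)
    tp, fp, tn, fn = c["TP"], c["FP"], c["TN"], c["FN"]
    # Precision/recall for each class, guarding against empty denominators.
    p_dup = tp / (tp + fp) if tp + fp else 0.0
    r_dup = tp / (tp + fn) if tp + fn else 0.0
    p_non = tn / (tn + fn) if tn + fn else 0.0
    r_non = tn / (tn + fp) if tn + fp else 0.0
    f1_dup = 2 * p_dup * r_dup / (p_dup + r_dup) if p_dup + r_dup else 0.0
    f1_non = 2 * p_non * r_non / (p_non + r_non) if p_non + r_non else 0.0
    return {
        "precision_duplicates": p_dup,
        "recall_duplicates": r_dup,
        "precision_non_duplicates": p_non,
        "recall_non_duplicates": r_non,
        "macro_f1": (f1_dup + f1_non) / 2,  # unweighted mean of the two F1s
        "accuracy": (tp + tn) / len(classes),
        "class_distribution": dict(c),
    }

metrics = summarize(["TP", "TP", "FN", "TN", "FP", "TN"])
print(metrics["accuracy"])  # 4 of 6 documents classified correctly
```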
Usage Examples
Evaluating CORE Benchmark
from benchmarks.benchmark_core import evaluate_predictions, prepare_ground_truth
from benchmarks.utils import clusters_to_predictions_minhash
import pickle
# Load ground truth (`dataset` is the loaded CORE dataset)
id_to_core_id, labels = prepare_ground_truth(dataset)

# Load cluster results from deduplication
with open("output/clusters.pickle", "rb") as f:
    cluster_mapping = pickle.load(f)
# Convert to predictions and evaluate
predictions = clusters_to_predictions_minhash(cluster_mapping, id_to_core_id)
metrics = evaluate_predictions(labels, predictions)
print(f"Precision (Duplicates): {metrics['precision_duplicates']:.4f}")
print(f"Recall (Duplicates): {metrics['recall_duplicates']:.4f}")
print(f"Macro F1: {metrics['macro_f1']:.4f}")
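Evaluating NEWS-COPY Clustering

evaluate_clustering's exact interface is not reproduced here; the step it depends on is aligning the two clusterings over the same document order so adjusted_rand_score compares them position by position. A sketch of that preparation (document IDs and cluster labels are hypothetical):

```python
# Hypothetical cluster mappings: document id -> cluster id.
truth_clusters = {"d1": 0, "d2": 0, "d3": 1, "d4": 2}
pred_clusters = {"d1": 7, "d2": 7, "d3": 7, "d4": 9}

# Align both mappings over a shared, deterministic document order so the
# resulting label lists describe the same documents position by position.
doc_ids = sorted(truth_clusters)
truth_labels = [truth_clusters[d] for d in doc_ids]
pred_labels = [pred_clusters[d] for d in doc_ids]

# These aligned lists are what the ARI comparison consumes:
# from sklearn.metrics import adjusted_rand_score
# ari = adjusted_rand_score(truth_labels, pred_labels)
print(truth_labels)  # [0, 0, 1, 2]
print(pred_labels)   # [7, 7, 7, 9]
```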