Implementation:FMInference FlexLLMGen Compute Metrics
Metadata
| Field | Value |
|---|---|
| Sources | [FlexLLMGen](https://github.com/FMInference/FlexLLMGen) |
| Domains | Evaluation, Metrics |
| Last updated | 2026-02-09 00:00 GMT |
Overview
Concrete tool for computing classification metrics on predictions produced by the FlexLLMGen data wrangling application.
Description
compute_metrics() takes a list of predicted labels, a list of gold labels, and a task name. It iterates over prediction/gold pairs, normalizes both strings to lowercase, applies a task-specific matching rule (exact match for entity_matching and data_imputation, prefix match for schema_matching and error_detection_spelling, suffix match for error_detection), accumulates TP/TN/FP/FN counts, and returns (precision, recall, accuracy, f1) as a tuple of floats.
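The counting logic described above can be sketched as follows. This is an illustrative reimplementation for clarity, not the library code; in particular, it assumes the gold label "yes" marks the positive class and that any other gold label is treated as negative.

```python
from typing import List, Tuple


def compute_metrics_sketch(
    preds: List[str], golds: List[str], task: str
) -> Tuple[float, float, float, float]:
    """Illustrative sketch of the metric computation (not the library code)."""
    tp = tn = fp = fn = correct = total = 0
    for pred, gold in zip(preds, golds):
        pred, gold = pred.strip().lower(), gold.strip().lower()
        total += 1
        # Task-specific matching rule.
        if task in ("entity_matching", "data_imputation"):
            crc = pred == gold            # exact match
        elif task in ("schema_matching", "error_detection_spelling"):
            crc = pred.startswith(gold)   # prefix match
        elif task == "error_detection":
            crc = pred.endswith(gold)     # suffix match
        else:
            raise ValueError(f"Unknown task: {task}")
        correct += crc
        # Assumed convention: gold "yes" is the positive class.
        if gold == "yes":
            tp += crc
            fn += not crc
        else:
            tn += crc
            fp += not crc
    prec = tp / max(1, tp + fp)
    rec = tp / max(1, tp + fn)
    acc = correct / max(1, total)
    f1 = 2 * prec * rec / (prec + rec) if (prec + rec) else 0.0
    return prec, rec, acc, f1
```

On the five-pair example given under Usage Examples below, this sketch reproduces the same (0.667, 1.0, 0.8, 0.8) figures.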
Usage
Call after model generation to evaluate predictions against ground truth labels.
Code Reference
- Source: flexllmgen/apps/data_wrangle/utils/utils.py, Lines: 25-63
- Signature:
def compute_metrics(preds: List, golds: List, task: str):
"""Compute metrics.
Args:
preds: List of predicted label strings
golds: List of ground truth label strings
task: Task name - one of "entity_matching", "data_imputation",
"error_detection", "error_detection_spelling", "schema_matching"
Returns:
Tuple of (precision: float, recall: float, accuracy: float, f1: float)
"""
- Import:
from flexllmgen.apps.data_wrangle.utils.utils import compute_metrics
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| preds | List[str] | Yes | Predicted label strings |
| golds | List[str] | Yes | Ground truth label strings |
| task | str | Yes | Task name that selects the matching strategy |
Outputs
Tuple[float, float, float, float] — (precision, recall, accuracy, f1).
Usage Examples
from flexllmgen.apps.data_wrangle.utils.utils import compute_metrics
predictions = ["yes", "no", "yes", "no", "yes"]
ground_truth = ["yes", "no", "no", "no", "yes"]
prec, rec, acc, f1 = compute_metrics(predictions, ground_truth, task="entity_matching")
print(f"Precision: {prec:.3f}, Recall: {rec:.3f}, Accuracy: {acc:.3f}, F1: {f1:.3f}")
# Precision: 0.667, Recall: 1.000, Accuracy: 0.800, F1: 0.800
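For the non-exact-match tasks, the prediction only needs to start or end with the gold label, which tolerates models that wrap the label in explanatory text. A small hedged illustration (the prediction strings below are made up for demonstration):

```python
gold = "yes"

# Prefix match (schema_matching, error_detection_spelling):
# a trailing explanation after the label still counts as correct.
pred_with_suffix = "Yes, these two columns refer to the same attribute."
print(pred_with_suffix.strip().lower().startswith(gold))  # True

# Suffix match (error_detection):
# a leading explanation before the label is tolerated instead.
pred_with_prefix = "The cell looks corrupted, so: yes"
print(pred_with_prefix.strip().lower().endswith(gold))  # True
```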