
Implementation:FMInference FlexLLMGen Compute Metrics

From Leeroopedia


Metadata

Field Value
Sources FlexLLMGen (https://github.com/FMInference/FlexLLMGen)
Domains Evaluation, Metrics
Last updated 2026-02-09 00:00 GMT

Overview

A concrete utility for computing classification metrics on predictions from the data wrangling tasks shipped with the FlexLLMGen data wrangling application.

Description

compute_metrics() takes a list of predictions, a list of gold labels, and a task name. It iterates over prediction–gold pairs, normalizes both strings to lowercase, applies a task-specific matching rule (exact match for entity_matching and data_imputation, startswith for schema_matching and error_detection_spelling, endswith for error_detection), tallies true/false positives and negatives, and returns (precision, recall, accuracy, f1) as a tuple.
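The matching-and-counting loop described above can be sketched as follows. This is an illustrative re-implementation based on the description, not the FlexLLMGen source: in particular, the choice of the gold label "yes" as the positive class and the zero-division handling are assumptions that may differ from the real function.

```python
from typing import List, Tuple


def compute_metrics_sketch(
    preds: List[str], golds: List[str], task: str
) -> Tuple[float, float, float, float]:
    """Sketch of the described logic; edge cases may differ from FlexLLMGen."""
    tp = tn = fp = fn = 0
    for pred, gold in zip(preds, golds):
        pred = pred.strip().lower()
        gold = gold.strip().lower()
        # Task-specific matching rule from the Description.
        if task in ("entity_matching", "data_imputation"):
            correct = pred == gold              # exact match
        elif task in ("schema_matching", "error_detection_spelling"):
            correct = pred.startswith(gold)     # prefix match
        elif task == "error_detection":
            correct = pred.endswith(gold)       # suffix match
        else:
            raise ValueError(f"Unknown task: {task}")
        # Assumption: the gold label "yes" marks the positive class.
        positive = gold == "yes"
        if correct and positive:
            tp += 1
        elif correct and not positive:
            tn += 1
        elif not correct and positive:
            fn += 1
        else:
            fp += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    accuracy = (tp + tn) / len(preds) if preds else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    return precision, recall, accuracy, f1
```

On the five-example input shown under Usage Examples below, this sketch reproduces the same (0.667, 1.000, 0.800, 0.800) result.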

Usage

Call after model generation to evaluate predictions against ground truth labels.

Code Reference

  • Source: flexllmgen/apps/data_wrangle/utils/utils.py, Lines: 25-63
  • Signature:
def compute_metrics(preds: List, golds: List, task: str):
    """Compute metrics.

    Args:
        preds: List of predicted label strings
        golds: List of ground truth label strings
        task: Task name - one of "entity_matching", "data_imputation",
              "error_detection", "error_detection_spelling", "schema_matching"
    Returns:
        Tuple of (precision: float, recall: float, accuracy: float, f1: float)
    """
  • Import:
from flexllmgen.apps.data_wrangle.utils.utils import compute_metrics

I/O Contract

Inputs

Name Type Required Description
preds List[str] Yes Predicted label strings
golds List[str] Yes Ground truth label strings
task str Yes Task name for matching strategy

Outputs

Tuple[float, float, float, float] — (precision, recall, accuracy, f1).
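The four returned values follow the standard definitions over the TP/TN/FP/FN counts. A minimal sketch (the guards against division by zero are an assumption about how the real function handles degenerate inputs):

```python
def metrics_from_counts(tp: int, tn: int, fp: int, fn: int):
    """Standard precision/recall/accuracy/F1 from confusion-matrix counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    # F1 is the harmonic mean of precision and recall.
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    return precision, recall, accuracy, f1
```

For example, counts of 2 TP, 2 TN, 1 FP, 0 FN yield (0.667, 1.0, 0.8, 0.8), matching the worked example in Usage Examples.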

Usage Examples

from flexllmgen.apps.data_wrangle.utils.utils import compute_metrics

predictions = ["yes", "no", "yes", "no", "yes"]
ground_truth = ["yes", "no", "no", "no", "yes"]

prec, rec, acc, f1 = compute_metrics(predictions, ground_truth, task="entity_matching")
print(f"Precision: {prec:.3f}, Recall: {rec:.3f}, Accuracy: {acc:.3f}, F1: {f1:.3f}")
# Precision: 0.667, Recall: 1.000, Accuracy: 0.800, F1: 0.800
