
Implementation:Vibrantlabsai Ragas DataCompyScore

From Leeroopedia
Knowledge Sources
Domains: Evaluation, Metrics
Last Updated: 2026-02-12 00:00 GMT

Overview

DataCompyScore evaluates the similarity between two CSV-formatted tabular datasets by comparing rows or columns using the datacompy library, reporting precision, recall, or F1.

Description

This metric is designed for evaluating LLM outputs that produce structured tabular data (CSV format). It compares a response CSV against a reference CSV using the datacompy library's Compare functionality.

The metric operates in two configurable modes:

Row mode (default): Counts the rows that match between the two dataframes using index-based comparison, so rows are aligned by position rather than by a key column. Precision is the number of matching rows divided by the total rows in the response; recall is the number of matching rows divided by the total rows in the reference.

Column mode: Counts the columns in which every value matches (zero unequal entries in the column statistics). Precision is the number of matched columns divided by the total columns in the response; recall is the number of matched columns divided by the total columns in the reference.
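The two modes can be illustrated with a small standard-library sketch. It is not the datacompy implementation: it compares raw CSV text cell by cell at the same position, with no dtype inference or numeric tolerance, and it assumes non-empty, rectangular tables. The helper names parse_csv, row_scores, and column_scores are hypothetical.

```python
import csv
from io import StringIO


def parse_csv(text):
    """Parse a CSV string into (header, rows) using the standard library."""
    records = list(csv.reader(StringIO(text)))
    return records[0], records[1:]


def row_scores(reference, response):
    """Row mode: row i matches only if both tables hold the same row at index i."""
    _, ref_rows = parse_csv(reference)
    _, res_rows = parse_csv(response)
    matching = sum(1 for a, b in zip(ref_rows, res_rows) if a == b)
    # (precision, recall)
    return matching / len(res_rows), matching / len(ref_rows)


def column_scores(reference, response):
    """Column mode: a shared column matches only if every value in it agrees."""
    ref_head, ref_rows = parse_csv(reference)
    res_head, res_rows = parse_csv(response)
    matched = 0
    for name in ref_head:
        if name not in res_head:
            continue  # a column missing from the response cannot match
        ref_vals = [row[ref_head.index(name)] for row in ref_rows]
        res_vals = [row[res_head.index(name)] for row in res_rows]
        if ref_vals == res_vals:
            matched += 1
    # (precision, recall)
    return matched / len(res_head), matched / len(ref_head)
```

Because matching is positional in this sketch, a response that contains the right rows in a different order scores lower than one that preserves the reference order.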

The final output depends on the configured metric parameter:

  • precision: The fraction of items (rows or columns) in the response that match the reference
  • recall: The fraction of items (rows or columns) in the reference that are matched by the response
  • f1: The harmonic mean of precision and recall, computed as 2 * (precision * recall) / (precision + recall)
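As a worked illustration of the three options, the sketch below derives each score from a match count and the two totals. The function name select_score is hypothetical, and returning 0.0 when precision and recall are both zero is an assumption of this sketch; the page does not say how that case is handled.

```python
def select_score(matched, response_total, reference_total, metric="f1"):
    """Return precision, recall, or F1 from a match count and the two totals."""
    precision = matched / response_total
    recall = matched / reference_total
    if metric == "precision":
        return precision
    if metric == "recall":
        return recall
    if precision + recall == 0:
        # Assumption of this sketch: F1 is reported as 0.0 when nothing matches,
        # since the harmonic-mean formula would otherwise divide by zero.
        return 0.0
    return 2 * (precision * recall) / (precision + recall)
```

For example, 2 matching rows against a 4-row response and a 2-row reference give precision 0.5, recall 1.0, and F1 of 2 * 0.5 / 1.5, which is about 0.667.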

Both the reference and response fields are expected to be CSV-formatted strings that can be parsed by pandas. If parsing fails, the metric returns NaN.

Usage

Use this metric when evaluating LLM-generated tabular outputs (such as data extraction, table generation, or structured data transformation tasks) where both the expected and actual outputs are CSV-formatted strings. The metric requires the datacompy and pandas packages to be installed.

Code Reference

Source Location

Signature

@dataclass
class DataCompyScore(SingleTurnMetric):
    name: str = "data_compare_score"
    _required_columns: t.Dict[MetricType, t.Set[str]] = field(
        default_factory=lambda: {MetricType.SINGLE_TURN: {"reference", "response"}}
    )
    mode: t.Literal["rows", "columns"] = "rows"
    metric: t.Literal["precision", "recall", "f1"] = "f1"

Import

from ragas.metrics import DataCompyScore

I/O Contract

Inputs

| Name | Type | Required | Description |
|------|------|----------|-------------|
| reference | str | Yes | A CSV-formatted string representing the expected (ground truth) tabular data |
| response | str | Yes | A CSV-formatted string representing the LLM-generated tabular data |

Configuration

| Name | Type | Default | Description |
|------|------|---------|-------------|
| mode | Literal["rows", "columns"] | "rows" | Whether to compare by matching rows or matching columns |
| metric | Literal["precision", "recall", "f1"] | "f1" | Which metric to report: precision, recall, or the F1 harmonic mean |

Outputs

| Name | Type | Description |
|------|------|-------------|
| score | float | The computed precision, recall, or F1 score. Returns NaN if the CSV strings cannot be parsed. |
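Since a parse failure yields NaN rather than an exception, downstream aggregation has to filter those scores explicitly; NaN never compares equal to itself, so math.isnan is the reliable check. A minimal sketch, where mean_valid_score is a hypothetical helper and not part of Ragas:

```python
import math


def mean_valid_score(scores):
    """Average the non-NaN scores and report how many were skipped."""
    valid = [s for s in scores if not math.isnan(s)]
    skipped = len(scores) - len(valid)
    if not valid:
        # Nothing could be scored, so the mean is itself undefined.
        return float("nan"), skipped
    return sum(valid) / len(valid), skipped
```

Reporting the skipped count alongside the mean makes it visible when a large share of responses were not valid CSV.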

Key Components

Dependencies

The metric requires two external packages that are imported at runtime in __post_init__:

  • pandas: Used to parse CSV strings via pd.read_csv(StringIO(...))
  • datacompy: Used for the Compare class that performs index-based dataframe comparison

If either package is missing, an ImportError is raised with installation instructions.
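The runtime-import pattern described above can be sketched generically as follows. The helper name require and the wording of the error message are assumptions for illustration; the exact message raised by the metric is not shown on this page.

```python
import importlib


def require(package):
    """Import a package by name, or raise ImportError with an install hint."""
    try:
        return importlib.import_module(package)
    except ImportError as exc:
        # Re-raise with a message that tells the user how to fix the problem.
        raise ImportError(
            f"{package} is not installed. Install it with `pip install {package}`."
        ) from exc
```

Deferring the import until the object is constructed keeps the optional dependencies out of the import path for users who never instantiate the metric.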

Validation

The __post_init__ method validates that:

  • The mode parameter is either "rows" or "columns"
  • The metric parameter is either "precision", "recall", or "f1"

Invalid values raise a ValueError.
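A minimal sketch of this kind of check, using a hypothetical stand-in dataclass named ScoreConfig rather than the Ragas class itself. The error messages are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class ScoreConfig:
    """Hypothetical stand-in showing __post_init__ validation of two options."""

    mode: str = "rows"
    metric: str = "f1"

    def __post_init__(self):
        # Type hints such as Literal are not enforced at runtime,
        # so the allowed values are checked explicitly here.
        if self.mode not in ("rows", "columns"):
            raise ValueError(f"mode must be 'rows' or 'columns', got {self.mode!r}")
        if self.metric not in ("precision", "recall", "f1"):
            raise ValueError(
                f"metric must be 'precision', 'recall', or 'f1', got {self.metric!r}"
            )
```

With this in place, a misspelled option such as mode="cells" fails at construction time instead of silently falling through to a default branch during scoring.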

Usage Examples

Basic Usage

from ragas.metrics import DataCompyScore
from ragas.dataset_schema import SingleTurnSample

# Compare rows using F1 score (default)
metric = DataCompyScore(mode="rows", metric="f1")

sample = SingleTurnSample(
    reference="name,age\nAlice,30\nBob,25\nCharlie,35",
    response="name,age\nAlice,30\nBob,25\nDave,40"
)

# In an async context (for example, inside an async function):
# score = await metric.single_turn_ascore(sample)
# Two of the three rows match, so precision = recall = F1 = 2/3 (about 0.667)

Column-Level Comparison

from ragas.metrics import DataCompyScore
from ragas.dataset_schema import SingleTurnSample

metric = DataCompyScore(mode="columns", metric="recall")

sample = SingleTurnSample(
    reference="name,age,city\nAlice,30,NYC\nBob,25,LA",
    response="name,age,city\nAlice,30,NYC\nBob,25,SF"
)

# In an async context (for example, inside an async function):
# score = await metric.single_turn_ascore(sample)
# "name" and "age" match in every row; "city" differs for Bob (LA vs SF)
# recall = 2/3 (about 0.667)
