Implementation:Vibrantlabsai Ragas DataCompyScore

Knowledge Sources	Vibrantlabsai_Ragas
Domains	Evaluation, Metrics
Last Updated	2026-02-12 00:00 GMT

Overview

DataCompyScore evaluates the similarity between two CSV-formatted tabular datasets by comparing rows or columns using the datacompy library, reporting precision, recall, or F1.

Description

This metric is designed for evaluating LLM outputs that produce structured tabular data (CSV format). It compares a response CSV against a reference CSV using the datacompy library's Compare functionality.

The metric operates in two configurable modes:

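Row mode (default): Counts the rows that match between the two dataframes. Rows are paired by dataframe index rather than by content, so with the default index that pandas assigns when parsing a CSV, a row counts as a match only when it sits at the same position in both tables. Precision is matching rows divided by total rows in the response, and recall is matching rows divided by total rows in the reference.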

Column mode: Identifies columns where all values match (zero unequal entries in column statistics). Precision is computed as matched columns divided by total columns in the response, and recall as matched columns divided by total columns in the reference.
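
The two modes can be illustrated with a minimal standard-library sketch. It is not the library's implementation, which delegates the comparison to datacompy; the helper names parse_csv, row_scores, and column_scores are hypothetical. The sketch assumes rows are aligned by position, columns are matched by header name, and each table has at least one data row.

```python
import csv
from io import StringIO


def parse_csv(text):
    """Parse a CSV string into (header, rows) using the standard library."""
    records = list(csv.reader(StringIO(text)))
    return records[0], records[1:]


def row_scores(reference, response):
    """Return (precision, recall) from rows that are equal at the same position."""
    _, ref_rows = parse_csv(reference)
    _, res_rows = parse_csv(response)
    # zip stops at the shorter table, so unpaired rows never count as matches.
    matching = sum(1 for ref, res in zip(ref_rows, res_rows) if ref == res)
    return matching / len(res_rows), matching / len(ref_rows)


def column_scores(reference, response):
    """Return (precision, recall) from columns whose values all agree."""
    ref_header, ref_rows = parse_csv(reference)
    res_header, res_rows = parse_csv(response)
    matched = 0
    for name in ref_header:
        if name not in res_header:
            continue
        ref_col = [row[ref_header.index(name)] for row in ref_rows]
        res_col = [row[res_header.index(name)] for row in res_rows]
        if ref_col == res_col:  # zero unequal entries in this column
            matched += 1
    return matched / len(res_header), matched / len(ref_header)


reference = "name,age,city\nAlice,30,NYC\nBob,25,LA"
response = "name,age,city\nAlice,30,NYC\nBob,25,SF"
print(row_scores(reference, response))     # one of two rows matches
print(column_scores(reference, response))  # "name" and "age" match, "city" does not
```

In this sketch a single differing cell removes one row from the row count and one column from the column count, which is why the two modes can give different scores for the same pair of tables.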

The final output depends on the configured metric parameter:

precision: The fraction of items (rows or columns) in the response that match the reference
recall: The fraction of items (rows or columns) in the reference that are matched by the response
f1: The harmonic mean of precision and recall, computed as 2 * (precision * recall) / (precision + recall)
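
The selection of the reported value can be sketched as follows. The function name select_score is hypothetical, and returning 0.0 when precision and recall are both zero is an assumption of this sketch; the page does not state how the library handles that case.

```python
def select_score(precision, recall, metric="f1"):
    """Pick the reported value from a precision/recall pair."""
    if metric == "precision":
        return precision
    if metric == "recall":
        return recall
    if metric == "f1":
        if precision + recall == 0:
            # Assumption of this sketch: avoid dividing by zero.
            return 0.0
        return 2 * (precision * recall) / (precision + recall)
    raise ValueError(f"unknown metric: {metric!r}")


# Example: 2 matching rows, 4 rows in the response, 3 rows in the reference.
precision, recall = 2 / 4, 2 / 3
print(select_score(precision, recall, "f1"))
```

With precision 1/2 and recall 2/3, the harmonic mean works out to 4/7 (about 0.571), which is lower than the arithmetic mean because F1 penalizes imbalance between the two values.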

Both the reference and response fields are expected to be CSV-formatted strings that can be parsed by pandas. If parsing fails, the metric returns NaN.
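
Because an unparseable input yields NaN rather than an exception, code that aggregates scores over many samples may want to handle NaN explicitly. The following is a minimal sketch of one way to do that on the caller's side; the helper name mean_valid_score is hypothetical and is not part of the library.

```python
import math


def mean_valid_score(scores):
    """Average the scores, skipping NaN entries from unparseable samples."""
    valid = [s for s in scores if not math.isnan(s)]
    if not valid:
        return math.nan  # nothing could be scored
    return sum(valid) / len(valid)


# The middle sample stands for a response that could not be parsed as CSV.
scores = [1.0, float("nan"), 0.5]
print(mean_valid_score(scores))
```

Skipping NaN values is a design choice: it reports quality over the parseable samples only, so the count of skipped samples is worth tracking separately.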

Usage

Use this metric when evaluating LLM-generated tabular outputs (such as data extraction, table generation, or structured data transformation tasks) where both the expected and actual outputs are CSV-formatted strings. The metric requires the datacompy and pandas packages to be installed.

Code Reference

Source Location

Repository: Vibrantlabsai_Ragas
File: src/ragas/metrics/_datacompy_score.py

Signature

@dataclass
class DataCompyScore(SingleTurnMetric):
    name: str = "data_compare_score"
    _required_columns: t.Dict[MetricType, t.Set[str]] = field(
        default_factory=lambda: {MetricType.SINGLE_TURN: {"reference", "response"}}
    )
    mode: t.Literal["rows", "columns"] = "rows"
    metric: t.Literal["precision", "recall", "f1"] = "f1"

Import

from ragas.metrics import DataCompyScore

I/O Contract

Inputs

Name	Type	Required	Description
reference	str	Yes	A CSV-formatted string representing the expected (ground truth) tabular data
response	str	Yes	A CSV-formatted string representing the LLM-generated tabular data

Configuration

Name	Type	Default	Description
mode	Literal["rows", "columns"]	"rows"	Whether to compare by matching rows or matching columns
metric	Literal["precision", "recall", "f1"]	"f1"	Which metric to report: precision, recall, or the F1 harmonic mean

Outputs

Name	Type	Description
score	float	The computed precision, recall, or F1 score. Returns NaN if the CSV strings cannot be parsed.

Key Components

Dependencies

The metric requires two external packages that are imported at runtime in __post_init__:

pandas: Used to parse CSV strings via pd.read_csv(StringIO(...))
datacompy: Used for the Compare class that performs index-based dataframe comparison

If either package is missing, an ImportError is raised with installation instructions.
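
The general pattern of a deferred import that fails with an actionable message can be sketched as below. This is an illustration of the pattern, not the library's code: the helper name require and the message wording are hypothetical, and a standard-library module stands in for pandas or datacompy so the sketch runs without extra installs.

```python
import importlib


def require(package, install_hint):
    """Import a package by name, or raise ImportError with an install hint."""
    try:
        return importlib.import_module(package)
    except ImportError as exc:
        # The message wording is illustrative, not the library's actual text.
        raise ImportError(
            f"{package} is required for this metric. Install it with: {install_hint}"
        ) from exc


# A standard-library module stands in for an optional dependency.
csv_module = require("csv", "(bundled with Python)")

try:
    require("package_that_is_not_installed", "pip install package_that_is_not_installed")
except ImportError as err:
    print(err)
```

Deferring the import to object construction means that users who never instantiate the metric do not need the optional packages installed.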

Validation

The __post_init__ method validates that:

The mode parameter is either "rows" or "columns"
The metric parameter is either "precision", "recall", or "f1"

Invalid values raise a ValueError.
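
A minimal sketch of this kind of validation in a dataclass follows. The class ScoreConfig is a hypothetical stand-in for the metric's two configuration fields, and the error messages are assumptions rather than the library's wording.

```python
from dataclasses import dataclass


@dataclass
class ScoreConfig:
    """Hypothetical stand-in for the metric's configuration fields."""

    mode: str = "rows"
    metric: str = "f1"

    def __post_init__(self):
        # Runs automatically after the generated __init__ assigns the fields.
        if self.mode not in ("rows", "columns"):
            raise ValueError("mode must be 'rows' or 'columns'")
        if self.metric not in ("precision", "recall", "f1"):
            raise ValueError("metric must be 'precision', 'recall', or 'f1'")


ScoreConfig(mode="columns", metric="recall")  # accepted

try:
    ScoreConfig(mode="cells")  # rejected at construction time
except ValueError as err:
    print(err)
```

Validating in __post_init__ surfaces a misconfigured metric when it is created, before any sample is scored.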

Usage Examples

Basic Usage

import asyncio

from ragas.dataset_schema import SingleTurnSample
from ragas.metrics import DataCompyScore

# Compare rows using F1 score (these are the defaults, spelled out for clarity)
metric = DataCompyScore(mode="rows", metric="f1")

sample = SingleTurnSample(
    reference="name,age\nAlice,30\nBob,25\nCharlie,35",
    response="name,age\nAlice,30\nBob,25\nDave,40",
)

# single_turn_ascore is a coroutine, so run it in an event loop
score = asyncio.run(metric.single_turn_ascore(sample))
# Two of three rows match in each direction,
# so precision = recall = 2/3 and F1 = 2/3 (about 0.667)
print(score)

Column-Level Comparison

import asyncio

from ragas.dataset_schema import SingleTurnSample
from ragas.metrics import DataCompyScore

# Compare whole columns and report recall
metric = DataCompyScore(mode="columns", metric="recall")

sample = SingleTurnSample(
    reference="name,age,city\nAlice,30,NYC\nBob,25,LA",
    response="name,age,city\nAlice,30,NYC\nBob,25,SF",
)

# single_turn_ascore is a coroutine, so run it in an event loop
score = asyncio.run(metric.single_turn_ascore(sample))
# "name" and "age" columns match; "city" does not
# recall = 2/3 (about 0.667)
print(score)

Related Pages

Environment:Vibrantlabsai_Ragas_Python_3_9_Core_Environment
