Implementation:Vibrantlabsai Ragas DataCompyScore
| Knowledge Sources | |
|---|---|
| Domains | Evaluation, Metrics |
| Last Updated | 2026-02-12 00:00 GMT |
Overview
DataCompyScore evaluates the similarity between two CSV-formatted tabular datasets by comparing rows or columns using the datacompy library, reporting precision, recall, or F1.
Description
This metric is designed for evaluating LLM outputs that produce structured tabular data (CSV format). It compares a response CSV against a reference CSV using the datacompy library's Compare functionality.
The metric operates in two configurable modes:
- Row mode (default): Counts the number of matching rows between the two dataframes using index-based comparison. Precision is computed as matching rows divided by the total rows in the response, and recall as matching rows divided by total rows in the reference.
- Column mode: Identifies columns where all values match (zero unequal entries in column statistics). Precision is computed as matched columns divided by total columns in the response, and recall as matched columns divided by total columns in the reference.
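The two modes can be approximated with the standard library alone, which may help when reasoning about expected scores. The sketch below is not the metric's implementation: it compares cells as raw strings and pairs rows by position, whereas the metric delegates to datacompy on parsed dataframes. The helper names `read_table`, `row_scores`, and `column_scores` are hypothetical, and the code assumes each table has a header and at least one data row.

```python
import csv
from io import StringIO


def read_table(text):
    """Parse a CSV string into (header, rows) using only the standard library."""
    records = list(csv.reader(StringIO(text)))
    return records[0], records[1:]


def row_scores(reference, response):
    """Return (precision, recall) over rows compared position by position."""
    _, ref_rows = read_table(reference)
    _, resp_rows = read_table(response)
    # zip stops at the shorter table, so unpaired rows never count as matches.
    matching = sum(1 for a, b in zip(ref_rows, resp_rows) if a == b)
    return matching / len(resp_rows), matching / len(ref_rows)


def column_scores(reference, response):
    """Return (precision, recall) over columns whose values all agree."""
    ref_header, ref_rows = read_table(reference)
    resp_header, resp_rows = read_table(response)
    matched = 0
    for name in ref_header:
        if name not in resp_header:
            continue  # a column missing from the response cannot match
        i, j = ref_header.index(name), resp_header.index(name)
        if [row[i] for row in ref_rows] == [row[j] for row in resp_rows]:
            matched += 1
    return matched / len(resp_header), matched / len(ref_header)
```

Because values are compared as strings here, cells such as `30` and `30.0` count as different, which may not mirror how typed dataframe comparison treats them.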
The final output depends on the configured metric parameter:
- precision: The fraction of items (rows or columns) in the response that match the reference
- recall: The fraction of items (rows or columns) in the reference that are matched by the response
- f1: The harmonic mean of precision and recall, computed as 2 * (precision * recall) / (precision + recall)
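As a worked illustration of the F1 formula, the following minimal sketch combines a precision and recall pair. The function name `f1_score` is hypothetical, and the guard for the case where both inputs are zero is an assumption of this sketch; the description above does not say how that case is handled.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        # Assumption: report 0.0 when nothing matches, to avoid dividing by zero.
        return 0.0
    return 2 * (precision * recall) / (precision + recall)
```

For example, a precision of 0.5 and a recall of 1.0 give 2 * 0.5 / 1.5, or about 0.667.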
Both the reference and response fields are expected to be CSV-formatted strings that can be parsed by pandas. If parsing fails, the metric returns NaN.
Usage
Use this metric when evaluating LLM-generated tabular outputs (such as data extraction, table generation, or structured data transformation tasks) where both the expected and actual outputs are CSV-formatted strings. The metric requires the datacompy and pandas packages to be installed.
Code Reference
Source Location
- Repository: Vibrantlabsai_Ragas
- File: src/ragas/metrics/_datacompy_score.py
Signature
@dataclass
class DataCompyScore(SingleTurnMetric):
    name: str = "data_compare_score"
    _required_columns: t.Dict[MetricType, t.Set[str]] = field(
        default_factory=lambda: {MetricType.SINGLE_TURN: {"reference", "response"}}
    )
    mode: t.Literal["rows", "columns"] = "rows"
    metric: t.Literal["precision", "recall", "f1"] = "f1"
Import
from ragas.metrics import DataCompyScore
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| reference | str | Yes | A CSV-formatted string representing the expected (ground truth) tabular data |
| response | str | Yes | A CSV-formatted string representing the LLM-generated tabular data |
Configuration
| Name | Type | Default | Description |
|---|---|---|---|
| mode | Literal["rows", "columns"] | "rows" | Whether to compare by matching rows or matching columns |
| metric | Literal["precision", "recall", "f1"] | "f1" | Which metric to report: precision, recall, or the F1 harmonic mean |
Outputs
| Name | Type | Description |
|---|---|---|
| score | float | The computed precision, recall, or F1 score. Returns NaN if the CSV strings cannot be parsed. |
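Since a NaN score propagates through sums and averages, code that aggregates scores over many samples may want to skip it explicitly. The helper below is one possible way to do that; `mean_valid_score` is a hypothetical name and is not part of Ragas.

```python
import math


def mean_valid_score(scores):
    """Average the scores while skipping NaN entries from unparseable samples."""
    valid = [s for s in scores if not math.isnan(s)]
    # Return NaN when no sample could be scored, rather than dividing by zero.
    return sum(valid) / len(valid) if valid else math.nan
```

Whether to skip NaN entries or to count them as zero is a design choice: skipping measures quality on parseable outputs only, while counting them as zero penalizes malformed CSV.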
Key Components
Dependencies
The metric requires two external packages that are imported at runtime in __post_init__:
- pandas: Used to parse CSV strings via pd.read_csv(StringIO(...))
- datacompy: Used for the Compare class that performs index-based dataframe comparison
If either package is missing, an ImportError is raised with installation instructions.
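The runtime-import pattern described here can be sketched generically as follows. This is an illustration rather than the code in `__post_init__`: the helper name `require` and the wording of the error message are assumptions.

```python
import importlib


def require(module_name: str, install_hint: str):
    """Import a module on demand, raising ImportError with a hint if it is missing."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        # Re-raise with an actionable message while keeping the original cause.
        raise ImportError(
            f"{module_name} is required for this metric. {install_hint}"
        ) from exc
```

Deferring the import this way keeps the optional packages out of the library's import path until the metric is actually instantiated.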
Validation
The __post_init__ method validates that:
- The mode parameter is either "rows" or "columns"
- The metric parameter is either "precision", "recall", or "f1"
Invalid values raise a ValueError.
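A minimal sketch of this kind of validation, using a stand-in dataclass rather than the real metric, could look like the following. The class name `ScoreConfig` and the error messages are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ScoreConfig:  # hypothetical stand-in, not the Ragas class
    mode: str = "rows"
    metric: str = "f1"

    def __post_init__(self):
        # Reject unsupported values at construction time instead of at scoring time.
        if self.mode not in ("rows", "columns"):
            raise ValueError(f"mode must be 'rows' or 'columns', got {self.mode!r}")
        if self.metric not in ("precision", "recall", "f1"):
            raise ValueError(
                f"metric must be 'precision', 'recall', or 'f1', got {self.metric!r}"
            )
```

With this stand-in, `ScoreConfig(mode="cells")` fails immediately with a ValueError, which surfaces configuration mistakes before any sample is scored.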
Usage Examples
Basic Usage
from ragas.metrics import DataCompyScore
from ragas.dataset_schema import SingleTurnSample
# Compare rows using F1 score (default)
metric = DataCompyScore(mode="rows", metric="f1")
sample = SingleTurnSample(
    reference="name,age\nAlice,30\nBob,25\nCharlie,35",
    response="name,age\nAlice,30\nBob,25\nDave,40",
)
# Inside an async function:
# score = await metric.single_turn_ascore(sample)
# The first two rows match and the third differs, so precision = recall = F1 = 2/3
Column-Level Comparison
from ragas.metrics import DataCompyScore
from ragas.dataset_schema import SingleTurnSample
metric = DataCompyScore(mode="columns", metric="recall")
sample = SingleTurnSample(
    reference="name,age,city\nAlice,30,NYC\nBob,25,LA",
    response="name,age,city\nAlice,30,NYC\nBob,25,SF",
)
# Inside an async function:
# score = await metric.single_turn_ascore(sample)
# "name" and "age" columns match; "city" does not
# recall = 2/3 ≈ 0.667