Implementation:Vibrantlabsai Ragas LLMSQLEquivalence
| Knowledge Sources | |
|---|---|
| Domains | Evaluation, Metrics |
| Last Updated | 2026-02-12 00:00 GMT |
Overview
LLMSQLEquivalence is a metric that uses an LLM to determine whether a generated SQL query is semantically equivalent to a reference SQL query, given a database schema.
Description
This metric evaluates text-to-SQL systems by leveraging an LLM to compare a generated SQL query against a reference SQL query in the context of a provided database schema. Rather than performing exact string matching, the metric uses semantic understanding of SQL to determine logical equivalence.
The algorithm works as follows:
- The reference SQL, generated SQL, and database schema (provided as
reference_contextsjoined by newlines) are packaged into an EquivalenceInput Pydantic model. - The input is sent to the LLM via an EquivalencePrompt that instructs: "Explain and compare two SQL queries (Q1 and Q2) based on the provided database schema. First, explain each query, then determine if they have significant logical differences."
- The LLM returns an EquivalenceOutput containing:
response_query_explaination- An explanation of the generated SQL queryreference_query_explaination- An explanation of the reference SQL queryequivalence- A boolean indicating semantic equivalence
- The boolean is cast to an integer: 1 for equivalent, 0 for not equivalent.
The prompt includes a few-shot example demonstrating that active = 1 and active = true are semantically equivalent on a BOOLEAN column.
The output type is BINARY, reflecting the yes/no nature of equivalence.
Usage
Use this metric when evaluating text-to-SQL systems where the generated SQL may differ syntactically from the reference but produce the same results. It handles cases such as different boolean representations, equivalent JOIN orderings, aliasing differences, and other syntactic variations that do not affect query semantics. The database schema must be provided in the reference_contexts field.
Code Reference
Source Location
- Repository: Vibrantlabsai_Ragas
- File: src/ragas/metrics/_sql_semantic_equivalence.py
Signature
@dataclass
class LLMSQLEquivalence(MetricWithLLM, SingleTurnMetric):
name: str = "llm_sql_equivalence_with_reference"
_required_columns: t.Dict[MetricType, t.Set[str]] = field(
default_factory=lambda: {
MetricType.SINGLE_TURN: {"response", "reference", "reference_contexts"}
}
)
output_type: t.Optional[MetricOutputType] = MetricOutputType.BINARY
equivalence_prompt: PydanticPrompt = EquivalencePrompt()
Import
from ragas.metrics import LLMSQLEquivalence
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| response | str | Yes | The generated SQL query to evaluate |
| reference | str | Yes | The reference (ground truth) SQL query |
| reference_contexts | list[str] | Yes | The database schema description; multiple entries are joined with newlines to form the full schema |
Outputs
| Name | Type | Description |
|---|---|---|
| score | int | 1 if the generated SQL is semantically equivalent to the reference SQL, 0 otherwise |
Usage Examples
Basic Usage
from ragas.metrics import LLMSQLEquivalence
from ragas.dataset_schema import SingleTurnSample
metric = LLMSQLEquivalence()
# metric.llm = your_llm
sample = SingleTurnSample(
response="SELECT id, name FROM users WHERE active = true;",
reference="SELECT id, name FROM users WHERE active = 1;",
reference_contexts=[
"Table users:",
"- id: INT",
"- name: VARCHAR",
"- active: BOOLEAN",
],
)
# score = await metric.single_turn_ascore(sample)
# Returns 1 (equivalent) or 0 (not equivalent)
Evaluating Complex Queries
from ragas.metrics import LLMSQLEquivalence
from ragas.dataset_schema import SingleTurnSample
metric = LLMSQLEquivalence()
# metric.llm = your_llm
sample = SingleTurnSample(
response="SELECT u.name, COUNT(o.id) FROM users u JOIN orders o ON u.id = o.user_id GROUP BY u.name;",
reference="SELECT users.name, COUNT(orders.id) FROM users INNER JOIN orders ON users.id = orders.user_id GROUP BY users.name;",
reference_contexts=[
"Table users: id INT PRIMARY KEY, name VARCHAR(255)",
"Table orders: id INT PRIMARY KEY, user_id INT REFERENCES users(id), amount DECIMAL",
],
)
# score = await metric.single_turn_ascore(sample)
# Returns 1 since both queries are logically equivalent (alias vs full name)