Implementation:Vibrantlabsai Ragas LLMSQLEquivalence

Knowledge Sources	Vibrantlabsai_Ragas
Domains	Evaluation, Metrics
Last Updated	2026-02-12 00:00 GMT

Overview

LLMSQLEquivalence is a metric that uses an LLM to determine whether a generated SQL query is semantically equivalent to a reference SQL query, given a database schema.

Description

This metric evaluates text-to-SQL systems by leveraging an LLM to compare a generated SQL query against a reference SQL query in the context of a provided database schema. Rather than performing exact string matching, the metric uses semantic understanding of SQL to determine logical equivalence.

The algorithm works as follows:

The reference SQL, generated SQL, and database schema (provided as reference_contexts joined by newlines) are packaged into an EquivalenceInput Pydantic model.
The input is sent to the LLM via an EquivalencePrompt that instructs: "Explain and compare two SQL queries (Q1 and Q2) based on the provided database schema. First, explain each query, then determine if they have significant logical differences."
The LLM returns an EquivalenceOutput containing:
- response_query_explaination - An explanation of the generated SQL query
- reference_query_explaination - An explanation of the reference SQL query
- equivalence - A boolean indicating semantic equivalence
The boolean is cast to an integer: 1 for equivalent, 0 for not equivalent.

The prompt includes a few-shot example demonstrating that active = 1 and active = true are semantically equivalent on a BOOLEAN column.

The output type is BINARY, reflecting the yes/no nature of equivalence.

Usage

Use this metric when evaluating text-to-SQL systems where the generated SQL may differ syntactically from the reference but produce the same results. It handles cases such as different boolean representations, equivalent JOIN orderings, aliasing differences, and other syntactic variations that do not affect query semantics. The database schema must be provided in the reference_contexts field.

Code Reference

Source Location

Repository: Vibrantlabsai_Ragas
File: src/ragas/metrics/_sql_semantic_equivalence.py

Signature

@dataclass
class LLMSQLEquivalence(MetricWithLLM, SingleTurnMetric):
    name: str = "llm_sql_equivalence_with_reference"
    _required_columns: t.Dict[MetricType, t.Set[str]] = field(
        default_factory=lambda: {
            MetricType.SINGLE_TURN: {"response", "reference", "reference_contexts"}
        }
    )
    output_type: t.Optional[MetricOutputType] = MetricOutputType.BINARY
    equivalence_prompt: PydanticPrompt = EquivalencePrompt()

Import

from ragas.metrics import LLMSQLEquivalence

I/O Contract

Inputs

Name	Type	Required	Description
response	str	Yes	The generated SQL query to evaluate
reference	str	Yes	The reference (ground truth) SQL query
reference_contexts	list[str]	Yes	The database schema description; multiple entries are joined with newlines to form the full schema

Outputs

Name	Type	Description
score	int	1 if the generated SQL is semantically equivalent to the reference SQL, 0 otherwise

Usage Examples

Basic Usage

from ragas.metrics import LLMSQLEquivalence
from ragas.dataset_schema import SingleTurnSample

metric = LLMSQLEquivalence()
# metric.llm = your_llm

sample = SingleTurnSample(
    response="SELECT id, name FROM users WHERE active = true;",
    reference="SELECT id, name FROM users WHERE active = 1;",
    reference_contexts=[
        "Table users:",
        "- id: INT",
        "- name: VARCHAR",
        "- active: BOOLEAN",
    ],
)

# score = await metric.single_turn_ascore(sample)
# Returns 1 (equivalent) or 0 (not equivalent)

Evaluating Complex Queries

from ragas.metrics import LLMSQLEquivalence
from ragas.dataset_schema import SingleTurnSample

metric = LLMSQLEquivalence()
# metric.llm = your_llm

sample = SingleTurnSample(
    response="SELECT u.name, COUNT(o.id) FROM users u JOIN orders o ON u.id = o.user_id GROUP BY u.name;",
    reference="SELECT users.name, COUNT(orders.id) FROM users INNER JOIN orders ON users.id = orders.user_id GROUP BY users.name;",
    reference_contexts=[
        "Table users: id INT PRIMARY KEY, name VARCHAR(255)",
        "Table orders: id INT PRIMARY KEY, user_id INT REFERENCES users(id), amount DECIMAL",
    ],
)

# score = await metric.single_turn_ascore(sample)
# Returns 1 since both queries are logically equivalent (alias vs full name)

Related Pages

Environment:Vibrantlabsai_Ragas_Python_3_9_Core_Environment

Page Connections

Double-click a node to navigate. Hold to expand connections.

Principle

Implementation

Heuristic

Environment