Jump to content

Connect SuperML | Leeroopedia MCP: Equip your AI agents with best practices, code verification, and debugging knowledge. Powered by Leeroo — building Organizational Superintelligence. Contact us at founders@leeroo.com.

Implementation:Vibrantlabsai Ragas LLMSQLEquivalence

From Leeroopedia
Knowledge Sources
Domains Evaluation, Metrics
Last Updated 2026-02-12 00:00 GMT

Overview

LLMSQLEquivalence is a metric that uses an LLM to determine whether a generated SQL query is semantically equivalent to a reference SQL query, given a database schema.

Description

This metric evaluates text-to-SQL systems by leveraging an LLM to compare a generated SQL query against a reference SQL query in the context of a provided database schema. Rather than performing exact string matching, the metric uses semantic understanding of SQL to determine logical equivalence.

The algorithm works as follows:

  1. The reference SQL, generated SQL, and database schema (provided as reference_contexts joined by newlines) are packaged into an EquivalenceInput Pydantic model.
  2. The input is sent to the LLM via an EquivalencePrompt that instructs: "Explain and compare two SQL queries (Q1 and Q2) based on the provided database schema. First, explain each query, then determine if they have significant logical differences."
  3. The LLM returns an EquivalenceOutput containing:
    • response_query_explaination - An explanation of the generated SQL query
    • reference_query_explaination - An explanation of the reference SQL query
    • equivalence - A boolean indicating semantic equivalence
  4. The boolean is cast to an integer: 1 for equivalent, 0 for not equivalent.

The prompt includes a few-shot example demonstrating that active = 1 and active = true are semantically equivalent on a BOOLEAN column.

The output type is BINARY, reflecting the yes/no nature of equivalence.

Usage

Use this metric when evaluating text-to-SQL systems where the generated SQL may differ syntactically from the reference but produce the same results. It handles cases such as different boolean representations, equivalent JOIN orderings, aliasing differences, and other syntactic variations that do not affect query semantics. The database schema must be provided in the reference_contexts field.

Code Reference

Source Location

Signature

@dataclass
class LLMSQLEquivalence(MetricWithLLM, SingleTurnMetric):
    name: str = "llm_sql_equivalence_with_reference"
    _required_columns: t.Dict[MetricType, t.Set[str]] = field(
        default_factory=lambda: {
            MetricType.SINGLE_TURN: {"response", "reference", "reference_contexts"}
        }
    )
    output_type: t.Optional[MetricOutputType] = MetricOutputType.BINARY
    equivalence_prompt: PydanticPrompt = EquivalencePrompt()

Import

from ragas.metrics import LLMSQLEquivalence

I/O Contract

Inputs

Name Type Required Description
response str Yes The generated SQL query to evaluate
reference str Yes The reference (ground truth) SQL query
reference_contexts list[str] Yes The database schema description; multiple entries are joined with newlines to form the full schema

Outputs

Name Type Description
score int 1 if the generated SQL is semantically equivalent to the reference SQL, 0 otherwise

Usage Examples

Basic Usage

from ragas.metrics import LLMSQLEquivalence
from ragas.dataset_schema import SingleTurnSample

metric = LLMSQLEquivalence()
# metric.llm = your_llm

sample = SingleTurnSample(
    response="SELECT id, name FROM users WHERE active = true;",
    reference="SELECT id, name FROM users WHERE active = 1;",
    reference_contexts=[
        "Table users:",
        "- id: INT",
        "- name: VARCHAR",
        "- active: BOOLEAN",
    ],
)

# score = await metric.single_turn_ascore(sample)
# Returns 1 (equivalent) or 0 (not equivalent)

Evaluating Complex Queries

from ragas.metrics import LLMSQLEquivalence
from ragas.dataset_schema import SingleTurnSample

metric = LLMSQLEquivalence()
# metric.llm = your_llm

sample = SingleTurnSample(
    response="SELECT u.name, COUNT(o.id) FROM users u JOIN orders o ON u.id = o.user_id GROUP BY u.name;",
    reference="SELECT users.name, COUNT(orders.id) FROM users INNER JOIN orders ON users.id = orders.user_id GROUP BY users.name;",
    reference_contexts=[
        "Table users: id INT PRIMARY KEY, name VARCHAR(255)",
        "Table orders: id INT PRIMARY KEY, user_id INT REFERENCES users(id), amount DECIMAL",
    ],
)

# score = await metric.single_turn_ascore(sample)
# Returns 1 since both queries are logically equivalent (alias vs full name)

Related Pages

Page Connections

Double-click a node to navigate. Hold to expand connections.
Principle
Implementation
Heuristic
Environment