

Implementation:Neuml Txtai SQL Expression

From Leeroopedia


Knowledge Sources
Domains: SQL_Parsing, Query_Processing
Last Updated: 2026-02-09 17:00 GMT

Overview

Expression is a SQL expression parser and token processor that rewrites query column names, resolves bracket expressions, substitutes similar() function calls, and formats output as text or lists.

Description

The Expression class parses tokenized SQL expressions and applies a series of transformation rules to produce database-ready SQL. It walks through a token stream, classifying each token using the Token utility class, and dispatches to specialized handlers for brackets, similar() functions, regular functions, aliases, attributes, and compound expressions. The resolver callback translates user-facing column names into their actual database column representations (e.g., mapping category to a JSON extract expression).

The class supports two output modes: text mode (tolist=False) rebuilds the expression as a single string with proper whitespace, and list mode (tolist=True) splits the expression into comma-separated components. When alias processing is enabled, the class generates SQL alias clauses by comparing original and transformed tokens.

Usage

Use Expression as part of the txtai SQL parsing pipeline. It is called internally by the SQL parser to process SELECT, WHERE, GROUP BY, HAVING, and ORDER BY clauses. It handles the special similar() function by extracting its parameters and replacing it with a placeholder token that the database layer later fills with similarity results.

Code Reference

Source Location

Signature

class Expression:
    """
    Parses expression statements and runs a set of substitution/formatting rules.
    """

    def __init__(self, resolver, tolist):
        """
        Creates a new expression parser.

        Args:
            resolver: function to call to resolve query column names with database column names
            tolist: outputs expression lists if True, text if False
        """

        self.resolver = resolver
        self.tolist = tolist

Import

from txtai.database.sql import Expression

I/O Contract

Inputs

Name | Type | Required | Description
resolver | callable | Yes | Function that maps query column names to database column names; signature: resolver(name, alias=None) -> str
tolist | bool | Yes | If True, output is returned as a list of expression components; if False, output is returned as a text string

Outputs

Name | Type | Description
result | str or list | Rewritten SQL expression as text (when tolist=False) or as a list of comma-separated components (when tolist=True)

Key Methods

__call__(self, tokens, alias=False, aliases=None, similar=None)

Main entry point. Accepts a list of tokens and optional parameters for alias processing and similar() extraction. Calls process() to transform tokens, then builds the output using the appropriate format method.

process(self, tokens, alias, aliases, similar)

Core token processing loop. Iterates through tokens (skipping DISTINCT), identifies each token's role using Token static methods, and dispatches to the appropriate handler. Returns the filtered list of non-None tokens.

buildtext(self, tokens)

Reassembles tokens into a SQL text string, applying whitespace rules via Token.wrapspace() for operators, commas, brackets, and other constructs.

buildlist(self, tokens)

Splits tokens into a list of expression components on commas (respecting parenthesis and bracket nesting). Each component is built using buildtext().
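The comma-splitting rule described above can be sketched in isolation. This is a minimal standalone illustration, not the txtai source: `split_components` is a hypothetical helper, and it joins tokens with plain spaces instead of applying the whitespace rules that buildtext() uses.

```python
def split_components(tokens):
    """Split a token list on top-level commas (hypothetical helper, not txtai code)."""
    parts, current = [], []
    parens = brackets = 0

    for token in tokens:
        if token == "," and parens == 0 and brackets == 0:
            # Top-level comma: close the current component
            parts.append(" ".join(current))
            current = []
            continue

        # Track nesting depth so commas inside (...) or [...] are kept
        if token == "(":
            parens += 1
        elif token == ")":
            parens -= 1
        elif token == "[":
            brackets += 1
        elif token == "]":
            brackets -= 1

        current.append(token)

    # Flush the final component
    if current:
        parts.append(" ".join(current))

    return parts


components = split_components(["id", ",", "coalesce", "(", "a", ",", "b", ")", ",", "score"])
```

In this sketch the comma inside `coalesce ( a , b )` does not start a new component because the parenthesis depth is non-zero when it is seen.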

buildalias(self, transformed, tokens, aliases)

Compares transformed and original tokens to generate column alias expressions. Strips brackets and DISTINCT keywords from alias names and calls the resolver to build alias clauses.

bracket(self, iterator, tokens, x)

Consumes a [bracket] expression from the token stream, builds the inner expression text, resolves it through the resolver, and replaces the bracket tokens with the resolved result.
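The handler signatures all take the token iterator, which suggests that a handler advances the same iterator the main loop is using. The sketch below illustrates that pattern for bracket expressions. It is an assumption-laden illustration rather than the library's code: `resolve_brackets` and `demo_resolver` are hypothetical names, nested brackets are not handled, and inner tokens are joined with plain spaces.

```python
def resolve_brackets(tokens, resolver):
    """Replace each [bracket expression] with resolver(inner text) (sketch only)."""
    tokens = list(tokens)

    # Shared iterator: the inner loop below advances it, so the outer loop
    # never revisits tokens that were consumed as part of a bracket
    iterator = iter(enumerate(tokens))

    for x, token in iterator:
        if token != "[":
            continue

        inner = []
        tokens[x] = None  # drop the opening bracket

        for y, part in iterator:
            if part == "]":
                # Put the resolved expression where the closing bracket was
                tokens[y] = resolver(" ".join(inner))
                break
            inner.append(part)
            tokens[y] = None  # clear consumed tokens

    # Remove cleared tokens
    return [t for t in tokens if t is not None]


def demo_resolver(name, alias=None):
    # Hypothetical resolver: every name becomes a JSON extract expression
    return f"json_extract(d.data, '$.{name}')"


rewritten = resolve_brackets(["[", "my", "column", "]", "=", "'x'"], demo_resolver)
```

Clearing consumed positions and filtering them out at the end keeps token indexes stable while the stream is being rewritten.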

similar(self, iterator, tokens, x, similar)

Extracts the parameters of a similar() function call, replaces the function tokens with a placeholder (__SIMILAR__N), and appends the parameters to the similar list for later processing by the database layer.
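As a rough standalone sketch of the placeholder substitution, assuming the call contains no nested parentheses and that quotes are stripped from each parameter: `extract_similar` and `SIMILAR_PREFIX` are hypothetical names for this example, with the prefix value taken from the placeholder format given above.

```python
SIMILAR_PREFIX = "__SIMILAR__"  # placeholder prefix, per the description above


def extract_similar(tokens, similar):
    """Replace each similar(...) call in tokens with a placeholder (illustrative only)."""
    output, x = [], 0

    while x < len(tokens):
        token = tokens[x]
        if token.lower() == "similar" and x + 1 < len(tokens) and tokens[x + 1] == "(":
            # Collect parameters up to the closing parenthesis, stripping quotes
            params = []
            x += 2
            while x < len(tokens) and tokens[x] != ")":
                if tokens[x] != ",":
                    params.append(tokens[x].strip("'\""))
                x += 1

            # The placeholder number is this call's position in the similar list
            output.append(f"{SIMILAR_PREFIX}{len(similar)}")
            similar.append(params)
        else:
            output.append(token)
        x += 1

    return output


similar = []
rewritten = extract_similar(
    ["similar", "(", "'machine learning'", ",", "0.5", ")", "and", "flag", "=", "1"], similar
)
```

Because the placeholder number is taken from the current length of the list, a caller can match each placeholder back to its parameters by index.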

function(self, iterator, tokens, token, aliases, similar)

Recursively processes tokens within a function's parentheses, handling nested brackets, similar() calls, sub-functions, attributes, and compound expressions.

alias(self, iterator, tokens, x, aliases, index)

Reads an alias clause (with or without the AS keyword) and stores the normalized alias name mapped to its clause index in the aliases dictionary.
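A simplified sketch of the bookkeeping this describes, under stated assumptions: `record_alias` is a hypothetical helper, the alias name is a single token, and normalization is approximated by lowercasing.

```python
def record_alias(tokens, x, aliases, index):
    """Record the alias that starts at tokens[x] (sketch, not txtai code)."""
    # Skip the optional AS keyword
    if tokens[x].lower() == "as":
        x += 1

    # Map the normalized alias name to the index of the clause it belongs to
    aliases[tokens[x].lower()] = index
    return x


aliases = {}
record_alias(["count", "(", "*", ")", "as", "Total"], 4, aliases, 0)  # with AS
record_alias(["score", "s"], 1, aliases, 1)                            # without AS
```

Storing the clause index alongside the name lets later steps tell which comma-separated clauses already carry an explicit alias.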

attribute(self, tokens, x, aliases)

Resolves a standalone attribute column name through the resolver, unless it is already in the aliases dictionary.

compound(self, iterator, tokens, x, aliases, similar)

Handles compound expressions of the form column OPERATOR value. Resolves both the left-side and right-side column names, consuming any chain of operators (including multi-word operators like NOT LIKE).

resolve(self, token, aliases)

Resolves a single token unless it is an alias or a bind parameter (starting with :). Delegates to the resolver callback.
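The guard logic can be sketched as a free function. This is an illustration only: `resolve_token` and `demo_resolver` are hypothetical names, and alias lookup is assumed to use a lowercased token.

```python
def resolve_token(token, resolver, aliases=None):
    """Resolve a column token, leaving aliases and bind parameters untouched (sketch)."""
    # Known aliases are already valid names in the rewritten query
    if aliases and token.lower() in aliases:
        return token

    # Bind parameters such as :minscore are filled in at execution time
    if token.startswith(":"):
        return token

    return resolver(token)


def demo_resolver(name, alias=None):
    # Hypothetical resolver: every name becomes a JSON extract expression
    return f"json_extract(d.data, '$.{name}')"


aliases = {"total": 0}
results = [resolve_token(t, demo_resolver, aliases) for t in ["total", ":minscore", "category"]]
```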

Usage Examples

Basic Usage

from txtai.database.sql import Expression

# Define a simple resolver: known columns map to fixed database names,
# any other name becomes a JSON extract expression
def resolver(name, alias=None):
    standard = {"id": "s.id", "text": "text", "score": "score", "tags": "s.tags"}
    resolved = standard.get(name, f"json_extract(d.data, '$.{name}')")
    if alias and name != alias:
        return f'{resolved} as "{alias}"'
    return resolved

# Create an expression parser for text output
expr = Expression(resolver, tolist=False)

# Parse a WHERE clause with tokens
tokens = ["category", "=", "'AI'", "and", "score", ">", "0.5"]
result = expr(tokens)
# result: "json_extract(d.data, '$.category') = 'AI' and score > 0.5"

# Create an expression parser for list output (SELECT columns)
expr_list = Expression(resolver, tolist=True)
tokens = ["id", ",", "text", ",", "score"]
result = expr_list(tokens)
# result: ["s.id", "text", "score"]

# Handle similar() function extraction
similar_params = []
tokens = ["similar", "(", "'machine learning'", ")"]
result = expr(tokens, similar=similar_params)
# result: "__SIMILAR__0"
# similar_params: [["machine learning"]]
