Implementation:Neuml Txtai SQL Token

Knowledge Sources	Neuml_Txtai
Domains	SQL_Parsing, Query_Processing
Last Updated	2026-02-09 17:00 GMT

Overview

Token is a static utility class that provides methods for classifying, validating, and formatting SQL tokens during expression parsing.

Description

The Token class contains exclusively static methods and class-level constants used throughout the txtai SQL parsing pipeline. It defines the grammar of recognized SQL constructs including operators (=, !=, LIKE, BETWEEN, IS, NOT, etc.), logic separators (AND, OR), sort order keywords (ASC, DESC), alias keywords (AS), and the distinct keyword. Each is* method tests whether a token belongs to a specific category, enabling the Expression class to dispatch tokens to appropriate handlers.

The class also provides utility methods: get() for safe positional access into token lists, normalize() for case-insensitive alias matching, and wrapspace() for applying context-sensitive whitespace rules when rebuilding SQL strings. The SIMILAR_TOKEN constant (__SIMILAR__) is the placeholder prefix used when substituting similar() function calls.

Usage

Token is used as a classification engine by the Expression class and the broader SQL parser in txtai. All methods are static and require no instantiation. The parser consults it whenever it needs to decide how to handle the next token in a stream: whether that token is a column name, operator, function call, bracket, separator, or alias.
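The dispatch pattern can be sketched as a classifier that checks the more specific categories first. This is a hypothetical, self-contained illustration, not txtai's code. The `classify` function, its category labels, the order of checks, and the simplified predicates (stand-ins for the Token methods) are all assumptions. Alias detection is left out because it needs extra context.

```python
# Simplified stand-ins for a few Token predicates (hypothetical, not txtai's source)
OPERATORS = {"=", "!=", "<>", ">", ">=", "<", "<=", "+", "-", "*", "/", "%", "||",
             "not", "between", "like", "is", "null"}
LOGIC_SEPARATORS = {"and", "or"}
SORT_ORDER = {"asc", "desc"}


def get(tokens, x):
    # Bounds-checked access: None instead of IndexError
    return tokens[x] if 0 <= x < len(tokens) else None


def isliteral(token):
    # Quoted strings, commas, parentheses, wildcards and simple numbers
    return token.startswith(("'", '"', ",", "(", ")", "*")) or token.replace(".", "", 1).isdigit()


def iscolumn(token):
    # Anything that is not a keyword, operator or literal is treated as a name
    low = token.lower()
    return not (low in OPERATORS or low in LOGIC_SEPARATORS or low in SORT_ORDER or isliteral(token))


def classify(tokens, x):
    """Hypothetical dispatcher: label the token at position x."""
    token = tokens[x]
    low = token.lower()
    if token in ("(", ")"):
        return "paren"
    if token == "[":
        return "bracket"
    if token == ",":
        return "separator"
    if low in LOGIC_SEPARATORS:
        return "logic"
    if low in SORT_ORDER:
        return "sortorder"
    if low in OPERATORS:
        return "operator"
    if iscolumn(token):
        # A name followed by "(" is a function call, otherwise a column
        return "function" if get(tokens, x + 1) == "(" else "column"
    return "literal"


tokens = ["count", "(", "id", ")", ">", "5", "and", "text", "like", "'%a%'"]
print([classify(tokens, x) for x in range(len(tokens))])
# ['function', 'paren', 'column', 'paren', 'operator', 'literal', 'logic', 'column', 'operator', 'literal']
```

Checking logic separators and sort keywords before the generic column test matters here. Otherwise a word such as `and` or `desc` would be indistinguishable from a column name.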

Code Reference

Source Location

Repository: Neuml_Txtai
File: src/python/txtai/database/sql/token.py
Lines: 1-342

Signature

class Token:
    """
    Methods to check for token type.
    """

    # Similar token replacement
    SIMILAR_TOKEN = "__SIMILAR__"

    # Default distinct token
    DISTINCT = ["distinct"]

    # Default alias token
    ALIAS = ["as"]

    # Default list of comparison operators
    OPERATORS = ["=", "!=", "<>", ">", ">=", "<", "<=", "+", "-", "*", "/", "%", "||",
                 "not", "between", "like", "is", "null"]

    # Default list of logic separators
    LOGIC_SEPARATORS = ["and", "or"]

    # Default list of sort order operators
    SORT_ORDER = ["asc", "desc"]

Import

from txtai.database.sql import Token

I/O Contract

Inputs

Name	Type	Required	Description
(no constructor parameters)	N/A	N/A	Token is a static utility class with no instance state; all methods are called as `Token.methodname(args)`

Outputs

Name	Type	Description
(various)	bool, str, or None	The is* methods return a boolean classification result; get() returns a token or None; normalize() and wrapspace() return formatted strings

Class Constants

Constant	Value	Description
SIMILAR_TOKEN	`"__SIMILAR__"`	Placeholder prefix for substituted similar() function calls
DISTINCT	`["distinct"]`	Keywords recognized as DISTINCT
ALIAS	`["as"]`	Keywords recognized as alias introducers
OPERATORS	`["=", "!=", "<>", ">", ">=", "<", "<=", "+", "-", "*", "/", "%", "||", "not", "between", "like", "is", "null"]`	All recognized SQL operators: comparison, arithmetic, string concatenation (`||`), and operator keywords
LOGIC_SEPARATORS	`["and", "or"]`	Logical connectors between clauses
SORT_ORDER	`["asc", "desc"]`	Sort direction keywords
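Several predicates (isoperator, islogicseparator, issortorder, isdistinct) reduce to a case-insensitive membership test against one of these lists. The sketch below is minimal. It assumes the test returns False for None, so the result of an out-of-bounds peek can be passed in directly. The helper name `inlist` is hypothetical.

```python
OPERATORS = ["=", "!=", "<>", ">", ">=", "<", "<=", "+", "-", "*", "/", "%", "||",
             "not", "between", "like", "is", "null"]
LOGIC_SEPARATORS = ["and", "or"]
SORT_ORDER = ["asc", "desc"]
DISTINCT = ["distinct"]


def inlist(token, keywords):
    """Hypothetical helper: case-insensitive membership that tolerates None."""
    # Returning False for None/empty lets callers pass the result of Token.get-style peeks
    return bool(token) and token.lower() in keywords


print(inlist("LIKE", OPERATORS))        # True
print(inlist("Or", LOGIC_SEPARATORS))   # True
print(inlist("DESC", SORT_ORDER))       # True
print(inlist(None, OPERATORS))          # False
print(inlist("category", OPERATORS))    # False
```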

Static Methods

get(tokens, x)

Safely retrieves tokens[x], returning None if x is out of bounds. Used throughout the parser to peek at adjacent tokens without risking IndexError.
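A minimal sketch of the bounds check, written as a standalone function. This version treats negative positions as out of bounds rather than letting Python wrap them around to the end of the list. That choice is an assumption. It matters when a caller peeks at x - 1 from the first token.

```python
from typing import Optional, Sequence


def get(tokens: Sequence[str], x: int) -> Optional[str]:
    """Return tokens[x], or None when x falls outside the list."""
    # Check 0 <= x explicitly: a bare tokens[x] would silently wrap negative indices
    return tokens[x] if 0 <= x < len(tokens) else None


tokens = ["id", "=", "5"]
print(get(tokens, 0))    # id
print(get(tokens, 10))   # None
print(get(tokens, -1))   # None
```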

isalias(tokens, x, alias)

Returns True if the token at position x is an alias expression: alias processing is enabled, the prior token is not a separator, grouping, or DISTINCT token, and the current token is a column or quoted token.
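A standalone sketch of the alias test as described above. The helpers are simplified stand-ins for the corresponding Token methods. Two behaviors are assumptions: `alias` is treated as a truthy flag, and the first token, which has no prior token, is never treated as an alias.

```python
KEYWORDS = {"=", "!=", "<>", ">", ">=", "<", "<=", "+", "-", "*", "/", "%", "||",
            "not", "between", "like", "is", "null", "and", "or", "asc", "desc"}


def get(tokens, x):
    return tokens[x] if 0 <= x < len(tokens) else None


def iscolumn(token):
    # Simplified: not a keyword/operator, not quoted or punctuation, not numeric
    return (bool(token) and token.lower() not in KEYWORDS
            and not (token.startswith(("'", '"', ",", "(", ")", "*"))
                     or token.replace(".", "", 1).isdigit()))


def isquoted(token):
    return bool(token) and len(token) >= 2 and token[0] in ("'", '"') and token[-1] == token[0]


def isalias(tokens, x, alias):
    """Hypothetical sketch of the alias test described above."""
    prior, token = get(tokens, x - 1), tokens[x]
    # Alias processing must be enabled and there must be a prior token (assumption)
    if not alias or prior is None:
        return False
    # The prior token must not be a separator, group start or DISTINCT keyword
    if prior in (",", "(") or prior.lower() == "distinct":
        return False
    # The current token must look like a name: a bare column or a quoted identifier
    return iscolumn(token) or isquoted(token)


tokens = ["text", "t", ",", "id"]
print(isalias(tokens, 1, True))   # True: "t" follows the column "text"
print(isalias(tokens, 3, True))   # False: "id" follows a comma
print(isalias(tokens, 1, False))  # False: alias processing disabled
```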

isattribute(tokens, x)

Returns True if the token at position x is a standalone attribute -- a column token not followed by an operator.

isbracket(token)

Returns True if the token is an open square bracket, `[`.

iscolumn(token)

Returns True if the token is a column name: not an operator, not a logic separator, not a literal, and not a sort order keyword.

iscompound(tokens, x)

Returns True if the token at position x is part of a compound expression (column OPERATOR column), detected by an operator token with an adjacent column token.
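The attribute and compound tests both inspect a column's neighbors. The description of iscompound leaves some room for interpretation, so the sketch below reads it as "a column with an operator immediately before or after it". That reading and the simplified helpers are assumptions, not txtai's implementation. The attribute test follows its description above: a column whose next token is not an operator.

```python
OPERATORS = {"=", "!=", "<>", ">", ">=", "<", "<=", "+", "-", "*", "/", "%", "||",
             "not", "between", "like", "is", "null"}
KEYWORDS = OPERATORS | {"and", "or", "asc", "desc"}


def get(tokens, x):
    return tokens[x] if 0 <= x < len(tokens) else None


def isoperator(token):
    return bool(token) and token.lower() in OPERATORS


def iscolumn(token):
    # Simplified: not a keyword/operator, not quoted or punctuation, not numeric
    return (bool(token) and token.lower() not in KEYWORDS
            and not (token.startswith(("'", '"', ",", "(", ")", "*"))
                     or token.replace(".", "", 1).isdigit()))


def isattribute(tokens, x):
    # A standalone column: the following token (if any) is not an operator
    return iscolumn(tokens[x]) and not isoperator(get(tokens, x + 1))


def iscompound(tokens, x):
    # Interpretation (assumption): a column with an operator on either side
    return iscolumn(tokens[x]) and (isoperator(get(tokens, x - 1)) or isoperator(get(tokens, x + 1)))


tokens = ["category", ",", "a", "+", "b"]
print(isattribute(tokens, 0), iscompound(tokens, 0))  # True False
print(isattribute(tokens, 2), iscompound(tokens, 2))  # False True
```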

isdistinct(token)

Returns True if the token is the DISTINCT keyword (case-insensitive).

isfunction(tokens, x)

Returns True if the token at position x is a function call: a column token immediately followed by an open parenthesis.
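A function call is recognized by one token of lookahead. In the sketch below, the `iscolumn` and `get` helpers are simplified, hypothetical stand-ins for the Token methods. The example scans a token list and keeps the positions that start a call.

```python
KEYWORDS = {"=", "!=", "<>", ">", ">=", "<", "<=", "+", "-", "*", "/", "%", "||",
            "not", "between", "like", "is", "null", "and", "or", "asc", "desc"}


def get(tokens, x):
    return tokens[x] if 0 <= x < len(tokens) else None


def iscolumn(token):
    # Simplified: not a keyword/operator, not quoted or punctuation, not numeric
    return (bool(token) and token.lower() not in KEYWORDS
            and not (token.startswith(("'", '"', ",", "(", ")", "*"))
                     or token.replace(".", "", 1).isdigit()))


def isfunction(tokens, x):
    # A name immediately followed by an open parenthesis
    return iscolumn(tokens[x]) and get(tokens, x + 1) == "("


tokens = ["count", "(", "*", ")", ",", "max", "(", "score", ")"]
print([tokens[x] for x in range(len(tokens)) if isfunction(tokens, x)])  # ['count', 'max']
```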

isgroupstart(token)

Returns True if the token is an open parenthesis, `(`.

isliteral(token)

Returns True if the token is a literal value: starts with a quote, comma, parenthesis, or wildcard, or is numeric.
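A sketch of the literal test, following the description above. The numeric check is an assumption. Here it is approximated as digits with at most one decimal point, so this sketch does not treat a negative number such as `-1` as numeric.

```python
def isliteral(token):
    """Hypothetical sketch of the literal test described above."""
    if not token:
        return False
    # Quoted strings, commas, parentheses and wildcards are treated as literals
    if token.startswith(("'", '"', ",", "(", ")", "*")):
        return True
    # Numeric check (assumption): digits with at most one decimal point
    return token.replace(".", "", 1).isdigit()


samples = ["'hello'", "42", "3.14", "*", "(", "category", "1.2.3"]
print([isliteral(t) for t in samples])  # [True, True, True, True, True, False, False]
```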

islogicseparator(token)

Returns True if the token is AND or OR (case-insensitive).

isoperator(token)

Returns True if the token is a recognized operator from the OPERATORS list (case-insensitive).

isquoted(token)

Returns True if the token starts and ends with matching quotes (single or double).
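A minimal sketch of a matching-quote check. The two-character minimum is an assumption: it stops a lone quote character from counting as quoted, since that character both starts and ends with a quote.

```python
def isquoted(token):
    """Hypothetical sketch: opening quote is ' or " and the closing quote matches it."""
    return bool(token) and len(token) >= 2 and token[0] in ("'", '"') and token[-1] == token[0]


print(isquoted("'name'"))    # True
print(isquoted('"Name"'))    # True
print(isquoted("'mixed\""))  # False: quotes do not match
print(isquoted("name"))      # False
print(isquoted("'"))         # False: a lone quote character
```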

isseparator(token)

Returns True if the token is a comma (,).

issimilar(tokens, x, similar)

Returns True if the token at position x is the similar keyword followed by an open parenthesis, and similar processing is enabled (similar list is not None).
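The sketch below shows how issimilar and the SIMILAR_TOKEN placeholder could fit together. It replaces each similar(...) call with a numbered placeholder and records the call's arguments in the similar list. Several parts are hypothetical, and txtai's parser may differ: the `substitute` function, the `__SIMILAR__<n>` numbering scheme, and the argument handling (commas dropped, nested calls unsupported).

```python
SIMILAR_TOKEN = "__SIMILAR__"


def get(tokens, x):
    return tokens[x] if 0 <= x < len(tokens) else None


def issimilar(tokens, x, similar):
    # Enabled only when a list is supplied; the keyword must be followed by "("
    return similar is not None and tokens[x].lower() == "similar" and get(tokens, x + 1) == "("


def substitute(tokens, similar):
    """Hypothetical: replace each similar(...) call with a numbered placeholder."""
    output, x = [], 0
    while x < len(tokens):
        if issimilar(tokens, x, similar):
            # Walk forward from "(" to its matching ")"
            depth, end = 0, x + 1
            while end < len(tokens):
                depth += {"(": 1, ")": -1}.get(tokens[end], 0)
                if depth == 0:
                    break
                end += 1
            # Record the call's arguments, then emit a placeholder that indexes them
            similar.append([t for t in tokens[x + 2:end] if t != ","])
            output.append(f"{SIMILAR_TOKEN}{len(similar) - 1}")
            x = end + 1
        else:
            output.append(tokens[x])
            x += 1
    return output


similar = []
print(substitute(["similar", "(", "'query'", ")", "and", "score", ">", "0.5"], similar))
# ['__SIMILAR__0', 'and', 'score', '>', '0.5']
print(similar)  # [["'query'"]]
```

Passing None instead of a list disables the check, so the tokens come back unchanged.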

issortorder(token)

Returns True if the token is ASC or DESC (case-insensitive).

normalize(token)

Strips single and double quotes and converts to lowercase. Used for case-insensitive alias matching.
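Normalizing both the stored alias and the lookup key is what makes alias matching insensitive to quoting and case. The sketch below demonstrates this. The `aliases` dictionary is a hypothetical lookup table used only for illustration.

```python
def normalize(token: str) -> str:
    """Hypothetical sketch: strip quote characters and lowercase."""
    return token.replace("'", "").replace('"', "").lower()


# Hypothetical alias table keyed by normalized alias name
aliases = {normalize('"Total"'): "sum(score)"}

print(normalize("total") in aliases)    # True
print(normalize("'TOTAL'") in aliases)  # True
print(aliases[normalize("Total")])      # sum(score)
```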

wrapspace(text, token)

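Returns the token padded with whitespace according to context-sensitive rules, given the text rebuilt so far. Operators and logic separators are surrounded by spaces, commas receive a trailing space, a wildcard following a space or open parenthesis receives no padding, and tokens following an open bracket or parenthesis receive no leading space. Otherwise, the token is returned with a single leading space.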
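Applying these rules token by token rebuilds a readable SQL string. The sketch below is a hypothetical reimplementation, not txtai's code, and it adds two assumptions. First, an operator at the start of the text or after an existing space gets only a trailing space, which avoids double spaces. Second, parentheses and brackets attach directly to the preceding text, which is consistent with the `wrapspace("fn", "(")` example under Usage Examples. The `rebuild` helper is also hypothetical.

```python
OPERATORS = {"=", "!=", "<>", ">", ">=", "<", "<=", "+", "-", "*", "/", "%", "||",
             "not", "between", "like", "is", "null"}
LOGIC_SEPARATORS = {"and", "or"}


def wrapspace(text, token):
    """Hypothetical sketch: return token padded for appending to text."""
    low = token.lower()
    # Wildcard at the start, after a space or after "(" gets no padding, e.g. count(*)
    if token == "*" and (not text or text.endswith((" ", "("))):
        return token
    # Operators and logic separators are surrounded by spaces (no double space)
    if low in OPERATORS or low in LOGIC_SEPARATORS:
        return f"{token} " if not text or text.endswith(" ") else f" {token} "
    # Commas get a trailing space only
    if token == ",":
        return ", "
    # Brackets and parentheses attach directly to the preceding text (assumption)
    if token in ("(", ")", "[", "]"):
        return token
    # No leading space at the start, after a space, or after an opening bracket/paren
    if not text or text.endswith((" ", "(", "[")):
        return token
    # Default: a single leading space
    return f" {token}"


def rebuild(tokens):
    # Append each token with its whitespace applied
    text = ""
    for token in tokens:
        text += wrapspace(text, token)
    return text


print(rebuild(["count", "(", "*", ")", ",", "text", "like", "'%a%'",
               "and", "score", ">=", "0.5"]))
# count(*), text like '%a%' and score >= 0.5
```

The wildcard rule has to be checked before the operator rule. This is because `*` appears in the operator list as multiplication, and `a * b` should still be spaced.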

Usage Examples

Basic Usage

from txtai.database.sql import Token

# Token classification
print(Token.isoperator("="))        # True
print(Token.isoperator("id"))       # False
print(Token.iscolumn("category"))   # True
print(Token.isliteral("'hello'"))   # True
print(Token.isliteral("42"))        # True
print(Token.islogicseparator("and"))  # True
print(Token.issortorder("desc"))    # True

# Safe token access
tokens = ["id", "=", "5"]
print(Token.get(tokens, 0))   # "id"
print(Token.get(tokens, 10))  # None

# Normalization
print(Token.normalize('"Category"'))  # "category"
print(Token.normalize("'Name'"))      # "name"

# Whitespace formatting
text = "id"
print(Token.wrapspace(text, "="))     # " = "
print(Token.wrapspace("id = ", "5"))  # "5"
print(Token.wrapspace("fn", "("))     # "("

# Function detection
tokens = ["count", "(", "id", ")"]
print(Token.isfunction(tokens, 0))  # True
print(Token.isfunction(tokens, 2))  # False

# Similar detection
similar_list = []
tokens = ["similar", "(", "'query'", ")"]
print(Token.issimilar(tokens, 0, similar_list))  # True
print(Token.issimilar(tokens, 0, None))           # False

Related Pages

Principle:Neuml_Txtai_SQL_Query_Processing
