Implementation:Neuml Txtai SQL Token
| Knowledge Sources | |
|---|---|
| Domains | SQL_Parsing, Query_Processing |
| Last Updated | 2026-02-09 17:00 GMT |
Overview
Token is a static utility class that provides methods for classifying, validating, and formatting SQL tokens during expression parsing.
Description
The Token class contains exclusively static methods and class-level constants used throughout the txtai SQL parsing pipeline. It defines the grammar of recognized SQL constructs including operators (=, !=, LIKE, BETWEEN, IS, NOT, etc.), logic separators (AND, OR), sort order keywords (ASC, DESC), alias keywords (AS), and the distinct keyword. Each is* method tests whether a token belongs to a specific category, enabling the Expression class to dispatch tokens to appropriate handlers.
The class also provides utility methods: get() for safe positional access into token lists, normalize() for case-insensitive alias matching, and wrapspace() for applying context-sensitive whitespace rules when rebuilding SQL strings. The SIMILAR_TOKEN constant (__SIMILAR__) is the placeholder prefix used when substituting similar() function calls.
Usage
Token is used as a classification engine by the Expression class and the broader SQL parser in txtai. All methods are static and require no instantiation. It is referenced whenever the SQL parser needs to determine how to handle the next token in a stream -- whether it is a column name, operator, function call, bracket, separator, or alias.
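The dispatch pattern described here can be sketched without txtai installed. The `classify` function and its category labels are hypothetical, and the keyword sets simply copy the class constants; the real Expression class handles more cases than this sketch does.

```python
# Standalone sketch of predicate-based token dispatch; not txtai's implementation.
OPERATORS = {"=", "!=", "<>", ">", ">=", "<", "<=", "+", "-", "*", "/", "%", "||",
             "not", "between", "like", "is", "null"}
LOGIC_SEPARATORS = {"and", "or"}
SORT_ORDER = {"asc", "desc"}


def classify(tokens, x):
    """Return a hypothetical category label for tokens[x]."""
    token = tokens[x]
    following = tokens[x + 1] if x + 1 < len(tokens) else None

    if token == ",":
        return "separator"
    if token == "(":
        return "groupstart"
    if token == "[":
        return "bracket"
    if token.lower() in OPERATORS:
        return "operator"
    if token.lower() in LOGIC_SEPARATORS:
        return "logic"
    if token.lower() in SORT_ORDER:
        return "sortorder"
    if token.startswith(("'", '"', ")")) or token.replace(".", "", 1).isdigit():
        return "literal"
    # Anything left is treated as a column; a column followed by "(" is a function call
    return "function" if following == "(" else "column"


tokens = ["lower", "(", "name", ")", "=", "'a'", "AND", "id", ">", "5"]
labels = [classify(tokens, x) for x in range(len(tokens))]
print(list(zip(tokens, labels)))
```

The order of the checks matters: keyword and punctuation tests run first, so "column" is only reached by elimination, which mirrors how iscolumn is defined in terms of the other predicates.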
Code Reference
Source Location
- Repository: Neuml_Txtai
- File: src/python/txtai/database/sql/token.py
- Lines: 1-342
Signature
class Token:
    """
    Methods to check for token type.
    """
    # Similar token replacement
    SIMILAR_TOKEN = "__SIMILAR__"
    # Default distinct token
    DISTINCT = ["distinct"]
    # Default alias token
    ALIAS = ["as"]
    # Default list of comparison operators
    OPERATORS = ["=", "!=", "<>", ">", ">=", "<", "<=", "+", "-", "*", "/", "%", "||",
                 "not", "between", "like", "is", "null"]
    # Default list of logic separators
    LOGIC_SEPARATORS = ["and", "or"]
    # Default list of sort order operators
    SORT_ORDER = ["asc", "desc"]
Import
from txtai.database.sql import Token
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| (no constructor parameters) | N/A | N/A | Token is a static utility class with no instance state; all methods are called as Token.methodname(args) |
Outputs
| Name | Type | Description |
|---|---|---|
| (various) | bool, str, or None | The is* methods return a classification result, normalize() and wrapspace() return a formatted string, and get() returns the token at the requested position or None |
Class Constants
| Constant | Value | Description |
|---|---|---|
| SIMILAR_TOKEN | "__SIMILAR__" | Placeholder prefix for substituted similar() function calls |
| DISTINCT | ["distinct"] | Keywords recognized as DISTINCT |
| ALIAS | ["as"] | Keywords recognized as alias introducers |
| OPERATORS | ["=", "!=", "<>", ">", ">=", "<", "<=", "+", "-", "*", "/", "%", "\|\|", "not", "between", "like", "is", "null"] | All recognized SQL comparison and arithmetic operators |
| LOGIC_SEPARATORS | ["and", "or"] | Logical connectors between clauses |
| SORT_ORDER | ["asc", "desc"] | Sort direction keywords |
Static Methods
get(tokens, x)
Safely retrieves tokens[x], returning None if x is out of bounds. Used throughout the parser to peek at adjacent tokens without risking IndexError.
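A minimal standalone sketch of this kind of bounds-checked access follows. The `peek` name is hypothetical, and treating negative indexes as out of bounds is an assumption of the sketch; it matters because a plain tokens[-1] would silently wrap around to the last token when peeking before position 0.

```python
def peek(tokens, x):
    """Return tokens[x], or None when x falls outside the list."""
    return tokens[x] if 0 <= x < len(tokens) else None


tokens = ["id", "=", "5"]

# Look at the neighbours of each token without special-casing the ends
neighbours = [(peek(tokens, x - 1), peek(tokens, x + 1)) for x in range(len(tokens))]
print(neighbours)
```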
isalias(tokens, x, alias)
Returns True if the token at position x is an alias expression -- i.e., alias processing is enabled, the prior token is not a separator/grouping/distinct token, and the current token is a column or quoted token.
isattribute(tokens, x)
Returns True if the token at position x is a standalone attribute -- a column token not followed by an operator.
isbracket(token)
Returns True if the token is an open bracket ([).
iscolumn(token)
Returns True if the token is a column name: not an operator, not a logic separator, not a literal, and not a sort order keyword.
iscompound(tokens, x)
Returns True if the token at position x is part of a compound expression (column OPERATOR column), detected by an operator token with an adjacent column token.
isdistinct(token)
Returns True if the token is the DISTINCT keyword (case-insensitive).
isfunction(tokens, x)
Returns True if the token at position x is a function call -- a column token immediately followed by an open parenthesis.
isgroupstart(token)
Returns True if the token is an open parenthesis (().
isliteral(token)
Returns True if the token is a literal value: starts with a quote, comma, parenthesis, or wildcard, or is numeric.
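The literal rule can be illustrated with a standalone sketch. The `is_literal` name is hypothetical, and the numeric test (digits with at most one decimal point, no sign) is an assumption about what "numeric" means here.

```python
def is_literal(token):
    """Sketch of a literal check: quoted text, punctuation, wildcard, or a number."""
    if not token:
        return False

    # Quote, comma, parenthesis, or wildcard prefix
    if token.startswith(("'", '"', ",", "(", ")", "*")):
        return True

    # Digits with at most one decimal point; a leading sign is not accepted
    return token.replace(".", "", 1).isnumeric()


for token in ["'hello'", "42", "3.14", "1.2.3", "category", "*"]:
    print(token, is_literal(token))
```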
islogicseparator(token)
Returns True if the token is AND or OR (case-insensitive).
isoperator(token)
Returns True if the token is a recognized operator from the OPERATORS list (case-insensitive).
isquoted(token)
Returns True if the token starts and ends with matching quotes (single or double).
isseparator(token)
Returns True if the token is a comma (,).
issimilar(tokens, x, similar)
Returns True if the token at position x is the similar keyword followed by an open parenthesis, and similar processing is enabled (similar list is not None).
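One way this check could feed placeholder substitution is sketched below. Everything here is a standalone illustration: `is_similar` and `substitute` are hypothetical helpers, and numbering placeholders by appending the call's index to the SIMILAR_TOKEN prefix is an assumption, not confirmed behavior of the Expression class.

```python
SIMILAR_TOKEN = "__SIMILAR__"


def is_similar(tokens, x, similar):
    """True if similar processing is enabled and tokens[x] starts a similar( call."""
    following = tokens[x + 1] if x + 1 < len(tokens) else None
    return similar is not None and tokens[x].lower() == "similar" and following == "("


def substitute(tokens, similar):
    """Replace each similar(...) call with a placeholder and collect its arguments."""
    output, x = [], 0
    while x < len(tokens):
        if is_similar(tokens, x, similar):
            # Find the closing paren (assumes no nested parentheses inside the call)
            end = tokens.index(")", x)
            similar.append([t for t in tokens[x + 2:end] if t != ","])
            output.append(f"{SIMILAR_TOKEN}{len(similar) - 1}")
            x = end + 1
        else:
            output.append(tokens[x])
            x += 1
    return output


tokens = ["id", ">", "5", "and", "similar", "(", "'feel good'", ",", "0.5", ")"]
calls = []
rewritten = substitute(tokens, calls)
print(rewritten, calls)
```

Passing None instead of a list leaves the token stream untouched, which matches the "similar processing is enabled" condition in the description.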
issortorder(token)
Returns True if the token is ASC or DESC (case-insensitive).
normalize(token)
Strips single and double quotes and converts to lowercase. Used for case-insensitive alias matching.
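The alias-matching use case can be sketched as follows. This is a standalone illustration: the alias table and the `resolve` helper are hypothetical, and removing quote characters anywhere in the token (rather than only at the ends) is an assumption of the sketch.

```python
def normalize(token):
    """Remove quote characters and lowercase the token."""
    return token.replace("'", "").replace('"', "").lower()


# Hypothetical alias table keyed by normalized alias name
aliases = {normalize('"Total"'): "sum(amount)"}


def resolve(token):
    """Return the expression behind an alias, or the token itself if it is not an alias."""
    return aliases.get(normalize(token), token)


print(resolve("TOTAL"), resolve("'total'"), resolve("price"))
```

Normalizing both the stored key and the lookup token means a quoted alias in the SELECT clause still matches an unquoted, differently cased reference later in the query.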
wrapspace(text, token)
Applies context-sensitive whitespace rules: operators and logic separators get surrounding spaces, commas get trailing space, wildcards after space or open-paren get no space, and tokens after brackets/parens get no leading space. Default behavior adds a leading space.
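The rules above can be sketched as a standalone function plus a loop that rebuilds a SQL string from tokens. The `wrap` and `rebuild` names are hypothetical, the rule ordering is one plausible reading of the description, and treating a wildcard at the very start of the text as unspaced is an added assumption.

```python
OPERATORS = {"=", "!=", "<>", ">", ">=", "<", "<=", "+", "-", "*", "/", "%", "||",
             "not", "between", "like", "is", "null"}
LOGIC_SEPARATORS = {"and", "or"}


def wrap(text, token):
    """Return token with the whitespace it needs when appended to text."""
    # Wildcard: checked first because "*" is also the multiply operator
    # (the empty-text case is an assumption of this sketch)
    if token == "*" and (not text or text.endswith((" ", "("))):
        return token

    # Operators and logic separators: spaces on both sides, without doubling a space
    if token.lower() in OPERATORS or token.lower() in LOGIC_SEPARATORS:
        return f"{token} " if text.endswith(" ") else f" {token} "

    # Commas: trailing space only
    if token == ",":
        return ", "

    # No leading space at the start, after a space or opening delimiter, or for delimiters
    if not text or text.endswith((" ", "(", "[")) or token in ("(", "[", ")", "]"):
        return token

    # Default: leading space
    return f" {token}"


def rebuild(tokens):
    """Join tokens into a single string using the wrap rules."""
    text = ""
    for token in tokens:
        text += wrap(text, token)
    return text


print(rebuild(["count", "(", "*", ")", ">", "1", "and", "name", "=", "'a'"]))
print(rebuild(["a", "*", "b"]))
```

Because each call sees the text built so far, the same "*" token is emitted bare inside count(*) but padded as an operator in a * b.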
Usage Examples
Basic Usage
from txtai.database.sql import Token
# Token classification
print(Token.isoperator("=")) # True
print(Token.isoperator("id")) # False
print(Token.iscolumn("category")) # True
print(Token.isliteral("'hello'")) # True
print(Token.isliteral("42")) # True
print(Token.islogicseparator("and")) # True
print(Token.issortorder("desc")) # True
# Safe token access
tokens = ["id", "=", "5"]
print(Token.get(tokens, 0)) # "id"
print(Token.get(tokens, 10)) # None
# Normalization
print(Token.normalize('"Category"')) # "category"
print(Token.normalize("'Name'")) # "name"
# Whitespace formatting
text = "id"
print(Token.wrapspace(text, "=")) # " = "
print(Token.wrapspace("id = ", "5")) # "5"
print(Token.wrapspace("fn", "(")) # "("
# Function detection
tokens = ["count", "(", "id", ")"]
print(Token.isfunction(tokens, 0)) # True
print(Token.isfunction(tokens, 2)) # False
# Similar detection
similar_list = []
tokens = ["similar", "(", "'query'", ")"]
print(Token.issimilar(tokens, 0, similar_list)) # True
print(Token.issimilar(tokens, 0, None)) # False