

Implementation:Neuml Txtai QueryTranslation

From Leeroopedia


Knowledge Sources
Domains: Embeddings, Search, NL2SQL, Seq2Seq
Last Updated: 2026-02-10 01:00 GMT

Overview

A concrete txtai tool that translates natural language queries into SQL using a sequence-to-sequence (seq2seq) model.

Description

The Query class implements natural language to SQL query translation using a Hugging Face sequence-to-sequence (Seq2Seq) model. It loads a tokenizer and model from a specified path and uses beam search generation to convert natural language questions into SQL statements.

Key features:

  • T5 model support: Automatically adds the prefix "translate English to SQL: " for T5 models (detected via isinstance(model, T5ForConditionalGeneration)) when no custom prefix is provided.
  • Configurable prefix: An optional text prefix can be prepended to all input queries to guide the translation task.
  • Max sequence length: Controls the maximum length of generated SQL output (default 512 tokens).
  • Output cleaning: The clean method applies post-processing rules to the generated text, such as replacing $= with <= (correcting a common generation artifact).
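The prefix-defaulting rule and the cleaning step can be sketched without loading any model. This is a minimal, hypothetical sketch: `resolve_prefix`, `CLEAN_RULES`, and the `is_t5` flag (standing in for the `isinstance(model, T5ForConditionalGeneration)` check) are illustrative names, not txtai's API, and the only cleaning rule included is the `$=` to `<=` replacement described above.

```python
# Hypothetical stand-ins for the prefix and cleaning rules; no model is loaded.
T5_DEFAULT_PREFIX = "translate English to SQL: "

# Post-processing rules applied in order: (generated artifact, replacement).
CLEAN_RULES = [("$=", "<=")]


def resolve_prefix(prefix, is_t5):
    """Return the caller's prefix, else the T5 default for T5 models, else None."""
    # is_t5 stands in for isinstance(model, T5ForConditionalGeneration)
    if not prefix and is_t5:
        return T5_DEFAULT_PREFIX
    return prefix


def clean(text):
    """Apply each replacement rule to the generated text."""
    for artifact, replacement in CLEAN_RULES:
        text = text.replace(artifact, replacement)
    return text


print(repr(resolve_prefix(None, is_t5=True)))  # 'translate English to SQL: '
print(clean("select id from txtai where year $= 2023"))  # ... where year <= 2023
```

Keeping the rules in a list makes it easy to see that cleaning is a plain string substitution applied after decoding, not a change to generation itself.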

The translation pipeline processes queries in three steps:

  1. Prepend the prefix (if configured).
  2. Tokenize the input and generate output tokens with the model, passing the attention mask.
  3. Decode the output tokens to text, skipping special tokens, and apply the cleaning rules.
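The three steps above can be sketched as a single function written against Hugging Face-style objects. This is an illustrative sketch, not txtai's source: `translate`, `ToyTokenizer`, and `ToyModel` are hypothetical names, and the toy classes only mimic the call shapes (`tokenizer([...], return_tensors=...)`, `model.generate(input_ids=..., attention_mask=..., max_length=...)`, `tokenizer.decode(..., skip_special_tokens=True)`) so the sketch runs without downloading a model. With real `AutoTokenizer` / `AutoModelForSeq2SeqLM` instances the same calls should apply, assuming PyTorch is installed for `return_tensors="pt"`.

```python
def translate(query, tokenizer, model, prefix=None, maxlength=512, clean=lambda text: text):
    """Run the three translation steps against HF-style tokenizer/model objects."""
    # Step 1: prepend the prefix, if one is configured
    if prefix:
        query = f"{prefix}{query}"

    # Step 2: tokenize, then generate with the attention mask and a length cap
    features = tokenizer([query], return_tensors="pt")
    output = model.generate(
        input_ids=features["input_ids"],
        attention_mask=features["attention_mask"],
        max_length=maxlength,
    )

    # Step 3: decode the first sequence, drop special tokens, then clean the text
    text = tokenizer.decode(output[0], skip_special_tokens=True)
    return clean(text)


# Hypothetical toy stand-ins so the sketch runs without a real model
class ToyTokenizer:
    SPECIAL = {0: "<pad>", 1: "</s>"}
    VOCAB = {2: "select", 3: "id", 4: "from", 5: "txtai"}

    def __call__(self, texts, return_tensors=None):
        # Encode each word as a dummy id; only the shapes matter here
        ids = [[2] * len(text.split()) for text in texts]
        return {"input_ids": ids, "attention_mask": [[1] * len(row) for row in ids]}

    def decode(self, ids, skip_special_tokens=False):
        words = []
        for i in ids:
            if i in self.SPECIAL:
                if not skip_special_tokens:
                    words.append(self.SPECIAL[i])
            else:
                words.append(self.VOCAB[i])
        return " ".join(words)


class ToyModel:
    def generate(self, input_ids, attention_mask, max_length):
        # Always "generates" the same token sequence, truncated to max_length
        return [[0, 2, 3, 4, 5, 1][:max_length]]


sql = translate("show all ids", ToyTokenizer(), ToyModel(), prefix="translate English to SQL: ")
print(sql)  # select id from txtai
```

Passing `clean` as a parameter keeps the decode step separate from post-processing, which mirrors the split between `__call__` and `clean` in the signature.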

Usage

Use Query when you want to enable natural language question answering over structured data in txtai. This allows users to ask questions in plain English that are automatically converted to SQL queries against the embeddings content database. It is typically configured via the query section of the embeddings config.
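To make the question-to-SQL-to-results flow concrete, here is a sketch using plain `sqlite3`. It is only an analogy under stated assumptions: the `articles` table and the `CANNED` lookup (standing in for the Seq2Seq model) are hypothetical, and txtai's actual content database uses its own schema and SQL extensions rather than this table.

```python
import sqlite3

# Hypothetical translator: a fixed lookup standing in for the Seq2Seq model
CANNED = {
    "What tech articles are from 2023?":
        "SELECT text FROM articles WHERE category = 'tech' AND year = 2023",
}


def answer(question, connection, translate=CANNED.__getitem__):
    """Translate a question to SQL, run it, and return the first column of each row."""
    sql = translate(question)
    return [row[0] for row in connection.execute(sql)]


# In-memory table holding the same rows as the usage example below
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE articles (id INTEGER, text TEXT, category TEXT, year INTEGER)")
connection.executemany("INSERT INTO articles VALUES (?, ?, ?, ?)", [
    (0, "Python programming", "tech", 2023),
    (1, "Machine learning", "AI", 2024),
    (2, "Web development", "tech", 2022),
])

print(answer("What tech articles are from 2023?", connection))  # ['Python programming']
```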

Code Reference

Source Location

  • Repository: Neuml_Txtai
  • File: src/python/txtai/embeddings/search/query.py

Signature

class Query:
    def __init__(self, path, prefix=None, maxlength=512)
    def __call__(self, query) -> str
    def clean(self, text) -> str

Import

from txtai.embeddings.search.query import Query

I/O Contract

Inputs

Name      | Type | Required           | Description
path      | str  | Yes                | Path or HF Hub model identifier for a Seq2Seq model (e.g., a T5 model fine-tuned for NL2SQL translation).
prefix    | str  | No                 | Text prefix prepended to every query. If omitted and the loaded model is a T5 model, defaults to "translate English to SQL: ".
maxlength | int  | No                 | Maximum sequence length for generated output (default 512).
query     | str  | Yes (for __call__) | Natural language query string to translate into SQL.
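The constructor inputs above correspond to the keys of the query section in the embeddings config. A hypothetical helper that maps such a section onto constructor arguments, applying the documented defaults, might look like this; `query_kwargs` is an illustrative name and is not part of txtai.

```python
def query_kwargs(config):
    """Map a "query" config section onto Query constructor arguments (hypothetical helper)."""
    section = config.get("query")
    if not section:
        return None  # no query section: translation disabled

    if "path" not in section:
        raise ValueError("query.path is required")

    return {
        "path": section["path"],
        # None lets T5 models fall back to "translate English to SQL: "
        "prefix": section.get("prefix"),
        "maxlength": int(section.get("maxlength", 512)),
    }


print(query_kwargs({"query": {"path": "neuml/t5-small-txtsql"}}))
```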

Outputs

Name      | Type                  | Description
sql       | str                   | Generated SQL query string returned by __call__, cleaned of common generation artifacts.
tokenizer | AutoTokenizer         | Instance attribute holding the loaded HF tokenizer.
model     | AutoModelForSeq2SeqLM | Instance attribute holding the loaded HF Seq2Seq model.

Usage Examples

from txtai.embeddings import Embeddings

# Configure embeddings with query translation
embeddings = Embeddings({
    "path": "sentence-transformers/all-MiniLM-L6-v2",
    "content": True,
    "query": {
        "path": "neuml/t5-small-txtsql",
        "prefix": "translate English to SQL: ",
        "maxlength": 512
    }
})

# Index structured data
embeddings.index([
    {"id": 0, "text": "Python programming", "category": "tech", "year": 2023},
    {"id": 1, "text": "Machine learning", "category": "AI", "year": 2024},
    {"id": 2, "text": "Web development", "category": "tech", "year": 2022},
])

# Ask natural language questions
results = embeddings.search("What tech articles are from 2023?")
# The Query model translates this to something like:
# SELECT text FROM txtai WHERE category = 'tech' AND year = 2023

# Direct usage of Query class
from txtai.embeddings.search.query import Query

translator = Query("neuml/t5-small-txtsql")
sql = translator("Show all articles about machine learning")
print(sql)  # Generated SQL query
