Implementation:Neuml Txtai LLM Generation Base
| Knowledge Sources | |
|---|---|
| Domains | Machine Learning, NLP, LLM, Text Generation |
| Last Updated | 2026-02-10 01:00 GMT |
Overview
A concrete tool, provided by txtai, for LLM-based text generation with prompt formatting, streaming support, and response cleaning.
Description
Generation is the base class for all generative models in txtai. It provides the shared logic for building prompts (including template formatting), formatting inputs as chat messages, cleaning generated results, and handling streaming responses. The class accepts three input formats: plain strings, lists of strings, and lists of chat message dictionaries with role and content keys. For text inputs, it checks for instruction tokens to decide whether to wrap the input as a chat message or pass it through as a raw prompt. The class also removes thinking tags (e.g. <think>...</think>) emitted by reasoning models.
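The thinking-tag removal described above can be illustrated with a minimal sketch. The helper name `strip_think` and the regular expression are assumptions made for illustration; they are not taken from txtai's `cleanthink` implementation, which may handle more cases.

```python
import re


def strip_think(text):
    """Remove <think>...</think> blocks from generated text.

    Hypothetical helper for illustration only, not txtai's cleanthink method.
    """
    # DOTALL lets the reasoning span multiple lines; the non-greedy match
    # stops at the first closing tag so later text is preserved.
    return re.sub(r"<think>.*?</think>", "", text, flags=re.DOTALL).strip()


print(strip_think("<think>\n2 + 2 is 4\n</think>\nThe answer is 4."))
# prints: The answer is 4.
```

Text without thinking tags passes through unchanged apart from surrounding whitespace being trimmed.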
Usage
Use Generation as the base class when implementing a new LLM backend for txtai. Subclasses such as HuggingFace, LiteLLM, and OpenCode implement the stream method to provide actual inference. The base class handles prompt construction, output cleaning, and streaming orchestration.
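The division of labor between the base class and a backend might look roughly like the sketch below. To stay runnable without txtai installed, it uses a stand-in base class (`BaseSketch`) and a toy backend (`EchoBackend`); both names are hypothetical, and the orchestration shown is a simplification that assumes plain string inputs rather than chat message dictionaries.

```python
class BaseSketch:
    """Stand-in for the base class: owns orchestration, delegates inference."""

    def __call__(self, text, maxlength=512, stream=False, stop=None, **kwargs):
        # Normalize a single string into a batch of one
        texts = [text] if isinstance(text, str) else text
        results = self.stream(texts, maxlength, stream, stop, **kwargs)

        if stream:
            # Hand the generator to the caller untouched
            return results

        # Otherwise consume the generator and mirror the input shape
        results = list(results)
        return results[0] if isinstance(text, str) else results

    def stream(self, texts, maxlength, stream, stop, **kwargs):
        raise NotImplementedError


class EchoBackend(BaseSketch):
    """Toy backend: 'generates' an upper-cased copy of each input."""

    def stream(self, texts, maxlength, stream, stop, **kwargs):
        for text in texts:
            output = text.upper()[:maxlength]
            if stream:
                # Emit word-sized chunks when streaming
                yield from output.split()
            else:
                # Emit one complete result per input
                yield output


backend = EchoBackend()
print(backend("hi there"))                     # single string in, string out
print(backend(["a", "b"]))                     # list in, list out
print(list(backend("hi there", stream=True)))  # generator of chunks
```

The point of the pattern is that a new backend only has to supply `stream`; input normalization and result shaping stay in one place.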
Code Reference
Source Location
- Repository: Neuml_Txtai
- File: src/python/txtai/pipeline/llm/generation.py
Signature
class Generation:
    def __init__(self, path=None, template=None, **kwargs)
    def __call__(self, text, maxlength, stream, stop, defaultrole, stripthink, **kwargs)
    def ischat(self)
    def isvision(self)
    def format(self, texts, defaultrole)
    def execute(self, texts, maxlength, stream, stop, **kwargs)
    def clean(self, prompt, result, stripthink)
    def cleanstream(self, results)
    def cleanthink(self, text)
    def response(self, result)
    def stream(self, texts, maxlength, stream, stop, **kwargs)  # abstract
Import
from txtai.pipeline.llm.generation import Generation
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| text | str or list | Yes | Input text, list of strings, or list of chat message dicts with "role" and "content" keys. |
| maxlength | int | Yes | Maximum sequence length for generated output. |
| stream | bool | Yes | If True, streams the response token by token. |
| stop | list | Yes | List of stop strings to terminate generation. |
| defaultrole | str | Yes | Default role for text inputs: "auto" (infer), "user" (chat message), or "prompt" (raw prompt). |
| stripthink | bool | Yes | If True, strips thinking tags from output. Defaults to False when streaming, True otherwise. |
| template | str | No | Constructor argument. Prompt template string applied to text inputs using TemplateFormatter. |
| kwargs | dict | No | Additional generation keyword arguments passed to the underlying model. |
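
The three `defaultrole` values in the table can be sketched as a small dispatch function. The function name `format_input` and the `INSTRUCTION_TOKENS` list are assumptions for illustration; the markers txtai actually checks for, and any additional conditions it applies, may differ.

```python
# Hypothetical marker list; txtai's actual detection may differ
INSTRUCTION_TOKENS = ("<|im_start|>", "[INST]")


def format_input(text, defaultrole="auto"):
    """Wrap a text input as a chat message or pass it through as a raw prompt."""
    if defaultrole == "auto":
        # Text that already carries instruction tokens is treated as a
        # pre-formatted prompt; anything else becomes a user chat message.
        has_tokens = any(token in text for token in INSTRUCTION_TOKENS)
        defaultrole = "prompt" if has_tokens else "user"

    if defaultrole == "user":
        return [{"role": "user", "content": text}]

    # defaultrole == "prompt": leave the text untouched
    return text


print(format_input("Hello"))
print(format_input("[INST] Hello [/INST]"))
print(format_input("Hello", defaultrole="prompt"))
```

Passing "user" or "prompt" explicitly skips the inference step, which is useful when a plain-text input happens to contain a marker string.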
Outputs
| Name | Type | Description |
|---|---|---|
| result | str or list or generator | A single generated string if input was a string, a list of strings if input was a list, or a generator if streaming is enabled. |
Usage Examples
from txtai.pipeline import LLM
# Create an LLM pipeline (uses Generation base under the hood)
llm = LLM("google/flan-t5-small")
# Single string input
result = llm("Translate to French: Hello, how are you?")
# Chat-style input with role/content dicts
result = llm([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is machine learning?"}
])
# Batch input
results = llm(["Summarize: ...", "Translate: ..."])
# Streaming
for token in llm("Tell me a story", stream=True):
    print(token, end="")