Implementation: NeuML txtai RAG Init
| Knowledge Sources | |
|---|---|
| Domains | NLP, RAG |
| Last Updated | 2026-02-09 00:00 GMT |
Overview
Concrete tool for configuring and initializing a Retrieval-Augmented Generation pipeline, provided by the txtai library.
Description
The RAG.__init__ method constructs a RAG pipeline by connecting a retrieval backend (an Embeddings instance or Similarity pipeline) with a generative model (LLM, extractive QA model, or custom pipeline). The constructor auto-detects the model type based on the provided path and task parameter, loading either an LLM pipeline for generative tasks or a Questions pipeline for extractive question-answering.
The prompt template defines how retrieved context and the user's question are combined into the model input. The template must contain {question} and {context} placeholders. An optional system prompt can be provided to set behavioral constraints for the language model, such as instructing it to answer only from the given context.
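The template-filling step is ordinary Python string formatting. A minimal sketch, independent of txtai itself (the template text and sample values here are made up for illustration):

```python
# Sketch of how a RAG-style prompt template is filled in; plain str.format,
# not txtai internals. The template must contain both placeholders.
template = (
    "Answer the question using only the context below.\n\n"
    "Context: {context}\n\n"
    "Question: {question}"
)

prompt = template.format(
    context="txtai builds AI-powered search.",
    question="What does txtai do?",
)
```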
The constructor also configures retrieval parameters: context sets the number of top passages to retrieve, minscore filters out low-confidence matches, and mintokens filters out trivially short passages. The output parameter determines the answer format: "default" for (name, answer) tuples, "flatten" for plain strings, or "reference" for (name, answer, reference) tuples with source attribution.
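The interaction of these three parameters can be sketched in plain Python. This is a simplified illustration of the filtering behavior described above, not txtai's actual code; the function name and sample scores are invented:

```python
# Simplified sketch (not txtai's implementation) of how minscore, mintokens
# and context combine: drop low-confidence and trivially short matches,
# then keep the top `context` passages.
def select_context(matches, minscore=0.0, mintokens=0.0, context=3):
    kept = [
        (text, score)
        for text, score in matches
        if score >= minscore and len(text.split()) >= mintokens
    ]
    return kept[:context]

# Matches arrive sorted by similarity score, highest first
matches = [
    ("Hi.", 0.90),
    ("RAG combines retrieval with generation.", 0.82),
    ("txtai builds AI-powered search.", 0.41),
    ("Python is a programming language.", 0.12),
]

# "Hi." fails mintokens; the 0.12 match fails minscore
passages = select_context(matches, minscore=0.3, mintokens=2, context=3)
```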
The RAG class supports multiple LLM backends through its model loading logic. If the path points to a HuggingFace model, it loads via the transformers library. If the path is a GGUF file, it loads via llama.cpp. If the path is prefixed with a LiteLLM provider, it routes through the LiteLLM API abstraction layer.
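The dispatch logic above can be pictured roughly as follows. This is an illustrative sketch only; txtai's real model-loading code lives in its LLM factory and differs in detail, and the provider prefixes shown ("openai", "anthropic", "azure") are examples, not an exhaustive list:

```python
# Illustrative sketch of path-based backend selection, not txtai's code.
def guess_backend(path):
    if not isinstance(path, str):
        return "custom-pipeline"   # an already-constructed pipeline instance
    if path.lower().endswith(".gguf"):
        return "llama.cpp"         # local GGUF weights
    if path.split("/", 1)[0] in ("openai", "anthropic", "azure"):
        return "litellm"           # API-hosted model behind LiteLLM
    return "transformers"          # HuggingFace Hub model id or local path

backend = guess_backend("openai/gpt-4")
```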
Usage
Use RAG.__init__ when you need to:
- Connect a content-enabled Embeddings index to a language model for question answering.
- Configure prompt templates and system prompts for grounded generation.
- Set up retrieval parameters (context size, minimum score) for the RAG pipeline.
- Choose between different LLM backends (HuggingFace, llama.cpp, LiteLLM, extractive QA).
Code Reference
Source Location
- Repository: txtai
- File: src/python/txtai/pipeline/llm/rag.py
- Lines: 24-91
Signature
class RAG(Pipeline):
    def __init__(
        self,
        similarity,
        path,
        quantize=False,
        gpu=True,
        model=None,
        tokenizer=None,
        minscore=None,
        mintokens=None,
        context=None,
        task=None,
        output="default",
        template=None,
        separator=" ",
        system=None,
        **kwargs,
    ):
        ...
Import
from txtai.pipeline import RAG
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| similarity | Embeddings or Similarity | Yes | Retrieval backend. Must have content=True for embeddings-based RAG so that search returns text. |
| path | str or Pipeline | Yes | Path to the generative model (HuggingFace model ID, GGUF file path, LiteLLM provider string) or an existing pipeline instance. |
| quantize | bool | No | Quantize model before inference for reduced memory. Default: False |
| gpu | bool | No | Use GPU for inference if available. Default: True |
| model | object or None | No | Pre-loaded model instance to wrap. Default: None |
| tokenizer | Tokenizer or None | No | Custom tokenizer for text processing. Default: auto-detect |
| minscore | float or None | No | Minimum similarity score to include a context match. Default: 0.0 |
| mintokens | float or None | No | Minimum token count to include a context match. Default: 0.0 |
| context | int or None | No | Number of top context matches to retrieve. Default: 3 |
| task | str or None | No | Model task: "language-generation", "sequence-sequence", or "question-answering". Default: auto-detect |
| output | str | No | Output format: "default", "flatten", or "reference". Default: "default" |
| template | str or None | No | Prompt template with {question} and {context} placeholders. Default: "{question} {context}" |
| separator | str | No | String used to join multiple context passages. Default: " " |
| system | str or None | No | System prompt template. Supports {question} and {context} placeholders. Default: None |
Outputs
| Name | Type | Description |
|---|---|---|
| rag | RAG | Fully initialized RAG pipeline instance, ready to accept questions via __call__. |
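The three output formats can be sketched as plain data shapes. The helper below is a hand-written illustration of the format descriptions above, not txtai code, and nothing here runs a model; the sample values are invented:

```python
# Hand-constructed sketch of the three output formats ("default", "flatten",
# "reference") described in the I/O contract. Illustration only.
def format_result(name, answer, reference, output="default"):
    if output == "flatten":
        return answer                      # plain answer string
    if output == "reference":
        return (name, answer, reference)   # adds source attribution
    return (name, answer)                  # "default": (name, answer) tuple

result = format_result("q1", "Paris", "doc-3", output="reference")
```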
Usage Examples
Basic Example: RAG with a HuggingFace Model
from txtai.embeddings import Embeddings
from txtai.pipeline import RAG
# Build content-enabled index
embeddings = Embeddings({
"path": "sentence-transformers/all-MiniLM-L6-v2",
"content": True
})
embeddings.index([
"Python is a programming language.",
"txtai builds AI-powered search.",
"RAG combines retrieval with generation.",
])
# Initialize RAG pipeline
rag = RAG(
similarity=embeddings,
path="google/flan-t5-base",
template="Answer the question based on the context.\n\n"
"Context: {context}\n\n"
"Question: {question}",
context=3
)
Using LiteLLM for API-Based Models
from txtai.embeddings import Embeddings
from txtai.pipeline import RAG
embeddings = Embeddings({
"path": "sentence-transformers/all-MiniLM-L6-v2",
"content": True
})
embeddings.index(["Document text 1.", "Document text 2."])
# Use OpenAI via LiteLLM
rag = RAG(
similarity=embeddings,
path="openai/gpt-4",
template="Based on this context: {context}\n\nAnswer: {question}",
context=5,
minscore=0.3,
output="flatten",
system="You are a helpful assistant. Answer only based on the provided context."
)
Extractive QA Mode
from txtai.embeddings import Embeddings
from txtai.pipeline import RAG
embeddings = Embeddings({
"path": "sentence-transformers/all-MiniLM-L6-v2",
"content": True
})
embeddings.index(["The capital of France is Paris."])
# Use an extractive QA model (auto-detected via task)
rag = RAG(
similarity=embeddings,
path="distilbert-base-cased-distilled-squad",
task="question-answering",
context=3
)
Reference Output Mode
from txtai.embeddings import Embeddings
from txtai.pipeline import RAG
embeddings = Embeddings({
"path": "sentence-transformers/all-MiniLM-L6-v2",
"content": True
})
embeddings.index([
"The Eiffel Tower is 330 meters tall.",
"The Eiffel Tower was built in 1889.",
"The Eiffel Tower is located in Paris.",
])
# Configure RAG with reference output for source attribution
rag = RAG(
similarity=embeddings,
path="google/flan-t5-base",
template="Context: {context}\nQuestion: {question}\nAnswer:",
output="reference",
context=2,
minscore=0.1
)