Implementation: NeuML txtai RAG Init
| Knowledge Sources | |
|---|---|
| Domains | NLP, RAG |
| Last Updated | 2026-02-09 00:00 GMT |
Overview
Concrete tool for configuring and initializing a Retrieval-Augmented Generation pipeline, provided by the txtai library.
Description
The RAG.__init__ method constructs a RAG pipeline by connecting a retrieval backend (an Embeddings instance or Similarity pipeline) with a generative model (LLM, extractive QA model, or custom pipeline). The constructor auto-detects the model type based on the provided path and task parameter, loading either an LLM pipeline for generative tasks or a Questions pipeline for extractive question-answering.
The prompt template defines how retrieved context and the user's question are combined into the model input. The template must contain {question} and {context} placeholders. An optional system prompt can be provided to set behavioral constraints for the language model, such as instructing it to answer only from the given context.
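The template-filling step is ordinary Python string formatting. A minimal sketch, independent of txtai itself (the template text and sample values here are made up for illustration):

```python
# Sketch of how a RAG-style prompt template is filled in; plain str.format,
# not txtai internals. The template must contain both placeholders.
template = (
    "Answer the question using only the context below.\n\n"
    "Context: {context}\n\n"
    "Question: {question}"
)

prompt = template.format(
    context="txtai builds AI-powered search.",
    question="What does txtai do?",
)
```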
The constructor also configures retrieval parameters: context sets the number of top passages to retrieve, minscore filters out low-confidence matches, and mintokens filters out trivially short passages. The output parameter determines the answer format: "default" for (name, answer) tuples, "flatten" for plain strings, or "reference" for (name, answer, reference) tuples with source attribution.
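The interaction of these three parameters can be sketched in plain Python. This is a simplified illustration of the filtering behavior described above, not txtai's actual code; the function name and sample scores are invented:

```python
# Simplified sketch (not txtai's implementation) of how minscore, mintokens
# and context combine: drop low-confidence and trivially short matches,
# then keep the top `context` passages.
def select_context(matches, minscore=0.0, mintokens=0.0, context=3):
    kept = [
        (text, score)
        for text, score in matches
        if score >= minscore and len(text.split()) >= mintokens
    ]
    return kept[:context]

# Matches arrive sorted by similarity score, highest first
matches = [
    ("Hi.", 0.90),
    ("RAG combines retrieval with generation.", 0.82),
    ("txtai builds AI-powered search.", 0.41),
    ("Python is a programming language.", 0.12),
]

# "Hi." fails mintokens; the 0.12 match fails minscore
passages = select_context(matches, minscore=0.3, mintokens=2, context=3)
```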
The RAG class supports multiple LLM backends through its model loading logic. If the path points to a HuggingFace model, it loads via the transformers library. If the path is a GGUF file, it loads via llama.cpp. If the path is prefixed with a LiteLLM provider, it routes through the LiteLLM API abstraction layer.
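The dispatch logic above can be pictured roughly as follows. This is an illustrative sketch only; txtai's real model-loading code lives in its LLM factory and differs in detail, and the provider prefixes shown ("openai", "anthropic", "azure") are examples, not an exhaustive list:

```python
# Illustrative sketch of path-based backend selection, not txtai's code.
def guess_backend(path):
    if not isinstance(path, str):
        return "custom-pipeline"   # an already-constructed pipeline instance
    if path.lower().endswith(".gguf"):
        return "llama.cpp"         # local GGUF weights
    if path.split("/", 1)[0] in ("openai", "anthropic", "azure"):
        return "litellm"           # API-hosted model behind LiteLLM
    return "transformers"          # HuggingFace Hub model id or local path

backend = guess_backend("openai/gpt-4")
```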
Usage
Use RAG.__init__ when you need to:
- Connect a content-enabled Embeddings index to a language model for question answering.
- Configure prompt templates and system prompts for grounded generation.
- Set up retrieval parameters (context size, minimum score) for the RAG pipeline.
- Choose between different LLM backends (HuggingFace, llama.cpp, LiteLLM, extractive QA).
Code Reference
Source Location
- Repository: txtai
- File: src/python/txtai/pipeline/llm/rag.py
- Lines: 24-91
Signature
class RAG(Pipeline):
    def __init__(
        self,
        similarity,
        path,
        quantize=False,
        gpu=True,
        model=None,
        tokenizer=None,
        minscore=None,
        mintokens=None,
        context=None,
        task=None,
        output="default",
        template=None,
        separator=" ",
        system=None,
        **kwargs,
    ):
        ...
Import
from txtai.pipeline import RAG
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| similarity | Embeddings or Similarity | Yes | Retrieval backend. Must have content=True for embeddings-based RAG so that search returns text. |
| path | str or Pipeline | Yes | Path to the generative model (HuggingFace model ID, GGUF file path, LiteLLM provider string) or an existing pipeline instance. |
| quantize | bool | No | Quantize model before inference for reduced memory. Default: False |
| gpu | bool | No | Use GPU for inference if available. Default: True |
| model | object or None | No | Pre-loaded model instance to wrap. Default: None |
| tokenizer | Tokenizer or None | No | Custom tokenizer for text processing. Default: auto-detect |
| minscore | float or None | No | Minimum similarity score to include a context match. Default: 0.0 |
| mintokens | float or None | No | Minimum token count to include a context match. Default: 0.0 |
| context | int or None | No | Number of top context matches to retrieve. Default: 3 |
| task | str or None | No | Model task: "language-generation", "sequence-sequence", or "question-answering". Default: auto-detect |
| output | str | No | Output format: "default", "flatten", or "reference". Default: "default" |
| template | str or None | No | Prompt template with {question} and {context} placeholders. Default: "{question} {context}" |
| separator | str | No | String used to join multiple context passages. Default: " " |
| system | str or None | No | System prompt template. Supports {question} and {context} placeholders. Default: None |
Outputs
| Name | Type | Description |
|---|---|---|
| rag | RAG | Fully initialized RAG pipeline instance, ready to accept questions via __call__. |
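The three output formats can be sketched as plain data shapes. The helper below is a hand-written illustration of the format descriptions above, not txtai code, and nothing here runs a model; the sample values are invented:

```python
# Hand-constructed sketch of the three output formats ("default", "flatten",
# "reference") described in the I/O contract. Illustration only.
def format_result(name, answer, reference, output="default"):
    if output == "flatten":
        return answer                      # plain answer string
    if output == "reference":
        return (name, answer, reference)   # adds source attribution
    return (name, answer)                  # "default": (name, answer) tuple

result = format_result("q1", "Paris", "doc-3", output="reference")
```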
Usage Examples
Basic Example: RAG with a HuggingFace Model
from txtai.embeddings import Embeddings
from txtai.pipeline import RAG
# Build content-enabled index
embeddings = Embeddings({
"path": "sentence-transformers/all-MiniLM-L6-v2",
"content": True
})
embeddings.index([
"Python is a programming language.",
"txtai builds AI-powered search.",
"RAG combines retrieval with generation.",
])
# Initialize RAG pipeline
rag = RAG(
similarity=embeddings,
path="google/flan-t5-base",
template="Answer the question based on the context.\n\n"
"Context: {context}\n\n"
"Question: {question}",
context=3
)
Using LiteLLM for API-Based Models
from txtai.embeddings import Embeddings
from txtai.pipeline import RAG
embeddings = Embeddings({
"path": "sentence-transformers/all-MiniLM-L6-v2",
"content": True
})
embeddings.index(["Document text 1.", "Document text 2."])
# Use OpenAI via LiteLLM
rag = RAG(
similarity=embeddings,
path="openai/gpt-4",
template="Based on this context: {context}\n\nAnswer: {question}",
context=5,
minscore=0.3,
output="flatten",
system="You are a helpful assistant. Answer only based on the provided context."
)
Extractive QA Mode
from txtai.embeddings import Embeddings
from txtai.pipeline import RAG
embeddings = Embeddings({
"path": "sentence-transformers/all-MiniLM-L6-v2",
"content": True
})
embeddings.index(["The capital of France is Paris."])
# Use an extractive QA model (auto-detected via task)
rag = RAG(
similarity=embeddings,
path="distilbert-base-cased-distilled-squad",
task="question-answering",
context=3
)
Reference Output Mode
from txtai.embeddings import Embeddings
from txtai.pipeline import RAG
embeddings = Embeddings({
"path": "sentence-transformers/all-MiniLM-L6-v2",
"content": True
})
embeddings.index([
"The Eiffel Tower is 330 meters tall.",
"The Eiffel Tower was built in 1889.",
"The Eiffel Tower is located in Paris.",
])
# Configure RAG with reference output for source attribution
rag = RAG(
similarity=embeddings,
path="google/flan-t5-base",
template="Context: {context}\nQuestion: {question}\nAnswer:",
output="reference",
context=2,
minscore=0.1
)