
Implementation:Neuml Txtai RAG Init

From Leeroopedia


Knowledge Sources
Domains: NLP, RAG
Last Updated: 2026-02-09 00:00 GMT

Overview

Concrete tool for configuring and initializing a Retrieval-Augmented Generation (RAG) pipeline, provided by the txtai library.

Description

The RAG.__init__ method constructs a RAG pipeline by connecting a retrieval backend (an Embeddings instance or Similarity pipeline) with a generative model (LLM, extractive QA model, or custom pipeline). The constructor auto-detects the model type based on the provided path and task parameter, loading either an LLM pipeline for generative tasks or a Questions pipeline for extractive question-answering.

The prompt template defines how retrieved context and the user's question are combined into the model input. The template must contain {question} and {context} placeholders. An optional system prompt can be provided to set behavioral constraints for the language model, such as instructing it to answer only from the given context.
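
Prompt assembly can be sketched in plain Python. This is an illustration of the placeholder mechanics, not txtai's internal code; the template string and inputs below are invented (the library's default template is "{question} {context}"):

```python
# Illustrative template with the required {question} and {context} placeholders
template = "Answer from the context only.\n\nContext: {context}\n\nQuestion: {question}"

def build_prompt(template, question, passages, separator=" "):
    # Retrieved passages are joined into a single string that fills {context}
    context = separator.join(passages)
    return template.format(question=question, context=context)

prompt = build_prompt(
    template,
    "What is txtai?",
    ["txtai builds AI-powered search.", "RAG combines retrieval with generation."],
)
```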

The constructor also configures retrieval parameters: context sets the number of top passages to retrieve, minscore filters out low-confidence matches, and mintokens filters out trivially short passages. The output parameter determines the answer format: "default" for (name, answer) tuples, "flatten" for plain strings, or "reference" for (name, answer, reference) tuples with source attribution.
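
The three output shapes can be illustrated with a small standalone function. This is a schematic of the formats described above, not txtai's implementation, and the (name, answer, reference) values are invented:

```python
def shape(name, answer, reference, output="default"):
    # "flatten" returns the answer as a plain string
    if output == "flatten":
        return answer
    # "reference" adds source attribution as a third tuple element
    if output == "reference":
        return (name, answer, reference)
    # "default" returns (name, answer) tuples
    return (name, answer)
```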

The RAG class supports multiple LLM backends through its model loading logic. If the path points to a HuggingFace model, it loads via the transformers library. If the path is a GGUF file, it loads via llama.cpp. If the path is prefixed with a LiteLLM provider, it routes through the LiteLLM API abstraction layer.
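
The routing described above can be approximated by a simple dispatch on the path string. This is only a schematic sketch, not txtai's actual detection logic, and the provider prefixes listed are illustrative:

```python
def backend(path):
    # GGUF files route to llama.cpp
    if path.endswith(".gguf"):
        return "llama.cpp"
    # A known provider prefix (illustrative set) routes through LiteLLM
    if "/" in path and path.split("/", 1)[0] in {"openai", "anthropic", "azure"}:
        return "litellm"
    # Everything else is treated as a HuggingFace model id
    return "transformers"
```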

Usage

Use RAG.__init__ when you need to:

  • Connect a content-enabled Embeddings index to a language model for question answering.
  • Configure prompt templates and system prompts for grounded generation.
  • Set up retrieval parameters (context size, minimum score) for the RAG pipeline.
  • Choose between different LLM backends (HuggingFace, llama.cpp, LiteLLM, extractive QA).

Code Reference

Source Location

  • Repository: txtai
  • File: src/python/txtai/pipeline/llm/rag.py
  • Lines: L24-91

Signature

class RAG(Pipeline):
    def __init__(
        self,
        similarity,
        path,
        quantize=False,
        gpu=True,
        model=None,
        tokenizer=None,
        minscore=None,
        mintokens=None,
        context=None,
        task=None,
        output="default",
        template=None,
        separator=" ",
        system=None,
        **kwargs,
    ):
        ...

Import

from txtai.pipeline import RAG

I/O Contract

Inputs

  • similarity (Embeddings or Similarity, required): Retrieval backend. Must have content=True for embeddings-based RAG so that search returns text.
  • path (str or Pipeline, required): Path to the generative model (HuggingFace model ID, GGUF file path, LiteLLM provider string) or an existing pipeline instance.
  • quantize (bool, optional): Quantize the model before inference for reduced memory. Default: False
  • gpu (bool, optional): Use GPU for inference if available. Default: True
  • model (object or None, optional): Pre-loaded model instance to wrap. Default: None
  • tokenizer (Tokenizer or None, optional): Custom tokenizer for text processing. Default: auto-detect
  • minscore (float or None, optional): Minimum similarity score to include a context match. Default: 0.0
  • mintokens (float or None, optional): Minimum token count to include a context match. Default: 0.0
  • context (int or None, optional): Number of top context matches to retrieve. Default: 3
  • task (str or None, optional): Model task: "language-generation", "sequence-sequence" or "question-answering". Default: auto-detect
  • output (str, optional): Output format: "default", "flatten" or "reference". Default: "default"
  • template (str or None, optional): Prompt template with {question} and {context} placeholders. Default: "{question} {context}"
  • separator (str, optional): String used to join multiple context passages. Default: " "
  • system (str or None, optional): System prompt template; supports {question} and {context} placeholders. Default: None
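
How the retrieval parameters interact can be sketched in plain Python. This is an illustrative approximation, not txtai's internal filtering code; the matches and thresholds below are invented, and token counting is simplified to whitespace splitting:

```python
def select_context(matches, context=3, minscore=0.0, mintokens=0.0, separator=" "):
    # Keep passages that clear both the score and length thresholds
    kept = [
        text for text, score in matches
        if score >= minscore and len(text.split()) >= mintokens
    ]
    # Truncate to the top `context` matches, then join with `separator`
    return separator.join(kept[:context])

matches = [
    ("RAG combines retrieval with generation.", 0.82),
    ("Short.", 0.75),                              # dropped: too few tokens
    ("txtai builds AI-powered search.", 0.41),
    ("Low relevance passage.", 0.05),              # dropped: below minscore
]
selected = select_context(matches, context=2, minscore=0.3, mintokens=3)
```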

Outputs

  • rag (RAG): Fully initialized RAG pipeline instance, ready to accept questions via __call__.

Usage Examples

Basic Example: RAG with a HuggingFace Model

from txtai.embeddings import Embeddings
from txtai.pipeline import RAG

# Build content-enabled index
embeddings = Embeddings({
    "path": "sentence-transformers/all-MiniLM-L6-v2",
    "content": True
})
embeddings.index([
    "Python is a programming language.",
    "txtai builds AI-powered search.",
    "RAG combines retrieval with generation.",
])

# Initialize RAG pipeline
rag = RAG(
    similarity=embeddings,
    path="google/flan-t5-base",
    template="Answer the question based on the context.\n\n"
             "Context: {context}\n\n"
             "Question: {question}",
    context=3
)

Using LiteLLM for API-Based Models

from txtai.embeddings import Embeddings
from txtai.pipeline import RAG

embeddings = Embeddings({
    "path": "sentence-transformers/all-MiniLM-L6-v2",
    "content": True
})
embeddings.index(["Document text 1.", "Document text 2."])

# Use OpenAI via LiteLLM
rag = RAG(
    similarity=embeddings,
    path="openai/gpt-4",
    template="Based on this context: {context}\n\nAnswer: {question}",
    context=5,
    minscore=0.3,
    output="flatten",
    system="You are a helpful assistant. Answer only based on the provided context."
)

Extractive QA Mode

from txtai.embeddings import Embeddings
from txtai.pipeline import RAG

embeddings = Embeddings({
    "path": "sentence-transformers/all-MiniLM-L6-v2",
    "content": True
})
embeddings.index(["The capital of France is Paris."])

# Use an extractive QA model (auto-detected via task)
rag = RAG(
    similarity=embeddings,
    path="distilbert-base-cased-distilled-squad",
    task="question-answering",
    context=3
)

Reference Output Mode

from txtai.embeddings import Embeddings
from txtai.pipeline import RAG

embeddings = Embeddings({
    "path": "sentence-transformers/all-MiniLM-L6-v2",
    "content": True
})
embeddings.index([
    "The Eiffel Tower is 330 meters tall.",
    "The Eiffel Tower was built in 1889.",
    "The Eiffel Tower is located in Paris.",
])

# Configure RAG with reference output for source attribution
rag = RAG(
    similarity=embeddings,
    path="google/flan-t5-base",
    template="Context: {context}\nQuestion: {question}\nAnswer:",
    output="reference",
    context=2,
    minscore=0.1
)
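
In "reference" mode, the third tuple element identifies the best-matching source. The post-processing below is an invented illustration (the result tuple and the use of a list index as the reference id are assumptions, not txtai output):

```python
# Indexed source documents (same data as the example above)
documents = [
    "The Eiffel Tower is 330 meters tall.",
    "The Eiffel Tower was built in 1889.",
    "The Eiffel Tower is located in Paris.",
]

# Hypothetical "reference" mode result: (name, answer, reference)
result = ("q1", "330 meters", 0)

name, answer, reference = result
# Resolve the reference id back to the source text for attribution
source = documents[reference]
```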

Related Pages

Implements Principle

Requires Environment

Uses Heuristic
