# Workflow: AnswerDotAI RAGatouille Document Indexing And Search
| Knowledge Sources | |
|---|---|
| Domains | Information_Retrieval, RAG, ColBERT |
| Last Updated | 2026-02-12 12:00 GMT |
## Overview
End-to-end process for indexing a document collection and performing semantic search using ColBERT late-interaction retrieval via RAGatouille.
## Description
This workflow covers the primary user journey in RAGatouille: loading a pretrained ColBERT model, building a compressed PLAID index from a document collection, and querying that index to retrieve relevant passages. The process handles document splitting, tokenization, embedding generation, vector compression, and on-disk index persistence automatically. It also supports optional document IDs, metadata attachment, and dynamic index updates (adding or removing documents after initial creation).
Key outputs:
- A persistent on-disk PLAID index that can be reloaded across sessions
- Ranked search results with content, scores, document IDs, and optional metadata
Scope:
- From raw text documents to ranked retrieval results
- Includes index lifecycle management (create, load, add, delete, search)
## Usage
Execute this workflow when you have a collection of text documents and need to build a high-quality semantic search system. This is the right workflow when:
- You have a corpus of documents (articles, pages, paragraphs) to make searchable
- You want ColBERT's late-interaction retrieval quality without managing the underlying complexity
- You need persistent indexes that survive across application restarts
- You want to dynamically add or remove documents from the index over time
## Execution Steps
### Step 1: Load Pretrained Model
Initialize a RAGPretrainedModel from either a HuggingFace model name or a local checkpoint path. This loads the ColBERT encoder, sets up the inference checkpoint, and configures the GPU/CPU execution environment. The same model instance is used for both indexing and searching.
Key considerations:
- Use `from_pretrained()` for a fresh model from HuggingFace (e.g. `colbert-ir/colbertv2.0`)
- Use `from_index()` to reload a model directly from an existing index's saved configuration
- GPU count is auto-detected; set explicitly if needed
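As a minimal sketch (assuming RAGatouille is installed; the `load_model` wrapper and its argument names are illustrative, not part of the library's API):

```python
def load_model(source: str, from_disk: bool = False):
    """Return a RAGPretrainedModel from the HF Hub or from a saved index.

    `source` is a HuggingFace model name such as "colbert-ir/colbertv2.0",
    or, when `from_disk=True`, the path to an existing index directory.
    """
    from ragatouille import RAGPretrainedModel  # deferred: pulls in torch

    if from_disk:
        # Reuses the ColBERT configuration serialized alongside the index
        return RAGPretrainedModel.from_index(source)
    return RAGPretrainedModel.from_pretrained(source)
```

The first `from_pretrained()` call downloads the checkpoint, so it can take a while; subsequent loads hit the local HuggingFace cache.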
### Step 2: Prepare Document Collection
Assemble the raw text documents into a list of strings. Optionally provide document IDs (unique identifiers for each document) and document metadata (dictionaries of key-value pairs). If no IDs are provided, UUIDs are generated automatically.
Key considerations:
- Document IDs must be unique and non-empty
- Metadata dictionaries are mapped to their corresponding document IDs
- Documents can be of any length; they will be split in the next step
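For example (the corpus and metadata below are made up; only the uniqueness constraint comes from the workflow):

```python
import uuid

# Hypothetical corpus: any list of raw strings works, regardless of length.
documents = [
    "ColBERT scores queries against documents via late interaction "
    "over token-level embeddings.",
    "PLAID compresses those embeddings so large indexes fit on disk.",
]

# Optional explicit IDs; if omitted, RAGatouille generates UUIDs itself.
document_ids = [str(uuid.uuid4()) for _ in documents]
assert len(set(document_ids)) == len(documents)  # unique and non-empty

# Optional metadata: one dict per document, mapped positionally to the IDs.
document_metadatas = [{"source": "tutorial"}, {"source": "tutorial"}]
```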
### Step 3: Build the Index
Invoke the indexing pipeline, which performs document splitting (via LlamaIndex sentence splitter by default), tokenization, ColBERT embedding generation, vector quantization (2-bit or 4-bit depending on collection size), and PLAID index construction. The index is persisted to disk.
What happens internally:
- Documents are split into chunks (default max 256 tokens with sentence-aware boundaries)
- A passage-to-document ID mapping is created to track which chunks belong to which document
- The PLAID indexer compresses token-level embeddings using residual compression
- For collections under 75,000 documents, a PyTorch-based KMeans replaces FAISS for portability
- Index metadata (collection, mappings, config) is serialized alongside the compressed vectors
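The indexing call can be sketched as follows (the wrapper function is illustrative and assumes a model loaded as in Step 1; the keyword arguments follow RAGatouille's `index()` signature):

```python
def build_index(RAG, documents, document_ids=None, document_metadatas=None,
                index_name="my_index"):
    """Split, encode, compress, and persist the collection; returns the
    on-disk path of the new PLAID index."""
    return RAG.index(
        collection=documents,
        document_ids=document_ids,
        document_metadatas=document_metadatas,
        index_name=index_name,    # directory name under the index root
        max_document_length=256,  # chunk budget for the sentence splitter
        split_documents=True,     # let RAGatouille split long documents
    )
```

By default the index is written under `.ragatouille/colbert/indexes/<index_name>`; the returned path can later be handed to `from_index()` to reload both the model and the index.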
### Step 4: Search the Index
Query the index with a single string or a batch of queries. The searcher is lazily loaded on first query and cached for subsequent searches. Results are returned as ranked dictionaries containing passage content, relevance scores, ranks, document IDs, and any associated metadata.
Key considerations:
- Single queries return a list of result dictionaries; multiple queries return a list of lists
- The `k` parameter controls how many results to return (default 10)
- `force_fast` mode trades accuracy for speed on large indexes
- Results can be filtered by specific document IDs using the `doc_ids` parameter
- Query token length is dynamically adjusted based on query length
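A search call might look like this (the wrapper is illustrative; `query`, `k`, `doc_ids`, and `force_fast` are the `search()` parameters described above):

```python
def search_index(RAG, query, k=10, doc_ids=None, force_fast=False):
    """Query a built or reloaded index.

    A single query string returns a list of result dicts (content, score,
    rank, document ID, and metadata when present); a list of query strings
    returns one such list per query.
    """
    return RAG.search(
        query=query,
        k=k,                    # number of results per query (default 10)
        doc_ids=doc_ids,        # optional: restrict results to these documents
        force_fast=force_fast,  # trade accuracy for speed on large indexes
    )
```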
### Step 5: Update the Index (Optional)
After initial index creation, documents can be added or removed dynamically. Adding documents processes them through the same splitting and encoding pipeline, then either rebuilds the index (for small collections or large additions) or uses an incremental updater. Deleting documents removes specified document IDs and their associated passage embeddings.
Key considerations:
- Add and delete operations are experimental and may trigger a full index rebuild for efficiency
- The rebuild heuristic triggers when the existing collection is small (fewer than 5,000 documents) or when the new documents exceed 5% of the existing collection's size
- All metadata maps are updated and re-serialized after modifications
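The update path and the rebuild heuristic can be sketched together (the `add_to_index`/`delete_from_index` calls mirror RAGatouille's experimental API; the thresholds in `should_rebuild` are the ones quoted above, as an approximation of the library's internal logic):

```python
def should_rebuild(current_len: int, new_doc_len: int) -> bool:
    """Rebuild when the existing collection is small (< 5,000 documents)
    or the addition exceeds 5% of its current size."""
    return current_len < 5000 or new_doc_len > 0.05 * current_len


def update_index(RAG, new_docs=None, delete_ids=None):
    """Add and/or remove documents on an already-loaded index (experimental)."""
    if new_docs:
        RAG.add_to_index(new_collection=new_docs)  # re-splits and re-encodes
    if delete_ids:
        RAG.delete_from_index(document_ids=delete_ids)
```

For example, adding 100 documents to a 10,000-document index stays incremental (100 < 500, i.e. 5% of 10,000), while adding 501 triggers a rebuild.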