# Workflow: AnswerDotAI RAGatouille Document Indexing And Search
| Knowledge Sources | |
|---|---|
| Domains | Information_Retrieval, RAG, ColBERT |
| Last Updated | 2026-02-12 12:00 GMT |
## Overview
End-to-end process for indexing a document collection and performing semantic search using ColBERT late-interaction retrieval via RAGatouille.
## Description
This workflow covers the primary user journey in RAGatouille: loading a pretrained ColBERT model, building a compressed PLAID index from a document collection, and querying that index to retrieve relevant passages. The process handles document splitting, tokenization, embedding generation, vector compression, and on-disk index persistence automatically. It also supports optional document IDs, metadata attachment, and dynamic index updates (adding or removing documents after initial creation).
Key outputs:
- A persistent on-disk PLAID index that can be reloaded across sessions
- Ranked search results with content, scores, document IDs, and optional metadata
Scope:
- From raw text documents to ranked retrieval results
- Includes index lifecycle management (create, load, add, delete, search)
## Usage
Execute this workflow when you have a collection of text documents and need to build a high-quality semantic search system. This is the right workflow when:
- You have a corpus of documents (articles, pages, paragraphs) to make searchable
- You want ColBERT's late-interaction retrieval quality without managing the underlying complexity
- You need persistent indexes that survive across application restarts
- You want to dynamically add or remove documents from the index over time
## Execution Steps
### Step 1: Load Pretrained Model
Initialize a RAGPretrainedModel from either a HuggingFace model name or a local checkpoint path. This loads the ColBERT encoder, sets up the inference checkpoint, and configures the GPU/CPU execution environment. The same model instance is used for both indexing and searching.
Key considerations:
- Use `from_pretrained()` for a fresh model from HuggingFace (e.g. `colbert-ir/colbertv2.0`)
- Use `from_index()` to reload a model directly from an existing index's saved configuration
- GPU count is auto-detected; set explicitly if needed
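As a minimal sketch (assuming RAGatouille is installed; the `load_model` wrapper and its argument names are illustrative, not part of the library's API):

```python
def load_model(source: str, from_disk: bool = False):
    """Return a RAGPretrainedModel from the HF Hub or from a saved index.

    `source` is a HuggingFace model name such as "colbert-ir/colbertv2.0",
    or, when `from_disk=True`, the path to an existing index directory.
    """
    from ragatouille import RAGPretrainedModel  # deferred: pulls in torch

    if from_disk:
        # Reuses the ColBERT configuration serialized alongside the index
        return RAGPretrainedModel.from_index(source)
    return RAGPretrainedModel.from_pretrained(source)
```

The first `from_pretrained()` call downloads the checkpoint, so it can take a while; subsequent loads hit the local HuggingFace cache.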
### Step 2: Prepare Document Collection
Assemble the raw text documents into a list of strings. Optionally provide document IDs (unique identifiers for each document) and document metadata (dictionaries of key-value pairs). If no IDs are provided, UUIDs are generated automatically.
Key considerations:
- Document IDs must be unique and non-empty
- Metadata dictionaries are mapped to their corresponding document IDs
- Documents can be of any length; they will be split in the next step
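For example (the corpus and metadata below are made up; only the uniqueness constraint comes from the workflow):

```python
import uuid

# Hypothetical corpus: any list of raw strings works, regardless of length.
documents = [
    "ColBERT scores queries against documents via late interaction "
    "over token-level embeddings.",
    "PLAID compresses those embeddings so large indexes fit on disk.",
]

# Optional explicit IDs; if omitted, RAGatouille generates UUIDs itself.
document_ids = [str(uuid.uuid4()) for _ in documents]
assert len(set(document_ids)) == len(documents)  # unique and non-empty

# Optional metadata: one dict per document, mapped positionally to the IDs.
document_metadatas = [{"source": "tutorial"}, {"source": "tutorial"}]
```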
### Step 3: Build the Index
Invoke the indexing pipeline, which performs document splitting (via LlamaIndex sentence splitter by default), tokenization, ColBERT embedding generation, vector quantization (2-bit or 4-bit depending on collection size), and PLAID index construction. The index is persisted to disk.
What happens internally:
- Documents are split into chunks (default max 256 tokens with sentence-aware boundaries)
- A passage-to-document ID mapping is created to track which chunks belong to which document
- The PLAID indexer compresses token-level embeddings using residual compression
- For collections under 75,000 documents, a PyTorch-based KMeans replaces FAISS for portability
- Index metadata (collection, mappings, config) is serialized alongside the compressed vectors
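The indexing call can be sketched as follows (the wrapper function is illustrative and assumes a model loaded as in Step 1; the keyword arguments follow RAGatouille's `index()` signature):

```python
def build_index(RAG, documents, document_ids=None, document_metadatas=None,
                index_name="my_index"):
    """Split, encode, compress, and persist the collection; returns the
    on-disk path of the new PLAID index."""
    return RAG.index(
        collection=documents,
        document_ids=document_ids,
        document_metadatas=document_metadatas,
        index_name=index_name,    # directory name under the index root
        max_document_length=256,  # chunk budget for the sentence splitter
        split_documents=True,     # let RAGatouille split long documents
    )
```

By default the index is written under `.ragatouille/colbert/indexes/<index_name>`; the returned path can later be handed to `from_index()` to reload both the model and the index.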
### Step 4: Search the Index
Query the index with a single string or a batch of queries. The searcher is lazily loaded on first query and cached for subsequent searches. Results are returned as ranked dictionaries containing passage content, relevance scores, ranks, document IDs, and any associated metadata.
Key considerations:
- Single queries return a list of result dictionaries; multiple queries return a list of lists
- The `k` parameter controls how many results to return (default 10)
- `force_fast` mode trades accuracy for speed on large indexes
- Results can be filtered by specific document IDs using the `doc_ids` parameter
- Query token length is dynamically adjusted based on query length
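A search call might look like this (the wrapper is illustrative; `query`, `k`, `doc_ids`, and `force_fast` are the `search()` parameters described above):

```python
def search_index(RAG, query, k=10, doc_ids=None, force_fast=False):
    """Query a built or reloaded index.

    A single query string returns a list of result dicts (content, score,
    rank, document ID, and metadata when present); a list of query strings
    returns one such list per query.
    """
    return RAG.search(
        query=query,
        k=k,                    # number of results per query (default 10)
        doc_ids=doc_ids,        # optional: restrict results to these documents
        force_fast=force_fast,  # trade accuracy for speed on large indexes
    )
```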
### Step 5: Update the Index (Optional)
After initial index creation, documents can be added or removed dynamically. Adding documents processes them through the same splitting and encoding pipeline, then either rebuilds the index (for small collections or large additions) or uses an incremental updater. Deleting documents removes specified document IDs and their associated passage embeddings.
Key considerations:
- Add and delete operations are experimental and may trigger a full index rebuild for efficiency
- The rebuild heuristic triggers when the existing collection is small (fewer than 5,000 documents) or when the new documents exceed 5% of the existing collection's size
- All metadata maps are updated and re-serialized after modifications
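The update path and the rebuild heuristic can be sketched together (the `add_to_index`/`delete_from_index` calls mirror RAGatouille's experimental API; the thresholds in `should_rebuild` are the ones quoted above, as an approximation of the library's internal logic):

```python
def should_rebuild(current_len: int, new_doc_len: int) -> bool:
    """Rebuild when the existing collection is small (< 5,000 documents)
    or the addition exceeds 5% of its current size."""
    return current_len < 5000 or new_doc_len > 0.05 * current_len


def update_index(RAG, new_docs=None, delete_ids=None):
    """Add and/or remove documents on an already-loaded index (experimental)."""
    if new_docs:
        RAG.add_to_index(new_collection=new_docs)  # re-splits and re-encodes
    if delete_ids:
        RAG.delete_from_index(document_ids=delete_ids)
```

For example, adding 100 documents to a 10,000-document index stays incremental (100 < 500, i.e. 5% of 10,000), while adding 501 triggers a rebuild.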