Principle:AnswerDotAI RAGatouille Semantic Search
| Knowledge Sources | |
|---|---|
| Domains | NLP, Information_Retrieval, Search |
| Last Updated | 2026-02-12 12:00 GMT |
Overview
A retrieval mechanism that finds the most relevant passages in a pre-built PLAID index by encoding a query into token-level embeddings and computing late-interaction MaxSim scores against indexed document representations.
Description
Semantic Search in the ColBERT framework operates on a pre-built PLAID index. Given a query string (or batch of queries), the system encodes the query into token-level embeddings, then uses the PLAID engine to efficiently retrieve the top-k most relevant passages. The PLAID search algorithm uses centroid interaction to prune the candidate set before performing full late-interaction scoring on the remaining candidates.
The search pipeline involves:
- Query encoding into token-level embeddings via the ColBERT checkpoint
- Centroid-based candidate generation using the inverted index
- Decompression of candidate document residuals
- Full MaxSim scoring between query and candidate token embeddings
- Result formatting with content, scores, ranks, document IDs, and optional metadata
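The five steps can be traced end to end in a toy, pure-Python sketch. It is not RAGatouille or ColBERT code. The 2-dimensional vectors stand in for checkpoint embeddings, so query encoding is assumed to be already done. Residuals are stored unquantized. The names `build_ivf`, `search` and `codes`, and the result keys, are hypothetical and chosen to mirror the list.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Set, Tuple

Vector = List[float]
# Hypothetical compressed layout: passage id -> [(centroid id, residual vector), ...]
Codes = Dict[int, List[Tuple[int, Vector]]]


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def maxsim(q_tokens: Sequence[Vector], d_tokens: Sequence[Vector]) -> float:
    # Sum over query tokens of the best-matching document-token similarity.
    return sum(max(dot(q, d) for d in d_tokens) for q in q_tokens)


def build_ivf(codes: Codes) -> Dict[int, Set[int]]:
    # Inverted index: centroid id -> ids of passages with at least one token in that cell.
    ivf: Dict[int, Set[int]] = defaultdict(set)
    for pid, tokens in codes.items():
        for cid, _residual in tokens:
            ivf[cid].add(pid)
    return ivf


def search(q_tokens, centroids, codes, ivf, passages, k=10, ncells=1, metadata=None):
    # Step 1 (query encoding) is assumed done: q_tokens stands in for the
    # checkpoint's token-level query embeddings.
    # Step 2: candidate generation, probing the ncells nearest centroids per query token.
    candidates: Set[int] = set()
    for q in q_tokens:
        nearest = sorted(range(len(centroids)), key=lambda c: dot(q, centroids[c]), reverse=True)
        for cid in nearest[:ncells]:
            candidates |= ivf.get(cid, set())
    # Steps 3-4: decompress (centroid + residual), then score with full MaxSim.
    scored = []
    for pid in candidates:
        tokens = [[c + r for c, r in zip(centroids[cid], res)] for cid, res in codes[pid]]
        scored.append((maxsim(q_tokens, tokens), pid))
    scored.sort(reverse=True)
    # Step 5: format the top-k hits (key names are illustrative).
    results = []
    for rank, (score, pid) in enumerate(scored[:k], start=1):
        doc_id = passages[pid]["document_id"]
        hit = {"content": passages[pid]["content"], "score": score, "rank": rank, "document_id": doc_id}
        if metadata is not None and doc_id in metadata:
            hit["document_metadata"] = metadata[doc_id]
        results.append(hit)
    return results


centroids = [[1.0, 0.0], [0.0, 1.0]]
codes = {
    0: [(0, [0.0, 0.1]), (0, [-0.1, 0.0])],
    1: [(1, [0.1, 0.0]), (1, [0.0, -0.1])],
    2: [(0, [0.0, 0.0]), (1, [0.0, 0.0])],
}
passages = {
    0: {"content": "cats purr", "document_id": "d1"},
    1: {"content": "dogs bark", "document_id": "d2"},
    2: {"content": "cats and dogs", "document_id": "d3"},
}
query = [[1.0, 0.0], [0.9, 0.1]]
ivf = build_ivf(codes)
for hit in search(query, centroids, codes, ivf, passages, k=3, ncells=1, metadata={"d1": {"lang": "en"}}):
    print(hit)
```

With `ncells=1`, both query tokens probe only centroid 0. As a result, passage `d2` never enters the candidate set. Raising `ncells` to 2 admits and scores it, which is the recall-for-speed trade-off the parameter controls.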
Usage
Use this principle after building or loading an index. This is the primary online retrieval mechanism for:
- Answering user queries against an indexed document collection
- Providing context for RAG (Retrieval-Augmented Generation) pipelines
- Batch search across multiple queries simultaneously
- Filtered search restricted to specific document IDs
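Batch and filtered search can be illustrated with a brute-force sketch. It scores every eligible passage with MaxSim instead of going through PLAID, and it uses toy 2-dimensional embeddings. `search_batch`, the `index` layout and the `doc_ids` handling are hypothetical illustrations, not RAGatouille's API. The sketch shows two things: a batch returns one ranked list per query, and a document-ID filter shrinks the pool before scoring.

```python
from typing import Dict, List, Optional, Sequence


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def maxsim(q_tokens, d_tokens) -> float:
    return sum(max(dot(q, d) for d in d_tokens) for q in q_tokens)


def search_batch(batch, index, k: int = 10, doc_ids: Optional[Sequence[str]] = None) -> List[List[Dict]]:
    """Score every query in `batch`; if doc_ids is given, only those documents are eligible."""
    allowed = set(doc_ids) if doc_ids is not None else None
    all_hits = []
    for q_tokens in batch:
        # The filter is applied before scoring, so excluded documents can never be returned.
        scored = [
            (maxsim(q_tokens, entry["tokens"]), entry["document_id"])
            for entry in index
            if allowed is None or entry["document_id"] in allowed
        ]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        all_hits.append(
            [{"document_id": d, "score": s, "rank": r} for r, (s, d) in enumerate(scored[:k], start=1)]
        )
    return all_hits


index = [
    {"document_id": "a", "tokens": [[1.0, 0.0]]},
    {"document_id": "b", "tokens": [[0.0, 1.0]]},
    {"document_id": "c", "tokens": [[0.7, 0.7]]},
]
batch = [[[1.0, 0.0]], [[0.0, 1.0]]]  # two single-token queries
print(search_batch(batch, index, k=2))
print(search_batch(batch, index, k=2, doc_ids=["b", "c"]))
```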
Theoretical Basis
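ColBERT search computes relevance via the MaxSim operator:
$$S(q, d) = \sum_{i=1}^{|q|} \max_{j=1}^{|d|} E_{q_i} \cdot E_{d_j}^{\top}$$
where $E_{q_i}$ is the embedding of the $i$-th query token and $E_{d_j}$ is the embedding of the $j$-th document token. Each query token is matched to its most similar document token, and these per-token maxima are summed.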
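The operator can be shown as a minimal pure-Python sketch. It assumes token embeddings are given as plain lists of floats. Real ColBERT embeddings are L2-normalized, so their dot product is a cosine similarity. For each query token, MaxSim takes the best dot product over all document tokens and then sums those maxima:

```python
from typing import Sequence

Vector = Sequence[float]


def dot(u: Vector, v: Vector) -> float:
    """Inner product; equals cosine similarity when both vectors are L2-normalized."""
    return sum(a * b for a, b in zip(u, v))


def maxsim(query_tokens: Sequence[Vector], doc_tokens: Sequence[Vector]) -> float:
    """Late-interaction score: for each query token, keep its best-matching
    document token, then sum those maxima over all query tokens."""
    return sum(max(dot(q, d) for d in doc_tokens) for q in query_tokens)


# Toy 2-d unit vectors; real ColBERT token embeddings are higher-dimensional.
query = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[1.0, 0.0], [0.6, 0.8]]
doc_b = [[0.0, 1.0], [0.0, 1.0]]
print(maxsim(query, doc_a))  # 1.0 + 0.8
print(maxsim(query, doc_b))  # 0.0 + 1.0
```

The score is asymmetric: swapping query and document generally changes it, because the maximum is taken over document tokens for each query token.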
PLAID accelerates this by:
1. Centroid Interaction: Compute approximate scores using only centroid representations to prune the candidate set.
2. Candidate Refinement: For the top candidates, decompress the quantized residuals, add each one back to its token's centroid to reconstruct the token embeddings, and compute full MaxSim scores.
3. Configurable Precision: Parameters like ncells (number of centroids to probe) and ndocs (candidate pool size) trade off between speed and recall.
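The two PLAID stages can be sketched under simplifying assumptions. Every passage is treated as a candidate, whereas real PLAID first probes `ncells` centroids. Residuals are unquantized. `centroid_scores`, `approx_score` and `two_stage_search` are hypothetical names. Stage 1 first drops centroids whose best query-token score falls below a threshold (the role of the `centroid_score_threshold` setting). It then scores each passage using only its remaining centroid IDs. `ndocs` bounds how many passages reach stage 2, where they are decompressed and rescored with full MaxSim.

```python
from typing import List, Sequence

Vector = List[float]


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def maxsim(q_tokens, d_tokens) -> float:
    return sum(max(dot(q, d) for d in d_tokens) for q in q_tokens)


def centroid_scores(q_tokens, centroids) -> List[List[float]]:
    # S[i][c]: similarity between query token i and centroid c, computed once per query.
    return [[dot(q, c) for c in centroids] for q in q_tokens]


def approx_score(S, cids, threshold: float) -> float:
    # Centroid pruning: drop centroids that no query token scores at least `threshold` against.
    kept = [c for c in cids if max(row[c] for row in S) >= threshold]
    if not kept:
        return 0.0
    # Centroid interaction: MaxSim with each passage token replaced by its centroid.
    return sum(max(row[c] for c in kept) for row in S)


def two_stage_search(q_tokens, centroids, codes, k, ndocs, threshold):
    S = centroid_scores(q_tokens, centroids)
    # Stage 1: rank passages by the cheap centroid-only score and keep ndocs of them.
    shortlist = sorted(
        codes,
        key=lambda pid: approx_score(S, [cid for cid, _ in codes[pid]], threshold),
        reverse=True,
    )[:ndocs]
    # Stage 2: decompress only the shortlisted passages and rescore with full MaxSim.
    exact = []
    for pid in shortlist:
        tokens = [[c + r for c, r in zip(centroids[cid], res)] for cid, res in codes[pid]]
        exact.append((maxsim(q_tokens, tokens), pid))
    exact.sort(reverse=True)
    return exact[:k]


centroids = [[1.0, 0.0], [0.0, 1.0], [0.6, 0.8]]
codes = {
    10: [(0, [0.0, 0.0])],
    11: [(1, [0.0, 0.0])],  # its only centroid is pruned for this query
    12: [(2, [0.1, -0.1])],
}
query = [[1.0, 0.0]]
print(two_stage_search(query, centroids, codes, k=2, ndocs=2, threshold=0.5))
```

Passage 11 is never decompressed. Its only centroid scores 0.0 against the query, below the 0.5 threshold, so its approximate score is 0. That leaves it outside the `ndocs=2` shortlist.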
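The searcher adjusts ncells and centroid_score_threshold according to collection size. The centroid_score_threshold is the minimum query-centroid score a centroid needs to survive centroid pruning.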
- <10k documents: ncells=8, centroid_score_threshold=0.4
- 10k-100k documents: ncells=4, centroid_score_threshold=0.45
- >100k documents: default settings
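The size tiers can be written as a small lookup. `adaptive_search_settings` is a hypothetical helper, not a RAGatouille function. The ranges above leave their boundaries open, so this sketch makes two assumptions: exactly 10,000 documents falls in the middle tier, and exactly 100,000 documents still falls in the middle tier. An empty result means the engine's defaults are kept.

```python
from typing import Dict, Union


def adaptive_search_settings(num_documents: int) -> Dict[str, Union[int, float]]:
    """Return PLAID overrides for a collection of the given size (hypothetical helper)."""
    if num_documents < 10_000:
        # Small collections: probe more centroids and prune less aggressively.
        return {"ncells": 8, "centroid_score_threshold": 0.4}
    if num_documents <= 100_000:  # boundary handling is an assumption
        return {"ncells": 4, "centroid_score_threshold": 0.45}
    return {}  # large collections keep the default settings


for size in (5_000, 50_000, 1_000_000):
    print(size, adaptive_search_settings(size))
```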