Principle:AnswerDotAI RAGatouille Semantic Search

Knowledge Sources	ColBERT: Efficient and Effective Passage Search via Contextualized Late Interaction over BERT; PLAID: An Efficient Engine for Late Interaction Retrieval; RAGatouille
Domains	NLP, Information_Retrieval, Search
Last Updated	2026-02-12 12:00 GMT

Overview

A retrieval mechanism that finds the most relevant passages in a pre-built PLAID index by encoding a query into token-level embeddings and computing late-interaction MaxSim scores against indexed document representations.

Description

Semantic Search in the ColBERT framework operates on a pre-built PLAID index. Given a query string (or batch of queries), the system encodes the query into token-level embeddings, then uses the PLAID engine to efficiently retrieve the top-k most relevant passages. The PLAID search algorithm uses centroid interaction to prune the candidate set before performing full late-interaction scoring on the remaining candidates.

The search pipeline involves:

1. Query encoding into token-level embeddings via the ColBERT checkpoint
2. Centroid-based candidate generation using the inverted index
3. Decompression of candidate document residuals
4. Full MaxSim scoring between query and candidate token embeddings
5. Result formatting with content, scores, ranks, document IDs, and optional metadata
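The final formatting step can be illustrated with a minimal sketch. It assumes the scoring stage has already produced (passage id, score) pairs. The function name `format_results` and the dictionary keys (`content`, `score`, `rank`, `document_id`, `document_metadata`) are illustrative choices that mirror the fields listed above, not a confirmed description of the library's internals.

```python
from typing import Any, Dict, List, Optional, Tuple


def format_results(
    scored: List[Tuple[int, float]],
    passages: Dict[int, str],
    doc_ids: Dict[int, str],
    metadata: Optional[Dict[str, Dict[str, Any]]] = None,
) -> List[Dict[str, Any]]:
    """Turn (passage_id, score) pairs into ranked result dictionaries."""
    # Highest score first; ranks start at 1 in this sketch.
    ranked = sorted(scored, key=lambda pair: pair[1], reverse=True)
    results = []
    for rank, (pid, score) in enumerate(ranked, start=1):
        result = {
            "content": passages[pid],
            "score": score,
            "rank": rank,
            "document_id": doc_ids[pid],
        }
        # Metadata is optional and attached only when present for the document.
        if metadata and doc_ids[pid] in metadata:
            result["document_metadata"] = metadata[doc_ids[pid]]
        results.append(result)
    return results


results = format_results(
    scored=[(0, 12.5), (1, 20.0)],
    passages={0: "Passage about indexing.", 1: "Passage about search."},
    doc_ids={0: "doc-a", 1: "doc-b"},
    metadata={"doc-b": {"source": "wiki"}},
)
```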

Usage

Use this principle after building or loading an index. This is the primary online retrieval mechanism for:

Answering user queries against an indexed document collection
Providing context for RAG (Retrieval-Augmented Generation) pipelines
Batch search across multiple queries simultaneously
Filtered search restricted to specific document IDs
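As a rough sketch of the RAG use case, the helper below turns search results into a context string for a downstream prompt. It assumes `rag` is an object exposing a `search(query=..., k=..., doc_ids=...)` method that returns a list of dictionaries with a `content` key, as RAGatouille's `RAGPretrainedModel` is understood to do. The helper name `retrieve_context` and the index path in the comment are hypothetical.

```python
from typing import Any, Dict, List, Optional


def retrieve_context(
    rag: Any,
    question: str,
    k: int = 3,
    doc_ids: Optional[List[str]] = None,
) -> str:
    """Search the index and join the top-k passages into one context string."""
    # doc_ids=None searches the whole index; a list restricts the search
    # to those documents (filtered search).
    results: List[Dict[str, Any]] = rag.search(query=question, k=k, doc_ids=doc_ids)
    return "\n\n".join(result["content"] for result in results)


# Assumed usage with a previously built index (path is a placeholder):
#   from ragatouille import RAGPretrainedModel
#   rag = RAGPretrainedModel.from_index("path/to/index")
#   context = retrieve_context(rag, "What is late interaction?", k=3)
```

Passing the searcher in as an argument keeps the helper independent of how the index was built or loaded.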

Theoretical Basis

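ColBERT search computes relevance via the MaxSim operator: for each query token embedding, take the maximum similarity with any document token embedding, then sum these maxima over all query tokens: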

$S (q, d) = \sum_{i = 1}^{| q |} \max_{j = 1}^{| d |} E_{q_{i}} \cdot E_{d_{j}}^{T}$
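The formula can be worked through on a toy example. The sketch below uses plain Python lists in place of real ColBERT embeddings; the two-dimensional vectors are made up for illustration, and the helper names `dot` and `maxsim` are not taken from any library.

```python
from typing import Sequence

Vector = Sequence[float]


def dot(u: Vector, v: Vector) -> float:
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def maxsim(query_emb: Sequence[Vector], doc_emb: Sequence[Vector]) -> float:
    """Sum, over query tokens, of the best dot product with any document token."""
    return sum(max(dot(q, d) for d in doc_emb) for q in query_emb)


# Two query tokens and two documents with two tokens each (toy values).
query = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[1.0, 0.0], [0.6, 0.8]]
doc_b = [[0.0, -1.0], [0.6, 0.8]]

score_a = maxsim(query, doc_a)  # best matches: 1.0 and 0.8
score_b = maxsim(query, doc_b)  # best matches: 0.6 and 0.8
```

Each query token picks its own best-matching document token, so `doc_a` scores higher because it contains an exact match for the first query token.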

PLAID accelerates this by:

1. Centroid Interaction: Compute approximate scores using only centroid representations to prune the candidate set.

2. Candidate Refinement: For top candidates, decompress the quantized residuals and compute exact MaxSim scores.

3. Configurable Precision: Parameters like ncells (number of centroids to probe) and ndocs (candidate pool size) trade off between speed and recall.
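The prune-then-refine idea can be sketched on toy data. This is a heavily simplified illustration, not the PLAID engine: it assumes each document token is stored as a centroid id plus an uncompressed residual, scores every document in stage one instead of probing only `ncells` centroids, and skips residual quantization. All names (`two_stage_search`, `doc_codes`, `doc_residuals`) are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def maxsim(query_emb: Sequence[Vector], doc_emb: Sequence[Vector]) -> float:
    return sum(max(dot(q, d) for d in doc_emb) for q in query_emb)


def two_stage_search(
    query: Sequence[Vector],
    centroids: Sequence[Vector],
    doc_codes: Dict[str, List[int]],
    doc_residuals: Dict[str, List[Vector]],
    ndocs: int,
    k: int,
) -> List[Tuple[str, float]]:
    # Stage 1: approximate each document by the centroids of its tokens.
    approx = {
        doc_id: maxsim(query, [centroids[c] for c in codes])
        for doc_id, codes in doc_codes.items()
    }
    candidates = sorted(approx, key=lambda d: approx[d], reverse=True)[:ndocs]

    # Stage 2: rebuild token embeddings (centroid + residual) and score exactly.
    exact = {}
    for doc_id in candidates:
        tokens = [
            [c + r for c, r in zip(centroids[code], residual)]
            for code, residual in zip(doc_codes[doc_id], doc_residuals[doc_id])
        ]
        exact[doc_id] = maxsim(query, tokens)
    return sorted(exact.items(), key=lambda item: item[1], reverse=True)[:k]


centroids = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
doc_codes = {"d0": [0, 1], "d1": [0, 0], "d2": [2, 2]}
doc_residuals = {
    "d0": [[-0.1, 0.0], [0.0, -0.2]],
    "d1": [[0.0, 0.1], [0.0, 0.0]],
    "d2": [[0.0, 0.0], [0.0, 0.0]],
}
query = [[1.0, 0.0], [0.0, 1.0]]

top = two_stage_search(query, centroids, doc_codes, doc_residuals, ndocs=2, k=2)
```

With `ndocs=2`, document `d2` is discarded after the cheap centroid pass, so only `d0` and `d1` pay the cost of reconstruction and exact scoring. A larger `ndocs` raises recall at the expense of more stage-two work.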

The searcher dynamically adapts these parameters based on collection size:

<10k documents: ncells=8, centroid_score_threshold=0.4
10k-100k documents: ncells=4, centroid_score_threshold=0.45
>100k documents: default settings
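The thresholds above can be expressed as a small selection function. This is a sketch of the rule as stated here, not the library's code: the function name `searcher_settings` is hypothetical, and the handling of collections with exactly 10,000 or 100,000 documents is an assumption, since the ranges above do not specify it.

```python
from typing import Dict, Union


def searcher_settings(num_documents: int) -> Dict[str, Union[int, float]]:
    """Return search overrides for a collection size; empty dict means defaults."""
    if num_documents < 10_000:
        return {"ncells": 8, "centroid_score_threshold": 0.4}
    # Assumption: exactly 100,000 documents falls through to the defaults.
    if num_documents < 100_000:
        return {"ncells": 4, "centroid_score_threshold": 0.45}
    return {}


small = searcher_settings(500)
medium = searcher_settings(50_000)
large = searcher_settings(2_000_000)
```

Smaller collections get a larger `ncells` and a lower threshold, which prunes less aggressively when exhaustive-leaning search is still cheap.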

Related Pages

Implemented By

Implementation:AnswerDotAI_RAGatouille_RAGPretrainedModel_Search

Uses Heuristic

Heuristic:AnswerDotAI_RAGatouille_Searcher_Configuration_By_Collection_Size
