Principle: AnswerDotAI RAGatouille In-Memory Search
| Knowledge Sources | |
|---|---|
| Domains | NLP, Information_Retrieval, Search |
| Last Updated | 2026-02-12 12:00 GMT |
Overview
An index-free search mechanism that queries pre-encoded in-memory document embeddings using exact MaxSim scoring to retrieve the most relevant passages.
Description
In-Memory Search operates on documents previously encoded via the encode method. Unlike PLAID index search which uses approximate scoring via centroids and quantized residuals, in-memory search computes exact MaxSim scores between query and document token embeddings. This provides the most accurate ColBERT scoring possible but scales poorly to large collections since all documents must be scored exhaustively.
The search process:
- Query is encoded into token-level embeddings
- MaxSim is computed between query tokens and all in-memory document tokens
- Results are sorted by score and top-k are returned
- Optional metadata is attached to results from in-memory storage
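The steps above can be sketched in plain NumPy, assuming documents have already been encoded into per-token embedding matrices. The `maxsim` and `search` helper names here are illustrative, not RAGatouille's API:

```python
import numpy as np

def maxsim(query_emb: np.ndarray, doc_emb: np.ndarray) -> float:
    """Exact MaxSim: for each query token, take the max dot product with any
    document token, then sum over query tokens.
    query_emb: [query_tokens, dim], doc_emb: [doc_tokens, dim]."""
    sim = query_emb @ doc_emb.T          # [query_tokens, doc_tokens]
    return float(sim.max(axis=1).sum())  # max over doc tokens, sum over query tokens

def search(query_emb, doc_embs, metadata, k=3):
    """Score every in-memory document exhaustively, sort, and return top-k
    with its attached metadata."""
    scores = [maxsim(query_emb, d) for d in doc_embs]
    order = np.argsort(scores)[::-1][:k]
    return [{"rank": r, "score": scores[i], "metadata": metadata[i]}
            for r, i in enumerate(order)]
```

Because each document is scored individually in a loop, variable-length documents need no padding here; the batched implementation below trades that simplicity for speed.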
Usage
Use after encoding documents with encode(). Suitable for:
- Searching small pre-encoded collections
- Prototyping retrieval pipelines
- Situations where exact scoring is required
Theoretical Basis
In-memory search computes the exact late-interaction (MaxSim) score between a query q with token embeddings {q_1, …, q_n} and a document d with token embeddings {d_1, …, d_m}:

S(q, d) = Σ_{i=1..n} max_{j=1..m} (q_i · d_j)
This is implemented via batched matrix multiplication over documents padded to a common token length:

```python
# Pseudo-code for MaxSim scoring
# D_padded: [num_docs, doc_tokens, dim], Q: [1, query_tokens, dim]
scores = D_padded @ Q.permute(0, 2, 1)  # [num_docs, doc_tokens, query_tokens]
scores = scores.max(dim=1).values       # [num_docs, query_tokens], max over doc tokens
scores = scores.sum(dim=-1)             # [num_docs], sum over query tokens
```
The score for each document is thus the sum, over query tokens, of the maximum similarity between that query token and any document token. Padded document positions are masked with -inf so they can never win the max operation.
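A minimal NumPy sketch of the batched, padded variant with -inf masking (tensor names mirror the pseudo-code above; the assumed shapes are noted in the comments):

```python
import numpy as np

def batched_maxsim(D_padded: np.ndarray, D_mask: np.ndarray, Q: np.ndarray) -> np.ndarray:
    """Exact MaxSim over a padded batch of documents.
    D_padded: [num_docs, doc_tokens, dim], zero-padded to a common length.
    D_mask:   [num_docs, doc_tokens] boolean, True at real (non-padding) tokens.
    Q:        [query_tokens, dim].
    Returns per-document scores, shape [num_docs]."""
    scores = D_padded @ Q.T                                  # [num_docs, doc_tokens, query_tokens]
    scores = np.where(D_mask[:, :, None], scores, -np.inf)   # exclude padding from the max
    scores = scores.max(axis=1)                              # max over doc tokens
    return scores.sum(axis=-1)                               # sum over query tokens
```

The -inf mask matters whenever similarities can be negative: a zero padding vector would otherwise out-score real tokens whose dot products with the query are below zero.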