Principle: AnswerDotAI RAGatouille In-Memory Search
| Knowledge Sources | |
|---|---|
| Domains | NLP, Information_Retrieval, Search |
| Last Updated | 2026-02-12 12:00 GMT |
Overview
An index-free search mechanism that queries pre-encoded in-memory document embeddings using exact MaxSim scoring to retrieve the most relevant passages.
Description
In-Memory Search operates on documents previously encoded via the encode method. Unlike PLAID index search which uses approximate scoring via centroids and quantized residuals, in-memory search computes exact MaxSim scores between query and document token embeddings. This provides the most accurate ColBERT scoring possible but scales poorly to large collections since all documents must be scored exhaustively.
The search process:
- Query is encoded into token-level embeddings
- MaxSim is computed between query tokens and all in-memory document tokens
- Results are sorted by score and top-k are returned
- Optional metadata is attached to results from in-memory storage
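The steps above can be sketched in plain NumPy, assuming documents have already been encoded into per-token embedding matrices. The `maxsim` and `search` helper names here are illustrative, not RAGatouille's API:

```python
import numpy as np

def maxsim(query_emb: np.ndarray, doc_emb: np.ndarray) -> float:
    """Exact MaxSim: for each query token, take the max dot product with any
    document token, then sum over query tokens.
    query_emb: [query_tokens, dim], doc_emb: [doc_tokens, dim]."""
    sim = query_emb @ doc_emb.T          # [query_tokens, doc_tokens]
    return float(sim.max(axis=1).sum())  # max over doc tokens, sum over query tokens

def search(query_emb, doc_embs, metadata, k=3):
    """Score every in-memory document exhaustively, sort, and return top-k
    with its attached metadata."""
    scores = [maxsim(query_emb, d) for d in doc_embs]
    order = np.argsort(scores)[::-1][:k]
    return [{"rank": r, "score": scores[i], "metadata": metadata[i]}
            for r, i in enumerate(order)]
```

Because each document is scored individually in a loop, variable-length documents need no padding here; the batched implementation below trades that simplicity for speed.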
Usage
Use after encoding documents with encode(). Suitable for:
- Searching small pre-encoded collections
- Prototyping retrieval pipelines
- Situations where exact scoring is required
Theoretical Basis
In-memory search computes the exact late-interaction (MaxSim) score between a query q with token embeddings {q_1, …, q_n} and a document d with token embeddings {d_1, …, d_m}:

S(q, d) = Σ_{i=1..n} max_{j=1..m} (q_i · d_j)
This is implemented via batched matrix multiplication over documents padded to a common token length:

```python
# Pseudo-code for MaxSim scoring
# D_padded: [num_docs, doc_tokens, dim], Q: [1, query_tokens, dim]
scores = D_padded @ Q.permute(0, 2, 1)  # [num_docs, doc_tokens, query_tokens]
scores = scores.max(dim=1).values       # [num_docs, query_tokens], max over doc tokens
scores = scores.sum(dim=-1)             # [num_docs], sum over query tokens
```
The score for each document is thus the sum, over query tokens, of the maximum similarity between that query token and any document token. Padded document positions are masked with -inf so they can never win the max operation.
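A minimal NumPy sketch of the batched, padded variant with -inf masking (tensor names mirror the pseudo-code above; the assumed shapes are noted in the comments):

```python
import numpy as np

def batched_maxsim(D_padded: np.ndarray, D_mask: np.ndarray, Q: np.ndarray) -> np.ndarray:
    """Exact MaxSim over a padded batch of documents.
    D_padded: [num_docs, doc_tokens, dim], zero-padded to a common length.
    D_mask:   [num_docs, doc_tokens] boolean, True at real (non-padding) tokens.
    Q:        [query_tokens, dim].
    Returns per-document scores, shape [num_docs]."""
    scores = D_padded @ Q.T                                  # [num_docs, doc_tokens, query_tokens]
    scores = np.where(D_mask[:, :, None], scores, -np.inf)   # exclude padding from the max
    scores = scores.max(axis=1)                              # max over doc tokens
    return scores.sum(axis=-1)                               # sum over query tokens
```

The -inf mask matters whenever similarities can be negative: a zero padding vector would otherwise out-score real tokens whose dot products with the query are below zero.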