Principle:AnswerDotAI RAGatouille In Memory Search

From Leeroopedia
Knowledge Sources
Domains NLP, Information_Retrieval, Search
Last Updated 2026-02-12 12:00 GMT

Overview

An index-free search mechanism that queries pre-encoded in-memory document embeddings using exact MaxSim scoring to retrieve the most relevant passages.

Description

In-Memory Search operates on documents previously encoded via the encode method. Unlike PLAID index search, which approximates scores using centroids and quantized residuals, in-memory search computes exact MaxSim scores between query and document token embeddings. This yields the most accurate ColBERT scores possible, but it scales poorly to large collections: every document must be scored for every query, so cost grows linearly with collection size.

The search process:

  • The query is encoded into token-level embeddings
  • MaxSim is computed between the query tokens and the tokens of every in-memory document
  • Results are sorted by score and the top-k are returned
  • Optional metadata from in-memory storage is attached to the results
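
The steps above can be sketched in a few lines of NumPy. This is a minimal illustration, not RAGatouille's implementation: the function names `maxsim` and `search_in_memory`, the result keys, and the toy 2-dimensional embeddings are all assumptions made for the example, and the query is assumed to be encoded already.

```python
import numpy as np

def maxsim(query_emb: np.ndarray, doc_emb: np.ndarray) -> float:
    """Exact late-interaction score for one query/document pair."""
    sim = query_emb @ doc_emb.T          # [query_tokens, doc_tokens]
    return float(sim.max(axis=1).sum())  # best doc token per query token, summed

def search_in_memory(query_emb, doc_embs, metadatas=None, k=2):
    """Score every stored document, sort by score, return the top-k hits."""
    scores = [maxsim(query_emb, d) for d in doc_embs]
    order = sorted(range(len(doc_embs)), key=lambda i: scores[i], reverse=True)[:k]
    results = []
    for rank, i in enumerate(order, start=1):
        hit = {"doc_index": i, "score": scores[i], "rank": rank}
        if metadatas is not None:
            hit["metadata"] = metadatas[i]  # attach optional metadata
        results.append(hit)
    return results

# Toy unit-length token embeddings (hypothetical values, 2 dimensions).
query = np.array([[1.0, 0.0], [0.0, 1.0]])
docs = [
    np.array([[1.0, 0.0]]),              # matches the first query token only
    np.array([[1.0, 0.0], [0.0, 1.0]]),  # matches both query tokens
    np.array([[-1.0, 0.0]]),             # matches neither
]
metas = [{"source": "a"}, {"source": "b"}, {"source": "c"}]

hits = search_in_memory(query, docs, metas, k=2)
for hit in hits:
    print(hit)
```

Every document is scored on every call, which is why this approach suits small collections and prototypes rather than large corpora.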

Usage

Use after encoding documents with encode(). Suitable for:

  • Searching small pre-encoded collections
  • Prototyping retrieval pipelines
  • Situations where exact scoring is required

Theoretical Basis

In-memory search computes exact late-interaction scores:

$$S(q, d) = \sum_{i=1}^{|q|} \max_{j=1}^{|d|} E_{q_i} \cdot E_{d_j}^{T}$$

This is implemented via batched matrix multiplication:

# Pseudo-code for MaxSim scoring
# Assumed shapes: Q is [1, query_tokens, dim]; D_padded is [num_docs, doc_tokens, dim]
scores = D_padded @ Q.permute(0, 2, 1)  # [num_docs, doc_tokens, query_tokens]
scores = scores.max(dim=1).values        # [num_docs, query_tokens]; max over doc tokens
scores = scores.sum(dim=-1)              # [num_docs]; sum over query tokens

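The score for each document is the sum, over query tokens, of the maximum similarity between that query token and any document token. Because ColBERT token embeddings are L2-normalized, each dot product is a cosine similarity. Documents are padded to a common length for batching, so the padding positions are masked to -inf before the max operation and can never be selected.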
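
The effect of the padding mask can be checked with a small NumPy sketch. NumPy stands in for the tensor library here, and the helper name `batched_maxsim`, the `doc_lengths` argument, and the toy values are assumptions made for illustration rather than part of the library.

```python
import numpy as np

def batched_maxsim(Q, D_padded, doc_lengths):
    """Q: [query_tokens, dim]; D_padded: [num_docs, max_doc_tokens, dim]."""
    sim = D_padded @ Q.T                                # [num_docs, max_doc_tokens, query_tokens]
    positions = np.arange(D_padded.shape[1])[None, :]   # [1, max_doc_tokens]
    is_padding = positions >= np.asarray(doc_lengths)[:, None]  # [num_docs, max_doc_tokens]
    # Padding rows become -inf so the max over doc tokens never picks them.
    sim = np.where(is_padding[:, :, None], -np.inf, sim)
    return sim.max(axis=1).sum(axis=-1)                 # [num_docs]

Q = np.array([[-1.0, 0.0], [0.0, 1.0]])
D_padded = np.array([
    [[1.0, 0.0], [0.0, 0.0]],    # one real token, one zero-padding row
    [[-1.0, 0.0], [0.0, 1.0]],   # two real tokens, no padding
])

scores = batched_maxsim(Q, D_padded, doc_lengths=[1, 2])
unmasked = (D_padded @ Q.T).max(axis=1).sum(axis=-1)

print(scores)    # masked scores
print(unmasked)  # scores if padding were left in
```

In this toy case the first document's only real token has similarity -1 with the first query token. Without the mask, the zero-padding row (similarity 0) would win the max and inflate that document's score.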
