Heuristic:AnswerDotAI RAGatouille In Memory Reranking Limits
| Knowledge Sources | |
|---|---|
| Domains | Search, Optimization, Information_Retrieval |
| Last Updated | 2026-02-12 12:00 GMT |
Overview
Practical limits and warnings for in-memory document reranking and encoding operations that degrade with document count and length.
Description
RAGatouille's index-free operations (rerank, encode, search_encoded_docs) work entirely in memory without building a persistent index. While convenient for small-scale use, these operations have practical limits that are enforced through runtime warnings. Performance degrades rapidly beyond 1,000 documents, and documents longer than ~300 tokens at the 90th percentile trigger slow-reranking warnings. Duplicate documents in the collection also cause performance degradation and subpar results.
Usage
Use this heuristic when deciding between in-memory reranking/encoding and building a persistent index. If your document count exceeds 1,000 or your documents are long, consider building a PLAID index instead.
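This decision can be sketched as a small helper. The function name `choose_strategy` is hypothetical (not part of RAGatouille); the threshold mirrors the library's 1,000-document warning:

```python
def choose_strategy(documents, max_in_memory=1000):
    """Pick in-memory reranking or a persistent PLAID index.

    Hypothetical helper illustrating the heuristic; the 1,000-document
    threshold mirrors RAGatouille's own runtime warning.
    """
    if len(documents) > max_in_memory:
        return "index"   # build a PLAID index, then query with search()
    return "rerank"      # in-memory reranking is fine at this scale

small_collection = [f"doc {i}" for i in range(50)]
large_collection = [f"doc {i}" for i in range(5000)]
print(choose_strategy(small_collection))  # rerank
print(choose_strategy(large_collection))  # index
```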
The Insight (Rule of Thumb)
- Action: For collections > 1,000 documents, build a persistent index instead of using in-memory reranking.
- Value: The 1,000 document threshold triggers a warning in the code. The 300-token (90th percentile) threshold triggers a long-document warning.
- Trade-off: In-memory mode avoids any index-build step but pays an O(n) scoring cost on every query. A persistent index has an upfront build cost but prunes candidates via centroids at search time, making per-query work sub-linear in collection size.
- Deduplication: Always ensure documents are unique before passing to in-memory operations — duplicates slow down computation and yield subpar results.
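The deduplication step can be an order-preserving pass before calling any in-memory operation; `dedupe_preserving_order` is a hypothetical helper name:

```python
def dedupe_preserving_order(documents):
    """Drop exact duplicate documents while keeping first-seen order,
    so in-memory reranking is not slowed by repeated entries."""
    # dict.fromkeys preserves insertion order and discards repeats
    return list(dict.fromkeys(documents))

docs = ["alpha", "beta", "alpha", "gamma"]
print(dedupe_preserving_order(docs))  # ['alpha', 'beta', 'gamma']
```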
Reasoning
In-memory reranking computes the full ColBERT MaxSim score between the query and every document in the collection. This is O(n * query_len * doc_len) in computation. For small collections this is acceptable, but it scales poorly. Building a PLAID index amortizes this cost through centroid-based candidate pruning.
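The cost structure can be sketched in plain Python with toy token embeddings (not ColBERT's actual vectors), showing why work grows with both collection size and document length:

```python
def maxsim_score(query_emb, doc_emb):
    """Late-interaction (MaxSim) score: for each query token vector,
    take the max dot product over all document token vectors, then sum.
    Cost is O(query_len * doc_len) per document."""
    return sum(
        max(sum(q * d for q, d in zip(q_vec, d_vec)) for d_vec in doc_emb)
        for q_vec in query_emb
    )

def rerank_in_memory(query_emb, collection_embs):
    """Score every document in the collection: O(n * query_len * doc_len)."""
    scores = [maxsim_score(query_emb, doc) for doc in collection_embs]
    return sorted(range(len(scores)), key=lambda i: -scores[i])

# Toy 2-dimensional embeddings:
query = [[1.0, 0.0], [0.0, 1.0]]
docs = [
    [[1.0, 0.0], [0.9, 0.1]],  # close match to the query
    [[0.0, 0.2], [0.1, 0.0]],  # weak match
]
print(rerank_in_memory(query, docs))  # [0, 1]
```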
Long documents (> 300 tokens at the 90th percentile) require proportionally more computation and memory for each MaxSim operation, further degrading performance.
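A rough check for this condition might look like the following; whitespace tokenization is a stand-in for the model's real tokenizer, and the function name is hypothetical:

```python
import math

def p90_token_length(documents, tokenize=str.split):
    """Estimate the 90th-percentile token count (nearest-rank method).
    Whitespace splitting here is only a stand-in for the model
    tokenizer that RAGatouille uses for its own estimate."""
    counts = sorted(len(tokenize(doc)) for doc in documents)
    rank = math.ceil(0.9 * len(counts)) - 1
    return counts[rank]

docs = ["a few words each"] * 10
if p90_token_length(docs) > 300:
    print("Long documents: consider smaller chunks or running on GPU")
```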
1,000 document warning from `ragatouille/models/colbert.py:545-549`:
```python
if len(documents) > 1000:
    print(
        "Please note ranking in-memory is not optimised for large document counts! ",
        "Consider building an index and using search instead!",
    )
```
Duplicate warning from `ragatouille/models/colbert.py:550-554`:
```python
if len(set(documents)) != len(documents):
    print(
        "WARNING! Your documents have duplicate entries! ",
        "This will slow down calculation and may yield subpar results",
    )
```
Long document warning from `ragatouille/models/colbert.py:520-526`:
```python
if max_tokens > 300:
    print(
        f"Your documents are roughly {percentile_90} tokens long at the 90th percentile!",
        "This is quite long and might slow down reranking!\n",
        "Provide fewer documents, build smaller chunks or run on GPU",
        "if it takes too long for your needs!",
    )
```