Principle: Deepset-ai Haystack Cross-Encoder Reranking
Metadata
| Field | Value |
|---|---|
| Principle Name | Cross-Encoder Reranking |
| Domains | Information_Retrieval, NLP |
| Related Implementation | Deepset_ai_Haystack_TransformersSimilarityRanker |
| Source Reference | haystack/components/rankers/transformers_similarity.py:L24-328 |
| Repository | Deepset_ai_Haystack |
Overview
Cross-encoder reranking uses a transformer model that jointly encodes a query-document pair to produce a relevance score, providing higher accuracy than bi-encoder retrieval at the cost of speed. It is typically employed as a second-stage reranker after an initial fast retrieval step (BM25 or embedding-based), refining the ranking of a small candidate set with more expressive cross-attention scoring.
Description
In a multi-stage retrieval pipeline, the first stage (BM25, embedding retrieval, or both) efficiently narrows the full document collection down to a manageable candidate set (typically 10-100 documents). The second stage then applies a more accurate but computationally expensive model to rerank these candidates.
Cross-encoder reranking serves this second-stage role. Unlike bi-encoders that embed queries and documents independently, a cross-encoder takes the concatenated query-document pair as a single input to the transformer. This allows the model to compute full cross-attention between all query tokens and all document tokens, producing a more nuanced relevance judgment.
Key characteristics:
- Joint encoding: The query and document are processed together as a single sequence `[CLS] query [SEP] document [SEP]`, enabling rich token-level interactions.
- Higher accuracy: Full cross-attention captures subtle relevance signals that independent encoding misses, such as negation, entity relationships, and contextual nuance.
- No precomputation: Because the model requires both query and document as input, scores cannot be precomputed. Each (query, document) pair requires a separate forward pass.
- High inference cost: Scoring N documents requires N separate forward passes through the transformer, and each pass incurs self-attention cost quadratic in the combined sequence length. This makes cross-encoders impractical for searching the full corpus but ideal for reranking a small candidate set.
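The joint-encoding and per-pair-cost points above can be sketched in plain Python. This is a toy illustration (no real model involved): it only shows how a BERT-style cross-encoder must build one joint input, and hence run one forward pass, per (query, document) pair.

```python
def build_cross_encoder_input(query: str, document: str) -> str:
    """Concatenate query and document the way a BERT-style
    cross-encoder sees them: [CLS] query [SEP] document [SEP]."""
    return f"[CLS] {query} [SEP] {document} [SEP]"

query = "German capital city"
candidates = [
    "Berlin is the capital of Germany",
    "Paris is known for the Eiffel Tower",
]

# One joint sequence (and hence one forward pass) per candidate:
# the score for a pair cannot be precomputed from either side alone.
inputs = [build_cross_encoder_input(query, doc) for doc in candidates]
for seq in inputs:
    print(seq)
```

Because every pair produces a distinct input sequence, nothing here can be cached across queries, which is exactly why cross-encoders are reserved for small candidate sets.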
Theoretical Basis
Bi-Encoder vs. Cross-Encoder
The fundamental tradeoff in neural retrieval is between efficiency and accuracy:
| Property | Bi-Encoder | Cross-Encoder |
|---|---|---|
| Encoding | Query and document encoded independently | Query and document encoded jointly |
| Interaction | Late interaction (similarity of final vectors) | Early interaction (full cross-attention between tokens) |
| Precomputation | Document embeddings can be precomputed | No precomputation possible |
| Scalability | Sublinear with ANN indices | Linear in the number of candidate documents |
| Accuracy | Good | Superior (captures fine-grained relevance) |
The bi-encoder produces a single vector per input, losing token-level information. The cross-encoder preserves all token interactions, enabling it to distinguish cases like:
- "Python is not a compiled language" (negation)
- "The bank of the river" vs. "The financial bank" (disambiguation)
Cross-Encoder Architecture
A cross-encoder is a standard transformer model (typically based on BERT, RoBERTa, or similar) fine-tuned for sequence pair classification:
- Input: The query and document are concatenated with special separator tokens: `[CLS] query_tokens [SEP] document_tokens [SEP]`.
- Encoding: The full sequence is passed through the transformer, where all tokens attend to all other tokens via self-attention.
- Scoring: The `[CLS]` token representation is passed through a linear layer to produce a single scalar relevance score (logit).
- Calibration: Optionally, the raw logit is passed through a sigmoid function to produce a probability-like score in the range [0, 1].
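The scoring step can be sketched as a plain linear head over the `[CLS]` hidden vector. The vector values and weights below are made-up toy numbers, not taken from any real model.

```python
def score_from_cls(cls_hidden: list[float], weights: list[float], bias: float) -> float:
    """Linear head over the [CLS] representation: logit = w . h_cls + b."""
    return sum(w * h for w, h in zip(weights, cls_hidden)) + bias

# Toy 3-dimensional [CLS] vector and head parameters.
logit = score_from_cls([0.2, -0.5, 1.0], [0.8, 0.1, 0.4], bias=-0.1)
print(logit)
```

In a real cross-encoder the hidden vector has hundreds of dimensions and the head is learned during fine-tuning, but the projection to a single scalar works the same way.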
Score Calibration
Raw cross-encoder logits can have arbitrary magnitude and are not directly interpretable. The calibration step applies:
`score = sigmoid(logit * calibration_factor)`
Where:
- `sigmoid(x) = 1 / (1 + exp(-x))` maps the logit to a [0, 1] range.
- `calibration_factor` controls the sharpness of the sigmoid. A factor of 1.0 uses the standard sigmoid; smaller factors produce scores closer to 0.5 (less confident); larger factors push scores toward 0 or 1.
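The calibration formula above is small enough to implement directly. A minimal sketch, using only the standard library:

```python
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def calibrate(logit: float, calibration_factor: float = 1.0) -> float:
    """Map a raw cross-encoder logit to a [0, 1] score.
    Smaller factors flatten the sigmoid (scores pulled toward 0.5);
    larger factors sharpen it (scores pushed toward 0 or 1)."""
    return sigmoid(logit * calibration_factor)

# The same positive logit under three calibration factors.
logit = 2.0
for factor in (0.1, 1.0, 5.0):
    print(f"factor={factor}: score={calibrate(logit, factor):.3f}")
```

Note that calibration is monotonic, so it changes the score values but never the ranking order of the documents.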
Two-Stage Retrieval
Cross-encoder reranking is designed for the retrieve-then-rerank pattern:
- Stage 1 (Retrieval): A fast retriever (BM25 or bi-encoder) retrieves the top N candidate documents from the full corpus. This is efficient because BM25 uses inverted indices and bi-encoders use precomputed vectors with ANN search.
- Stage 2 (Reranking): The cross-encoder scores each of the N candidates against the query and reorders them by relevance. Because N is small (typically 10-100), the per-pair cost of cross-encoding remains manageable.
This two-stage approach combines the efficiency of first-stage retrieval with the accuracy of cross-encoder scoring.
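The retrieve-then-rerank pattern can be sketched with stand-in components. Both pieces below are hypothetical toys, not Haystack APIs: a keyword-overlap function plays the fast first-stage retriever, and `mock_scorer` stands in for the expensive cross-encoder.

```python
def retrieve(query: str, corpus: list[str], top_n: int) -> list[str]:
    """Stage 1: cheap keyword-overlap retrieval over the full corpus."""
    q_terms = set(query.lower().split())
    scored = [(len(q_terms & set(doc.lower().split())), doc) for doc in corpus]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:top_n]]

def rerank(query: str, candidates: list[str], scorer) -> list[str]:
    """Stage 2: expensive per-pair scoring, applied only to the small
    candidate set that stage 1 returned."""
    return sorted(candidates, key=lambda doc: scorer(query, doc), reverse=True)

corpus = [
    "Paris is known for the Eiffel Tower",
    "Germany is located in central Europe",
    "Berlin is the capital of Germany",
]

# Stage 1 narrows the full corpus down to two candidates.
candidates = retrieve("capital of Germany", corpus, top_n=2)

# Stand-in for a cross-encoder: rewards exact phrase containment.
def mock_scorer(query: str, doc: str) -> float:
    return 1.0 if query.lower() in doc.lower() else 0.0

reranked = rerank("capital of Germany", candidates, mock_scorer)
print(reranked)
```

The structure, not the scoring, is the point: only the handful of stage-1 survivors ever reach the expensive stage-2 scorer.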
Usage
Cross-encoder reranking is used as a second stage in retrieval pipelines. A typical pipeline with reranking consists of:
- A retriever (BM25 or embedding-based) that returns an initial candidate set.
- A TransformersSimilarityRanker that reranks the candidates using a cross-encoder model.
```python
from haystack import Document, Pipeline
from haystack.components.retrievers.in_memory import InMemoryBM25Retriever
from haystack.components.rankers import TransformersSimilarityRanker
from haystack.document_stores.in_memory import InMemoryDocumentStore

# Index a small document collection.
doc_store = InMemoryDocumentStore()
doc_store.write_documents([
    Document(content="Berlin is the capital of Germany"),
    Document(content="Paris is known for the Eiffel Tower"),
    Document(content="Germany is located in central Europe"),
])

# Stage 1: BM25 retrieval; Stage 2: cross-encoder reranking.
pipeline = Pipeline()
pipeline.add_component("retriever", InMemoryBM25Retriever(document_store=doc_store, top_k=10))
pipeline.add_component("ranker", TransformersSimilarityRanker(top_k=3))
pipeline.connect("retriever.documents", "ranker.documents")

# The query must be passed to both the retriever and the ranker.
result = pipeline.run({
    "retriever": {"query": "German capital city"},
    "ranker": {"query": "German capital city"},
})

for doc in result["ranker"]["documents"]:
    print(doc.content, doc.score)
```
Related Pages
- Implementation: Deepset_ai_Haystack_TransformersSimilarityRanker -- The concrete Haystack component that implements this principle.
- Related Principle: Deepset_ai_Haystack_BM25_Keyword_Retrieval -- First-stage keyword retrieval often used before reranking.
- Related Principle: Deepset_ai_Haystack_Embedding_Based_Retrieval -- First-stage semantic retrieval often used before reranking.