
Principle:Deepset ai Haystack Cross Encoder Reranking

From Leeroopedia

Metadata

Field | Value
Principle Name | Cross-Encoder Reranking
Domains | Information_Retrieval, NLP
Related Implementation | Deepset_ai_Haystack_TransformersSimilarityRanker
Source Reference | haystack/components/rankers/transformers_similarity.py:L24-328
Repository | Deepset_ai_Haystack

Overview

Cross-encoder reranking uses a transformer model that jointly encodes a query-document pair to produce a relevance score, providing higher accuracy than bi-encoder retrieval at the cost of speed. It is typically employed as a second-stage reranker after an initial fast retrieval step (BM25 or embedding-based), refining the ranking of a small candidate set with more expressive cross-attention scoring.

Description

In a multi-stage retrieval pipeline, the first stage (BM25, embedding retrieval, or both) efficiently narrows the full document collection down to a manageable candidate set (typically 10-100 documents). The second stage then applies a more accurate but computationally expensive model to rerank these candidates.

Cross-encoder reranking serves this second-stage role. Unlike bi-encoders that embed queries and documents independently, a cross-encoder takes the concatenated query-document pair as a single input to the transformer. This allows the model to compute full cross-attention between all query tokens and all document tokens, producing a more nuanced relevance judgment.

Key characteristics:

  • Joint encoding: The query and document are processed together as a single sequence [CLS] query [SEP] document [SEP], enabling rich token-level interactions.
  • Higher accuracy: Full cross-attention captures subtle relevance signals that independent encoding misses, such as negation, entity relationships, and contextual nuance.
  • No precomputation: Because the model requires both query and document as input, scores cannot be precomputed. Each (query, document) pair requires a separate forward pass.
  • Linear per-query cost: Scoring N candidate documents requires N separate forward passes through the transformer, one per (query, document) pair. This makes cross-encoders impractical for searching the full corpus but ideal for reranking a small candidate set.
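To make the cost asymmetry concrete, here is a back-of-envelope comparison. The 10 ms per-pass latency is an assumed illustrative figure, not a measured benchmark:

```python
# Illustrative cost arithmetic: the per-pass latency is an assumption.
ms_per_pass = 10          # assumed cross-encoder forward-pass latency
corpus_size = 1_000_000   # documents in the full collection
candidate_set = 100       # documents passed to the reranker

full_corpus_ms = corpus_size * ms_per_pass   # one forward pass per document
rerank_ms = candidate_set * ms_per_pass      # only the candidates are scored

print(f"full corpus: {full_corpus_ms / 60_000:.0f} min/query")  # hours-scale
print(f"rerank only: {rerank_ms / 1000:.1f} s/query")           # seconds-scale
```

At these assumed numbers, cross-encoding the full corpus takes minutes per query, while reranking 100 candidates takes about a second.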

Theoretical Basis

Bi-Encoder vs. Cross-Encoder

The fundamental tradeoff in neural retrieval is between efficiency and accuracy:

Property | Bi-Encoder | Cross-Encoder
Encoding | Query and document encoded independently | Query and document encoded jointly
Interaction | Late interaction (similarity of final vectors) | Early interaction (full cross-attention between tokens)
Precomputation | Document embeddings can be precomputed | No precomputation possible
Scalability | Sublinear with ANN indices | Linear in the number of candidate documents
Accuracy | Good | Superior (captures fine-grained relevance)

The bi-encoder produces a single vector per input, losing token-level information. The cross-encoder preserves all token interactions, enabling it to distinguish cases like:

  • "Python is not a compiled language" (negation)
  • "The bank of the river" vs. "The financial bank" (disambiguation)
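A toy illustration of why pooled representations can miss negation. Bag-of-words cosine stands in for any similarity computed on independently pooled vectors; it is a deliberately crude sketch, not a real bi-encoder:

```python
from collections import Counter
from math import sqrt

def pooled_cosine(a: str, b: str) -> float:
    # Stand-in for bi-encoder similarity: pool tokens into one vector
    # (here, raw term counts), then compare the pooled vectors.
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    norm_a = sqrt(sum(c * c for c in va.values()))
    norm_b = sqrt(sum(c * c for c in vb.values()))
    return dot / (norm_a * norm_b)

# The single token "not" flips the meaning but barely moves the pooled score.
s = pooled_cosine("Python is a compiled language",
                  "Python is not a compiled language")
print(f"{s:.2f}")  # high similarity despite opposite claims
```

A cross-encoder, by contrast, lets "not" attend to "compiled" directly, so the contradiction is visible during scoring.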

Cross-Encoder Architecture

A cross-encoder is a standard transformer model (typically based on BERT, RoBERTa, or similar) fine-tuned for sequence pair classification:

  1. Input: The query and document are concatenated with special separator tokens: [CLS] query_tokens [SEP] document_tokens [SEP].
  2. Encoding: The full sequence is passed through the transformer, where all tokens attend to all other tokens via self-attention.
  3. Scoring: The [CLS] token representation is passed through a linear layer to produce a single scalar relevance score (logit).
  4. Calibration: Optionally, the raw logit is passed through a sigmoid function to produce a probability-like score in the range [0, 1].
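The input layout in step 1 can be sketched as below. [CLS] and [SEP] are BERT-style special tokens, and the token lists are assumed to come from the model's own tokenizer; the helper name is illustrative:

```python
def build_joint_input(query_tokens: list[str], doc_tokens: list[str]) -> list[str]:
    # One sequence per (query, document) pair: self-attention in the encoder
    # can then mix query and document tokens freely (step 2), and the [CLS]
    # position is read out for scoring (step 3).
    return ["[CLS]", *query_tokens, "[SEP]", *doc_tokens, "[SEP]"]

seq = build_joint_input(["german", "capital"], ["berlin", "is", "the", "capital"])
print(seq)
```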

Score Calibration

Raw cross-encoder logits can have arbitrary magnitude and are not directly interpretable. The calibration step applies:

score = sigmoid(logit * calibration_factor)

Where:

  • sigmoid(x) = 1 / (1 + exp(-x)) maps the logit to a [0, 1] range.
  • calibration_factor controls the sharpness of the sigmoid. A factor of 1.0 uses the standard sigmoid; smaller factors produce scores closer to 0.5 (less confident); larger factors push scores toward 0 or 1.
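The calibration formula is a one-liner; the function name here is illustrative, not a Haystack API:

```python
from math import exp

def calibrate(logit: float, calibration_factor: float = 1.0) -> float:
    # sigmoid(logit * calibration_factor): maps any real logit into (0, 1).
    return 1.0 / (1.0 + exp(-logit * calibration_factor))

logit = 2.0
print(calibrate(logit))        # standard sigmoid, ~0.88
print(calibrate(logit, 0.1))   # flatter: pulled toward 0.5
print(calibrate(logit, 5.0))   # sharper: pushed toward 1.0
```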

Two-Stage Retrieval

Cross-encoder reranking is designed for the retrieve-then-rerank pattern:

  1. Stage 1 (Retrieval): A fast retriever (BM25 or bi-encoder) retrieves the top N candidate documents from the full corpus. This is efficient because BM25 uses inverted indices and bi-encoders use precomputed vectors with ANN search.
  2. Stage 2 (Reranking): The cross-encoder scores each of the N candidates against the query and reorders them by relevance. Because N is small (typically 10-100), the cost of N forward passes per query is manageable.

This two-stage approach combines the efficiency of first-stage retrieval with the accuracy of cross-encoder scoring.
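The two stages can be sketched with stand-in scoring functions. Both scorers below are toys (term overlap in place of BM25, a counted "expensive" call in place of a cross-encoder forward pass), chosen to show that stage 2 runs only on the candidate set:

```python
expensive_calls = 0  # counts simulated cross-encoder forward passes

def cheap_score(query: str, doc: str) -> int:
    # Stage-1 stand-in (think BM25): term overlap, cheap over the full corpus.
    return len(set(query.lower().split()) & set(doc.lower().split()))

def expensive_score(query: str, doc: str) -> float:
    # Stage-2 stand-in for one cross-encoder forward pass.
    global expensive_calls
    expensive_calls += 1
    return cheap_score(query, doc) / (1 + abs(len(doc.split()) - len(query.split())))

def retrieve_then_rerank(query: str, corpus: list[str],
                         retrieve_k: int = 50, final_k: int = 5) -> list[str]:
    # Stage 1: narrow the whole corpus to retrieve_k candidates.
    candidates = sorted(corpus, key=lambda d: cheap_score(query, d),
                        reverse=True)[:retrieve_k]
    # Stage 2: only retrieve_k expensive passes, never len(corpus).
    return sorted(candidates, key=lambda d: expensive_score(query, d),
                  reverse=True)[:final_k]

corpus = [f"document {i} about topic {i % 7}" for i in range(1000)]
top = retrieve_then_rerank("topic 3", corpus)
print(len(top), expensive_calls)  # 5 final results, 50 expensive passes
```

The corpus has 1000 documents, but the expensive scorer runs only 50 times, which is the whole point of the pattern.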

Usage

Cross-encoder reranking is used as a second stage in retrieval pipelines. A typical pipeline with reranking consists of:

  1. A retriever (BM25 or embedding-based) that returns an initial candidate set.
  2. A TransformersSimilarityRanker that reranks the candidates using a cross-encoder model.

from haystack import Document, Pipeline
from haystack.components.retrievers.in_memory import InMemoryBM25Retriever
from haystack.components.rankers import TransformersSimilarityRanker
from haystack.document_stores.in_memory import InMemoryDocumentStore

# Index a few documents in an in-memory store.
doc_store = InMemoryDocumentStore()
doc_store.write_documents([
    Document(content="Berlin is the capital of Germany"),
    Document(content="Paris is known for the Eiffel Tower"),
    Document(content="Germany is located in central Europe"),
])

# Stage 1: BM25 retrieves up to 10 candidates.
# Stage 2: the cross-encoder reranks them and keeps the top 3.
pipeline = Pipeline()
pipeline.add_component("retriever", InMemoryBM25Retriever(document_store=doc_store, top_k=10))
pipeline.add_component("ranker", TransformersSimilarityRanker(top_k=3))
pipeline.connect("retriever.documents", "ranker.documents")

# Both components need the query: the retriever to search, the ranker to score pairs.
result = pipeline.run({
    "retriever": {"query": "German capital city"},
    "ranker": {"query": "German capital city"},
})
for doc in result["ranker"]["documents"]:
    print(doc.content, doc.score)

Related Pages

Implemented By

  • Deepset_ai_Haystack_TransformersSimilarityRanker
