Implementation:Deepset_ai_Haystack_TransformersSimilarityRanker
Metadata
| Field | Value |
|---|---|
| Implementation Name | TransformersSimilarityRanker |
| Implementing Principle | Deepset_ai_Haystack_Cross_Encoder_Reranking |
| Class | TransformersSimilarityRanker |
| Module | haystack.components.rankers.transformers_similarity |
| Source Reference | haystack/components/rankers/transformers_similarity.py:L24-328 |
| Repository | Deepset_ai_Haystack |
| Dependencies | transformers, torch, accelerate |
Overview
TransformersSimilarityRanker is a Haystack component that ranks documents by their semantic similarity to a query using a cross-encoder transformer model. It jointly encodes each (query, document) pair and produces a relevance score via a classification head, then returns the documents sorted by descending relevance. This component is designed as a second-stage reranker in retrieval pipelines.
Description
The component loads a cross-encoder model (by default `cross-encoder/ms-marco-MiniLM-L-6-v2`) using the Hugging Face `transformers` library. For each query, it constructs (query, document) pairs, tokenizes them, and performs batch inference to produce raw logit scores. These logits can optionally be scaled through a sigmoid function with a configurable calibration factor.
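The core scoring step can be approximated with `transformers` directly. The following is a minimal sketch of the idea only; the actual component adds batching, deduplication, prefixes, and threshold filtering on top of this:

```python
# Sketch: score (query, document) pairs with a cross-encoder and apply sigmoid calibration.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "cross-encoder/ms-marco-MiniLM-L-6-v2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

query = "City in Germany"
texts = ["Paris", "Berlin"]

# Tokenize each (query, document) pair jointly -- the defining trait of a cross-encoder.
features = tokenizer([query] * len(texts), texts, padding=True, truncation=True, return_tensors="pt")
with torch.inference_mode():
    logits = model(**features).logits.squeeze(-1)  # one relevance logit per pair

calibration_factor = 1.0
scores = torch.sigmoid(logits * calibration_factor)  # scale_score=True behavior
print(sorted(zip(texts, scores.tolist()), key=lambda x: x[1], reverse=True))
```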
Key behaviors:
- Lazy initialization: The model and tokenizer are loaded on the first call to `warm_up()` or automatically on the first `run()`.
- Deduplication: Before ranking, input documents are deduplicated by their `id` field. If duplicates exist, the one with the highest pre-existing score is retained.
- Meta field embedding: Metadata fields specified in `meta_fields_to_embed` are concatenated with the document content (separated by `embedding_separator`) before forming the (query, document) pair (see the sketch after this list).
- Query and document prefixes: Configurable `query_prefix` and `document_prefix` strings are prepended to the query and document text, respectively, supporting models like BGE that require instruction prefixes.
- Score calibration: When `scale_score=True`, raw logits are passed through `sigmoid(logit * calibration_factor)` to produce scores in the [0, 1] range.
- Score threshold filtering: Documents below a configurable `score_threshold` are excluded from the output.
- Batch inference: Documents are processed in batches of configurable size using a PyTorch `DataLoader`, with inference performed under `torch.inference_mode()` for efficiency.
- Device map support: Uses the `accelerate` library for Hugging Face device map resolution.
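How the meta fields, separator, and prefixes combine into the text that is scored can be illustrated as follows. This is a hedged sketch that mirrors the parameters described above; the component's actual internals may differ in detail:

```python
# Sketch: assemble the (query, document) text pair from meta fields, separator, and prefixes.
from haystack import Document

doc = Document(content="Berlin is the capital of Germany", meta={"title": "Geography"})
meta_fields_to_embed = ["title"]
embedding_separator = "\n"
query_prefix = ""
document_prefix = ""
query = "What is the capital of Germany?"

meta_values = [str(doc.meta[key]) for key in meta_fields_to_embed if doc.meta.get(key) is not None]
document_text = document_prefix + embedding_separator.join(meta_values + [doc.content or ""])
pair = (query_prefix + query, document_text)
print(pair)
# ('What is the capital of Germany?', 'Geography\nBerlin is the capital of Germany')
```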
Note: This component is considered legacy by the Haystack maintainers. `SentenceTransformersSimilarityRanker` is recommended as the replacement, providing the same functionality with additional features.
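A minimal migration sketch, assuming a Haystack release that exports the replacement from the same rankers module (import path and constructor parity are assumptions, not verified here):

```python
# Swap in the recommended replacement; the surrounding pipeline code stays the same.
from haystack.components.rankers import SentenceTransformersSimilarityRanker

ranker = SentenceTransformersSimilarityRanker()  # parameters largely mirror TransformersSimilarityRanker
ranker.warm_up()
```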
Code Reference
Import
```python
from haystack.components.rankers import TransformersSimilarityRanker
```
Constructor Signature
```python
TransformersSimilarityRanker(
    model: str | Path = "cross-encoder/ms-marco-MiniLM-L-6-v2",
    device: ComponentDevice | None = None,
    token: Secret | None = Secret.from_env_var(["HF_API_TOKEN", "HF_TOKEN"], strict=False),
    top_k: int = 10,
    query_prefix: str = "",
    document_prefix: str = "",
    meta_fields_to_embed: list[str] | None = None,
    embedding_separator: str = "\n",
    scale_score: bool = True,
    calibration_factor: float | None = 1.0,
    score_threshold: float | None = None,
    model_kwargs: dict[str, Any] | None = None,
    tokenizer_kwargs: dict[str, Any] | None = None,
    batch_size: int = 16,
)
```
| Parameter | Type | Default | Description |
|---|---|---|---|
| `model` | `str \| Path` | `"cross-encoder/ms-marco-MiniLM-L-6-v2"` | Hugging Face model ID or local path for the cross-encoder model. |
| `device` | `ComponentDevice \| None` | `None` | Device for model loading. Resolved via accelerate device map. |
| `token` | `Secret \| None` | `HF_API_TOKEN` / `HF_TOKEN` env vars | API token for private Hugging Face models. |
| `top_k` | `int` | `10` | Maximum number of documents to return. |
| `query_prefix` | `str` | `""` | String prepended to the query before forming pairs. |
| `document_prefix` | `str` | `""` | String prepended to each document text before forming pairs. |
| `meta_fields_to_embed` | `list[str] \| None` | `None` | Metadata fields to concatenate with document content. |
| `embedding_separator` | `str` | `"\n"` | Separator between metadata fields and document content. |
| `scale_score` | `bool` | `True` | If True, apply sigmoid calibration to raw logits. |
| `calibration_factor` | `float \| None` | `1.0` | Factor for sigmoid calibration: `sigmoid(logit * factor)`. Required when `scale_score=True`. |
| `score_threshold` | `float \| None` | `None` | Minimum score for a document to be included in the output. |
| `model_kwargs` | `dict[str, Any] \| None` | `None` | Additional kwargs for `AutoModelForSequenceClassification.from_pretrained`. |
| `tokenizer_kwargs` | `dict[str, Any] \| None` | `None` | Additional kwargs for `AutoTokenizer.from_pretrained`. |
| `batch_size` | `int` | `16` | Batch size for inference. Reduce if encountering memory issues. |
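A hedged example of some of the less common constructor options; the device string and kwargs values below are illustrative assumptions, not recommended settings:

```python
from haystack.components.rankers import TransformersSimilarityRanker
from haystack.utils import ComponentDevice

ranker = TransformersSimilarityRanker(
    model="cross-encoder/ms-marco-MiniLM-L-6-v2",
    device=ComponentDevice.from_str("cuda:0"),   # pin the model to a specific GPU
    tokenizer_kwargs={"model_max_length": 512},  # cap (query, document) pair length
    batch_size=8,                                # smaller batches if memory is tight
)
ranker.warm_up()
```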
I/O Contract
Input
| Parameter | Type | Required | Description |
|---|---|---|---|
| `query` | `str` | Yes | The query text to compare documents against. |
| `documents` | `list[Document]` | Yes | The candidate documents to rank. |
| `top_k` | `int \| None` | No | Override the default maximum number of documents to return. |
| `scale_score` | `bool \| None` | No | Override the default score scaling behavior. |
| `calibration_factor` | `float \| None` | No | Override the default calibration factor. |
| `score_threshold` | `float \| None` | No | Override the default score threshold. |
Output
| Key | Type | Description |
|---|---|---|
| `documents` | `list[Document]` | Documents sorted by cross-encoder relevance score, from most to least relevant. |
The output dictionary has the structure `{"documents": list[Document]}`.
Each returned `Document` has its `score` field populated with the cross-encoder relevance score. When `scale_score=True`, scores are in the [0, 1] range; when `scale_score=False`, scores are raw logits.
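The run-time overrides listed in the Input table can be passed per call. A small self-contained example (the documents and query are illustrative):

```python
from haystack import Document
from haystack.components.rankers import TransformersSimilarityRanker

ranker = TransformersSimilarityRanker()
ranker.warm_up()
docs = [Document(content="Berlin"), Document(content="Munich")]

# Per-call overrides take precedence over the constructor defaults.
result = ranker.run(
    query="Capital of Germany",
    documents=docs,
    top_k=1,            # return at most one document for this call
    scale_score=False,  # raw logits instead of sigmoid-calibrated scores
)
print(result["documents"])  # output dict has the single key "documents"
```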
Usage Examples
Basic Reranking
```python
from haystack import Document
from haystack.components.rankers import TransformersSimilarityRanker

ranker = TransformersSimilarityRanker()
ranker.warm_up()

docs = [Document(content="Paris"), Document(content="Berlin")]
result = ranker.run(query="City in Germany", documents=docs)
for doc in result["documents"]:
    print(f"{doc.content}: {doc.score:.4f}")
# Berlin: 0.9997
# Paris: 0.0012
```
Reranking after BM25 Retrieval
```python
from haystack import Document, Pipeline
from haystack.components.retrievers.in_memory import InMemoryBM25Retriever
from haystack.components.rankers import TransformersSimilarityRanker
from haystack.document_stores.in_memory import InMemoryDocumentStore

doc_store = InMemoryDocumentStore()
doc_store.write_documents([
    Document(content="Berlin is the capital of Germany"),
    Document(content="Paris is known for the Eiffel Tower"),
    Document(content="Germany is a country in central Europe"),
    Document(content="The capital of France is Paris"),
])

pipeline = Pipeline()
pipeline.add_component("retriever", InMemoryBM25Retriever(document_store=doc_store, top_k=10))
pipeline.add_component("ranker", TransformersSimilarityRanker(top_k=3, scale_score=True))
pipeline.connect("retriever.documents", "ranker.documents")

result = pipeline.run({
    "retriever": {"query": "What is the capital of Germany?"},
    "ranker": {"query": "What is the capital of Germany?"},
})
for doc in result["ranker"]["documents"]:
    print(f"{doc.content} (score: {doc.score:.4f})")
```
Reranking with Score Threshold
```python
from haystack import Document
from haystack.components.rankers import TransformersSimilarityRanker

ranker = TransformersSimilarityRanker(
    model="cross-encoder/ms-marco-MiniLM-L-6-v2",
    top_k=10,
    scale_score=True,
    calibration_factor=1.0,
    score_threshold=0.5,
    batch_size=32,
)
ranker.warm_up()

docs = [
    Document(content="Haystack is an open-source NLP framework"),
    Document(content="The weather is sunny today"),
    Document(content="Building search pipelines with Haystack"),
]
result = ranker.run(query="How to build NLP applications?", documents=docs)

# Only documents with score >= 0.5 are returned
for doc in result["documents"]:
    print(f"{doc.content} (score: {doc.score:.4f})")
```
Related Pages
- Principle: Deepset_ai_Haystack_Cross_Encoder_Reranking -- The principle that this component implements.
- Related Implementation: Deepset_ai_Haystack_InMemoryBM25Retriever -- BM25 retriever often used as the first stage before reranking.
- Related Implementation: Deepset_ai_Haystack_InMemoryEmbeddingRetriever -- Embedding retriever often used as the first stage before reranking.
Implements Principle
- Principle:Deepset_ai_Haystack_Cross_Encoder_Reranking
Requires Environment
- Environment:Deepset_ai_Haystack_HuggingFace_Model_Environment
- Environment:Deepset_ai_Haystack_GPU_Device_Environment