
Implementation:Deepset ai Haystack TransformersSimilarityRanker

From Leeroopedia

Metadata

Field Value
Implementation Name TransformersSimilarityRanker
Implementing Principle Deepset_ai_Haystack_Cross_Encoder_Reranking
Class TransformersSimilarityRanker
Module haystack.components.rankers.transformers_similarity
Source Reference haystack/components/rankers/transformers_similarity.py:L24-328
Repository Deepset_ai_Haystack
Dependencies transformers, torch, accelerate

Overview

TransformersSimilarityRanker is a Haystack component that ranks documents by their semantic similarity to a query using a cross-encoder transformer model. It jointly encodes each (query, document) pair and produces a relevance score via a classification head, then returns the documents sorted by descending relevance. This component is designed as a second-stage reranker in retrieval pipelines.

Description

The component loads a cross-encoder model (by default cross-encoder/ms-marco-MiniLM-L-6-v2) using the Hugging Face transformers library. For each query, it constructs (query, document) pairs, tokenizes them, and performs batch inference to produce raw logit scores. These logits can optionally be scaled through a sigmoid function with a configurable calibration factor.
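The optional scaling step can be sketched in plain Python. This is a simplified illustration of the `scale_score=True` behavior described above, not the component's actual code (which applies the same function with PyTorch on batched tensors):

```python
import math

def calibrate(logit: float, calibration_factor: float = 1.0) -> float:
    """Map a raw cross-encoder logit to a [0, 1] relevance score
    via a scaled sigmoid: sigmoid(logit * calibration_factor)."""
    return 1.0 / (1.0 + math.exp(-logit * calibration_factor))

# A strongly positive logit maps close to 1, a strongly negative one close to 0.
print(round(calibrate(8.2), 4))   # ~0.9997
print(round(calibrate(-6.7), 4))  # ~0.0012
print(calibrate(0.0))             # 0.5 (the decision boundary)
```

With `scale_score=False`, the raw logits are returned unchanged, which preserves their relative order but makes them unsuitable for a fixed `score_threshold`.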

Key behaviors:

  • Lazy initialization: The model and tokenizer are loaded on the first call to warm_up() or automatically on the first run().
  • Deduplication: Before ranking, input documents are deduplicated by their id field. If duplicates exist, the one with the highest pre-existing score is retained.
  • Meta field embedding: Metadata fields specified in meta_fields_to_embed are concatenated with the document content (separated by embedding_separator) before forming the (query, document) pair.
  • Query and document prefixes: Configurable query_prefix and document_prefix strings are prepended to the query and document text, respectively, supporting models like BGE that require instruction prefixes.
  • Score calibration: When scale_score=True, raw logits are passed through sigmoid(logit * calibration_factor) to produce scores in the [0, 1] range.
  • Score threshold filtering: Documents below a configurable score_threshold are excluded from the output.
  • Batch inference: Documents are processed in batches of configurable size using a PyTorch DataLoader, with inference performed under torch.inference_mode() for efficiency.
  • Device map support: Uses the accelerate library for Hugging Face device map resolution.
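The meta-field embedding and prefix behaviors above can be sketched as follows. This is an illustrative reconstruction of the described text-assembly logic, not the component's internal implementation, and the field names used are hypothetical:

```python
def build_pair_text(query: str, content: str, meta: dict,
                    meta_fields_to_embed: list = (),
                    embedding_separator: str = "\n",
                    query_prefix: str = "",
                    document_prefix: str = "") -> tuple:
    """Form the (query, document) text pair a cross-encoder scores jointly."""
    # Selected metadata values are concatenated in front of the document
    # content, joined by embedding_separator.
    meta_values = [str(meta[f]) for f in meta_fields_to_embed if f in meta]
    doc_text = embedding_separator.join(meta_values + [content])
    # Prefixes support instruction-tuned models (e.g. BGE rerankers).
    return query_prefix + query, document_prefix + doc_text

pair = build_pair_text(
    query="capital of Germany",
    content="Berlin is the capital.",
    meta={"title": "Germany"},
    meta_fields_to_embed=["title"],
)
print(pair)  # ('capital of Germany', 'Germany\nBerlin is the capital.')
```

Because the query and the assembled document text are encoded jointly by the cross-encoder, any prefix or metadata change alters the model input and therefore the resulting scores.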

Note: This component is considered legacy by the Haystack maintainers. SentenceTransformersSimilarityRanker is recommended as the replacement, providing the same functionality with additional features.

Code Reference

Import

from haystack.components.rankers import TransformersSimilarityRanker

Constructor Signature

TransformersSimilarityRanker(
    model: str | Path = "cross-encoder/ms-marco-MiniLM-L-6-v2",
    device: ComponentDevice | None = None,
    token: Secret | None = Secret.from_env_var(["HF_API_TOKEN", "HF_TOKEN"], strict=False),
    top_k: int = 10,
    query_prefix: str = "",
    document_prefix: str = "",
    meta_fields_to_embed: list[str] | None = None,
    embedding_separator: str = "\n",
    scale_score: bool = True,
    calibration_factor: float | None = 1.0,
    score_threshold: float | None = None,
    model_kwargs: dict[str, Any] | None = None,
    tokenizer_kwargs: dict[str, Any] | None = None,
    batch_size: int = 16,
)
Parameter Type Default Description
model str | Path "cross-encoder/ms-marco-MiniLM-L-6-v2" Hugging Face model ID or local path for the cross-encoder model.
device ComponentDevice | None None Device for model loading. Resolved via the Hugging Face device map (accelerate).
token Secret | None Secret.from_env_var(["HF_API_TOKEN", "HF_TOKEN"], strict=False) API token for private Hugging Face models.
top_k int 10 Maximum number of documents to return.
query_prefix str "" String prepended to the query before forming pairs.
document_prefix str "" String prepended to each document's text before forming pairs.
meta_fields_to_embed list[str] | None None Metadata fields to concatenate with document content.
embedding_separator str "\n" Separator between metadata fields and document content.
scale_score bool True If True, apply sigmoid calibration to raw logits.
calibration_factor float | None 1.0 Factor for sigmoid calibration: sigmoid(logit * factor). Required when scale_score=True.
score_threshold float | None None Minimum score for a document to be included in the output.
model_kwargs dict[str, Any] | None None Additional kwargs for AutoModelForSequenceClassification.from_pretrained.
tokenizer_kwargs dict[str, Any] | None None Additional kwargs for AutoTokenizer.from_pretrained.
batch_size int 16 Batch size for inference. Reduce if memory is constrained.

I/O Contract

Input

Parameter Type Required Description
query str Yes The query text to compare documents against.
documents list[Document] Yes The candidate documents to rank.
top_k int | None No Override the default maximum number of documents to return.
scale_score bool | None No Override the default score-scaling behavior.
calibration_factor float | None No Override the default calibration factor.
score_threshold float | None No Override the default score threshold.

Output

Key Type Description
documents list[Document] Documents sorted by cross-encoder relevance score, from most to least relevant.

The output dictionary has the structure:

{"documents": list[Document]}

Each returned Document has its score field populated with the cross-encoder relevance score. When scale_score=True, scores are in the [0, 1] range. When scale_score=False, scores are raw logits.

Usage Examples

Basic Reranking

from haystack import Document
from haystack.components.rankers import TransformersSimilarityRanker

ranker = TransformersSimilarityRanker()
ranker.warm_up()

docs = [Document(content="Paris"), Document(content="Berlin")]
result = ranker.run(query="City in Germany", documents=docs)

for doc in result["documents"]:
    print(f"{doc.content}: {doc.score:.4f}")
# Berlin: 0.9997
# Paris: 0.0012

Reranking after BM25 Retrieval

from haystack import Document, Pipeline
from haystack.components.retrievers.in_memory import InMemoryBM25Retriever
from haystack.components.rankers import TransformersSimilarityRanker
from haystack.document_stores.in_memory import InMemoryDocumentStore

doc_store = InMemoryDocumentStore()
doc_store.write_documents([
    Document(content="Berlin is the capital of Germany"),
    Document(content="Paris is known for the Eiffel Tower"),
    Document(content="Germany is a country in central Europe"),
    Document(content="The capital of France is Paris"),
])

pipeline = Pipeline()
pipeline.add_component("retriever", InMemoryBM25Retriever(document_store=doc_store, top_k=10))
pipeline.add_component("ranker", TransformersSimilarityRanker(top_k=3, scale_score=True))
pipeline.connect("retriever.documents", "ranker.documents")

result = pipeline.run({
    "retriever": {"query": "What is the capital of Germany?"},
    "ranker": {"query": "What is the capital of Germany?"},
})
for doc in result["ranker"]["documents"]:
    print(f"{doc.content} (score: {doc.score:.4f})")

Reranking with Score Threshold

from haystack import Document
from haystack.components.rankers import TransformersSimilarityRanker

ranker = TransformersSimilarityRanker(
    model="cross-encoder/ms-marco-MiniLM-L-6-v2",
    top_k=10,
    scale_score=True,
    calibration_factor=1.0,
    score_threshold=0.5,
    batch_size=32,
)
ranker.warm_up()

docs = [
    Document(content="Haystack is an open-source NLP framework"),
    Document(content="The weather is sunny today"),
    Document(content="Building search pipelines with Haystack"),
]

result = ranker.run(query="How to build NLP applications?", documents=docs)
# Only documents with score >= 0.5 are returned
for doc in result["documents"]:
    print(f"{doc.content} (score: {doc.score:.4f})")

Related Pages

Implements Principle

Requires Environment

Uses Heuristic
