
Implementation:Run llama Llama index SentenceTransformerRerank

From Leeroopedia
Knowledge Sources
Domains: Postprocessing, Reranking, SentenceTransformers
Last Updated: 2026-02-11 19:00 GMT

Overview

SentenceTransformerRerank is a node postprocessor that reranks retrieved nodes using a sentence-transformers CrossEncoder model to compute query-passage relevance scores.

Description

SentenceTransformerRerank extends BaseNodePostprocessor and leverages the sentence-transformers library's CrossEncoder class for neural reranking. Unlike bi-encoder approaches that encode query and documents independently, the CrossEncoder takes the query-document pair as a single input and directly outputs a relevance score, which generally yields more accurate relevance judgments.

The postprocessor builds one (query, node content) pair per retrieved node and passes the batch of pairs to the CrossEncoder's predict method to obtain relevance scores. Each node's score is overwritten with the CrossEncoder's prediction, the nodes are sorted in descending order of score, and only the top_n highest-scoring nodes are returned.
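The flow described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the library's implementation: ScoredNode, rerank, and overlap_scorer are hypothetical names introduced here, and overlap_scorer is a toy word-overlap function standing in for a real CrossEncoder so the sketch runs without downloading a model.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence, Tuple


@dataclass
class ScoredNode:
    """Hypothetical stand-in for a retrieved node and its score."""
    text: str
    score: Optional[float] = None


# A scorer takes a batch of (query, passage) pairs and returns one score per
# pair, which is the same call shape that CrossEncoder.predict accepts.
PairScorer = Callable[[Sequence[Tuple[str, str]]], Sequence[float]]


def rerank(
    query: str,
    nodes: List[ScoredNode],
    score_pairs: PairScorer,
    top_n: int = 2,
) -> List[ScoredNode]:
    """Score every (query, node text) pair, sort descending, keep top_n."""
    if not nodes:
        return []
    pairs = [(query, node.text) for node in nodes]
    scores = score_pairs(pairs)
    for node, score in zip(nodes, scores):
        node.score = float(score)  # overwrite the retrieval score
    ranked = sorted(nodes, key=lambda n: n.score, reverse=True)
    return ranked[:top_n]


def overlap_scorer(pairs: Sequence[Tuple[str, str]]) -> List[float]:
    """Toy scorer (assumption, not a real model): share of query words in the passage."""
    scores = []
    for query, passage in pairs:
        q_words = set(query.lower().split())
        p_words = set(passage.lower().split())
        scores.append(len(q_words & p_words) / len(q_words) if q_words else 0.0)
    return scores


nodes = [
    ScoredNode("Bread is baked in an oven.", score=0.81),
    ScoredNode("Machine learning is a field of study.", score=0.78),
    ScoredNode("What is machine learning? It is learning from data.", score=0.60),
]
top = rerank("what is machine learning", nodes, overlap_scorer, top_n=2)
for node in top:
    print(f"{node.score:.2f}  {node.text}")
```

With a real model, the predict method of a loaded sentence-transformers CrossEncoder could be passed as score_pairs in place of the toy scorer. Note that the node ranked highest by the original retrieval scores is dropped here, because ordering depends only on the new pair scores.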

Key features include:

  • Configurable model name (defaults to cross-encoder/stsb-distilroberta-base)
  • Automatic device inference via infer_torch_device() when not explicitly set
  • keep_retrieval_score option that preserves the original retrieval score in node metadata before overwriting with the reranking score
  • trust_remote_code flag for loading custom model architectures
  • Maximum input length of 512 tokens (DEFAULT_SENTENCE_TRANSFORMER_MAX_LENGTH)
  • Full callback manager integration with CBEventType.RERANKING events
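
The keep_retrieval_score behaviour from the list above can be sketched as follows. This is an illustrative sketch only: ScoredNode and apply_rerank_score are hypothetical names, not part of the library, and the "retrieval_score" metadata key is the one named in the I/O contract on this page.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class ScoredNode:
    """Hypothetical stand-in for a retrieved node: text, metadata, and a score."""
    text: str
    score: Optional[float] = None
    metadata: Dict[str, Any] = field(default_factory=dict)


def apply_rerank_score(
    node: ScoredNode,
    rerank_score: float,
    keep_retrieval_score: bool = False,
) -> ScoredNode:
    """Overwrite the node's score, optionally saving the old one in metadata."""
    if keep_retrieval_score:
        # Save the retriever's score before it is replaced.
        node.metadata["retrieval_score"] = node.score
    node.score = rerank_score
    return node


kept = apply_rerank_score(
    ScoredNode("passage A", score=0.42), 0.91, keep_retrieval_score=True
)
dropped = apply_rerank_score(ScoredNode("passage B", score=0.37), 0.15)

print(kept.score, kept.metadata)        # reranking score plus the saved retrieval score
print(dropped.score, dropped.metadata)  # reranking score only; the old score is gone
```

Keeping the original score is useful when downstream code wants to compare the retriever's ordering with the reranker's, or to log both for evaluation.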

Usage

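Use SentenceTransformerRerank when you want local neural reranking without LLM API calls. It suits pipelines where latency and cost matter: CrossEncoder models are much smaller than general-purpose LLMs and run on local hardware, while scoring each query-passage pair jointly generally improves relevance ordering over embedding-only retrieval. Because every candidate node requires its own forward pass through the model, apply it to a modest set of retrieved candidates rather than to a whole corpus.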

Code Reference

Source Location

Signature

class SentenceTransformerRerank(BaseNodePostprocessor):
    def __init__(
        self,
        top_n: int = 2,
        model: str = "cross-encoder/stsb-distilroberta-base",
        device: Optional[str] = None,
        keep_retrieval_score: bool = False,
        trust_remote_code: bool = True,
    ):

Import

from llama_index.core.postprocessor.sbert_rerank import SentenceTransformerRerank

I/O Contract

Inputs

Name | Type | Required | Description
top_n | int | No | Number of top-scored nodes to return after reranking. Defaults to 2.
model | str | No | Sentence transformer CrossEncoder model name. Defaults to "cross-encoder/stsb-distilroberta-base".
device | Optional[str] | No | Device for model inference (e.g. "cpu", "cuda"). Auto-detected if not specified.
keep_retrieval_score | bool | No | If True, stores the original retrieval score in node metadata under "retrieval_score". Defaults to False.
trust_remote_code | bool | No | Whether to trust remote code when loading the model. Defaults to True.

Outputs

Name | Type | Description
nodes | List[NodeWithScore] | The top_n nodes, sorted by CrossEncoder relevance score in descending order.

Usage Examples

from llama_index.core.postprocessor.sbert_rerank import SentenceTransformerRerank

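# Basic usage. `index` is assumed to be an existing LlamaIndex index
# (for example a VectorStoreIndex) built elsewhere.
reranker = SentenceTransformerRerank(top_n=3)

# Retrieve more candidates than top_n so the reranker has nodes to reorder.
query_engine = index.as_query_engine(
    similarity_top_k=10,
    node_postprocessors=[reranker],
)
response = query_engine.query("What is machine learning?")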

# Custom model with GPU and retrieval score preservation
reranker = SentenceTransformerRerank(
    model="cross-encoder/ms-marco-MiniLM-L-12-v2",
    top_n=5,
    device="cuda",
    keep_retrieval_score=True,
)
