Implementation:Run llama Llama index SentenceTransformerRerank

Knowledge Sources	Run_llama_Llama_index
Domains	Postprocessing, Reranking, SentenceTransformers
Last Updated	2026-02-11 19:00 GMT

Overview

SentenceTransformerRerank is a node postprocessor that reranks retrieved nodes using a sentence-transformers CrossEncoder model to compute query-passage relevance scores.

Description

SentenceTransformerRerank extends BaseNodePostprocessor and uses the sentence-transformers library's CrossEncoder class for neural reranking. Unlike bi-encoder approaches, which encode the query and each document independently and compare the resulting vectors, a CrossEncoder takes each query-document pair as a single input and directly outputs a relevance score. This generally yields more accurate relevance judgments, at the cost of one model forward pass per pair.

The postprocessor builds one (query, node content) pair per retrieved node and passes the pairs to the CrossEncoder's predict method to obtain relevance scores. Each node's score is overwritten with the CrossEncoder's prediction, the nodes are sorted in descending order of score, and only the top_n highest-scoring nodes are returned.
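The flow described above can be sketched without any model dependency. This is a minimal illustration, not the library's implementation: ScoredNode, rerank, and word_overlap are hypothetical names, and word_overlap is a toy stand-in for the scorer. In real use, the predict argument would be something like the predict method of a sentence-transformers CrossEncoder.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Sequence, Tuple


@dataclass
class ScoredNode:
    """Hypothetical stand-in for a retrieved node and its score."""
    text: str
    score: float
    metadata: Dict[str, Any] = field(default_factory=dict)


# A pair scorer maps (query, passage) pairs to one relevance score per pair.
PairScorer = Callable[[Sequence[Tuple[str, str]]], Sequence[float]]


def rerank(
    query: str,
    nodes: List[ScoredNode],
    predict: PairScorer,
    top_n: int = 2,
    keep_retrieval_score: bool = False,
) -> List[ScoredNode]:
    # 1. Build one (query, passage) pair per node.
    pairs = [(query, node.text) for node in nodes]
    # 2. Score all pairs in a single call.
    scores = predict(pairs)
    # 3. Overwrite each node's score, optionally saving the old one first.
    for node, new_score in zip(nodes, scores):
        if keep_retrieval_score:
            node.metadata["retrieval_score"] = node.score
        node.score = float(new_score)
    # 4. Sort by the new score, highest first, and keep the top_n.
    return sorted(nodes, key=lambda n: n.score, reverse=True)[:top_n]


def word_overlap(pairs: Sequence[Tuple[str, str]]) -> List[float]:
    """Toy scorer (assumption): fraction of query words found in the passage."""
    scores = []
    for query, passage in pairs:
        q_words = set(query.lower().split())
        p_words = set(passage.lower().split())
        scores.append(len(q_words & p_words) / len(q_words) if q_words else 0.0)
    return scores


nodes = [
    ScoredNode("Bananas are rich in potassium", score=0.9),
    ScoredNode("Machine learning fits models to data", score=0.5),
    ScoredNode("Deep learning is a branch of machine learning", score=0.4),
]
top = rerank(
    "what is machine learning",
    nodes,
    word_overlap,
    top_n=2,
    keep_retrieval_score=True,
)
```

In this toy run, the node with the highest retrieval score is dropped because it shares no words with the query, while each surviving node keeps its original score under metadata["retrieval_score"].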

Key features include:

Configurable model name (defaults to cross-encoder/stsb-distilroberta-base)
Automatic device inference via infer_torch_device() when not explicitly set
keep_retrieval_score option that preserves the original retrieval score in node metadata before overwriting with the reranking score
trust_remote_code flag for loading custom model architectures
Maximum input length of 512 tokens (DEFAULT_SENTENCE_TRANSFORMER_MAX_LENGTH)
Full callback manager integration with CBEventType.RERANKING events
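As an illustration of the automatic device inference listed above, the following sketch shows one plausible fallback order. The helper names infer_device and resolve_device are hypothetical, and the order (CUDA, then Apple MPS, then CPU) is an assumption about how such a helper typically behaves, not a transcription of infer_torch_device().

```python
from typing import Optional


def infer_device() -> str:
    """Hypothetical helper: prefer "cuda", then "mps", then fall back to "cpu"."""
    try:
        import torch
    except ImportError:
        # Assumption for this sketch: without PyTorch, report CPU.
        return "cpu"
    if torch.cuda.is_available():
        return "cuda"
    # Older PyTorch builds may not expose the MPS backend at all.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"


def resolve_device(device: Optional[str] = None) -> str:
    # An explicitly requested device always wins over inference.
    return device if device is not None else infer_device()
```

Calling resolve_device("cuda:1") returns the string unchanged, while resolve_device() falls through to inference, mirroring the "auto-detected if not specified" behavior of the device parameter.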

Usage

Use SentenceTransformerRerank when you want local neural reranking without LLM API calls. It is well suited to production pipelines where latency and cost matter: CrossEncoder models are much smaller than full LLMs and run on local hardware, yet they typically improve relevance over embedding-only retrieval. Because every retrieved node requires its own forward pass, reranking time grows with the number of candidates passed in.

Code Reference

Source Location

Repository: Run_llama_Llama_index
File: llama-index-core/llama_index/core/postprocessor/sbert_rerank.py

Signature

class SentenceTransformerRerank(BaseNodePostprocessor):
    def __init__(
        self,
        top_n: int = 2,
        model: str = "cross-encoder/stsb-distilroberta-base",
        device: Optional[str] = None,
        keep_retrieval_score: bool = False,
        trust_remote_code: bool = True,
    ):

Import

from llama_index.core.postprocessor.sbert_rerank import SentenceTransformerRerank

I/O Contract

Inputs

Name	Type	Required	Description
top_n	int	No	Number of top-scored nodes to return after reranking. Defaults to 2.
model	str	No	Sentence transformer CrossEncoder model name. Defaults to "cross-encoder/stsb-distilroberta-base".
device	Optional[str]	No	Device for model inference (e.g. "cpu", "cuda"). Auto-detected if not specified.
keep_retrieval_score	bool	No	If True, stores the original retrieval score in node metadata under "retrieval_score". Defaults to False.
trust_remote_code	bool	No	Whether to trust remote code when loading the model. Defaults to True.

Outputs

Name	Type	Description
nodes	List[NodeWithScore]	The top_n highest-scoring nodes, sorted by CrossEncoder relevance score in descending order.

Usage Examples

from llama_index.core.postprocessor.sbert_rerank import SentenceTransformerRerank

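# Basic usage: keep the 3 best nodes after reranking.
# Assumes `index` is an existing index (e.g. a VectorStoreIndex) built elsewhere.
reranker = SentenceTransformerRerank(top_n=3)

query_engine = index.as_query_engine(
    similarity_top_k=10,  # retrieve more candidates than top_n so the reranker has nodes to reorder
    node_postprocessors=[reranker],
)
response = query_engine.query("What is machine learning?")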

# Custom model with GPU and retrieval score preservation
reranker = SentenceTransformerRerank(
    model="cross-encoder/ms-marco-MiniLM-L-12-v2",
    top_n=5,
    device="cuda",
    keep_retrieval_score=True,
)

Related Pages

Environment:Run_llama_Llama_index_Python_LlamaIndex_Core
Run_llama_Llama_index_BaseNodePostprocessor - Parent abstract base class
