Implementation:Run llama Llama index SentenceTransformerRerank
| Knowledge Sources | |
|---|---|
| Domains | Postprocessing, Reranking, SentenceTransformers |
| Last Updated | 2026-02-11 19:00 GMT |
Overview
SentenceTransformerRerank is a node postprocessor that reranks retrieved nodes using a sentence-transformers CrossEncoder model to compute query-passage relevance scores.
Description
SentenceTransformerRerank extends BaseNodePostprocessor and uses the sentence-transformers library's CrossEncoder class for neural reranking. A bi-encoder embeds the query and each document independently. A CrossEncoder instead reads the query-document pair as a single input and outputs a relevance score directly. This joint encoding generally yields more accurate relevance judgments. The cost is one model forward pass per pair at query time.
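The difference in call patterns can be sketched in plain Python. Everything here is a hypothetical toy: toy_encode (a bag-of-words vector), cosine and toy_pair_score (word overlap) stand in for real neural models, so the scores themselves mean little. What the sketch shows is the structural point. Bi-encoder document vectors can be computed once and stored in an index. A cross-encoder needs one joint call per (query, document) pair for every query.

```python
from collections import Counter
from math import sqrt
from typing import Callable, List, Sequence

# Toy stand-ins for neural models (hypothetical, for illustration only).
def toy_encode(text: str) -> Counter:
    """Bag-of-words 'embedding' of a single text."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def toy_pair_score(query: str, doc: str) -> float:
    """Scores query and document together, as a cross-encoder does (here: word overlap)."""
    q_words, d_words = set(query.lower().split()), set(doc.lower().split())
    return len(q_words & d_words) / len(q_words) if q_words else 0.0

def bi_encoder_scores(query: str, doc_vectors: Sequence[Counter],
                      encode: Callable[[str], Counter],
                      sim: Callable[[Counter, Counter], float]) -> List[float]:
    # Document vectors were computed ahead of time; only the query is encoded now.
    q_vec = encode(query)
    return [sim(q_vec, vec) for vec in doc_vectors]

def cross_encoder_scores(query: str, docs: Sequence[str],
                         score_pair: Callable[[str, str], float]) -> List[float]:
    # Nothing can be precomputed: one joint model call per (query, doc) pair.
    return [score_pair(query, doc) for doc in docs]

docs = ["machine learning learns from data", "the cat sat on the mat"]
doc_vectors = [toy_encode(d) for d in docs]  # index time: done once, reused for every query

calls = {"pair": 0}
def counted_pair_score(query: str, doc: str) -> float:
    """Wraps toy_pair_score and counts how often the 'model' runs."""
    calls["pair"] += 1
    return toy_pair_score(query, doc)

query = "what is machine learning"
print(bi_encoder_scores(query, doc_vectors, toy_encode, cosine))
print(cross_encoder_scores(query, docs, counted_pair_score))  # [0.5, 0.0]
print(calls["pair"])  # 2: one pair call per document, repeated for every new query
```

Real cross-encoders batch these pairs, but every pair still passes through the model. For that reason they are usually applied only to a short list of already-retrieved candidates.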
The postprocessor builds one (query, node content) pair per retrieved node. It passes the pairs to the CrossEncoder's predict method to obtain relevance scores. Each node's score is overwritten with its CrossEncoder prediction and the nodes are sorted by score in descending order. Only the first top_n nodes are returned.
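The reranking loop described above can be sketched without any dependencies. This is a minimal sketch, not the library's code. ScoredNode is a hypothetical stand-in for NodeWithScore. overlap_predict is a toy scorer standing in for CrossEncoder.predict, which likewise takes a list of (query, passage) pairs and returns one score per pair. The keep_retrieval_score branch mirrors the option of the same name.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Sequence, Tuple

@dataclass
class ScoredNode:
    """Hypothetical stand-in for LlamaIndex's NodeWithScore."""
    text: str
    score: Optional[float] = None
    metadata: Dict[str, object] = field(default_factory=dict)

def rerank(
    query: str,
    nodes: List[ScoredNode],
    predict: Callable[[Sequence[Tuple[str, str]]], Sequence[float]],
    top_n: int = 2,
    keep_retrieval_score: bool = False,
) -> List[ScoredNode]:
    """Score (query, passage) pairs, overwrite node scores, return the top_n nodes.

    Note: scores and metadata are updated on the input nodes in place.
    """
    if not nodes:
        return []
    # 1. One (query, passage) pair per retrieved node.
    pairs = [(query, node.text) for node in nodes]
    # 2. Score all pairs in one batch.
    scores = predict(pairs)
    if len(scores) != len(nodes):
        raise ValueError("predict must return one score per pair")
    # 3. Optionally keep the retrieval score, then overwrite it with the rerank score.
    for node, score in zip(nodes, scores):
        if keep_retrieval_score:
            node.metadata["retrieval_score"] = node.score
        node.score = float(score)
    # 4. Highest relevance first; keep only top_n.
    return sorted(nodes, key=lambda n: n.score, reverse=True)[:top_n]

def overlap_predict(pairs: Sequence[Tuple[str, str]]) -> List[float]:
    """Toy scorer (hypothetical): fraction of query words found in the passage."""
    scores = []
    for query, passage in pairs:
        q_words, p_words = set(query.lower().split()), set(passage.lower().split())
        scores.append(len(q_words & p_words) / len(q_words) if q_words else 0.0)
    return scores

nodes = [
    ScoredNode("cats are mammals", score=0.91),
    ScoredNode("machine learning learns from data", score=0.55),
    ScoredNode("deep learning is a kind of machine learning", score=0.60),
]
top = rerank("what is machine learning", nodes, overlap_predict, top_n=2, keep_retrieval_score=True)
for node in top:
    print(node.score, node.metadata["retrieval_score"], node.text)
# 0.75 0.6 deep learning is a kind of machine learning
# 0.5 0.55 machine learning learns from data
```

The node with the highest retrieval score (0.91) drops out once the pair scorer judges it irrelevant. Swapping overlap_predict for a sentence-transformers CrossEncoder's predict method would turn the toy into real neural reranking.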
Key features include:
- Configurable model name (defaults to cross-encoder/stsb-distilroberta-base)
- Automatic device inference via infer_torch_device() when device is not set explicitly
- keep_retrieval_score option that preserves the original retrieval score in node metadata before overwriting with the reranking score
- trust_remote_code flag for loading custom model architectures
- Maximum input length of 512 tokens per query-passage pair (DEFAULT_SENTENCE_TRANSFORMER_MAX_LENGTH), passed to the CrossEncoder as its max_length
- Callback manager integration: each reranking call is traced as a CBEventType.RERANKING event
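Automatic device selection amounts to one rule: use the explicit device if one is given, otherwise pick the best available accelerator. The sketch below assumes infer_torch_device() prefers CUDA, then Apple's MPS backend, then CPU. Treat that order as an assumption and confirm it against the llama_index version you use. resolve_device is a hypothetical helper that takes availability flags, so it can be tested without a GPU.

```python
from typing import Optional

def resolve_device(device: Optional[str], cuda_available: bool, mps_available: bool) -> str:
    """Hypothetical helper: infer a device only when none was given."""
    if device is not None:
        return device  # an explicit device (e.g. "cpu", "cuda:1") always wins
    # Assumed preference order: NVIDIA CUDA, then Apple-silicon MPS, then CPU.
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"

# Probe the real hardware if PyTorch is installed; otherwise report no accelerators.
try:
    import torch
except ImportError:
    torch = None
cuda_ok = torch is not None and torch.cuda.is_available()
mps_backend = getattr(torch.backends, "mps", None) if torch is not None else None
mps_ok = mps_backend is not None and mps_backend.is_available()

print(resolve_device(None, cuda_ok, mps_ok))
```

Passing device explicitly, as with device="cuda", skips inference entirely. That is useful when several GPUs are present or when CPU inference is required.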
Usage
Use SentenceTransformerRerank when you want fast, local neural reranking without LLM API calls. It suits production pipelines where latency and cost matter. CrossEncoder models are much smaller and faster than LLMs, yet they typically still improve relevance over embedding-only retrieval. Every candidate node goes through the model, so reranking cost grows with the number of retrieved nodes. Retrieve a moderate candidate set somewhat larger than top_n and let the reranker narrow it down.
Code Reference
Source Location
- Repository: Run_llama_Llama_index
- File: llama-index-core/llama_index/core/postprocessor/sbert_rerank.py
Signature
```python
class SentenceTransformerRerank(BaseNodePostprocessor):
    def __init__(
        self,
        top_n: int = 2,
        model: str = "cross-encoder/stsb-distilroberta-base",
        device: Optional[str] = None,
        keep_retrieval_score: bool = False,
        trust_remote_code: bool = True,
    ):
```
Import
```python
from llama_index.core.postprocessor.sbert_rerank import SentenceTransformerRerank
```
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| top_n | int | No | Number of top-scored nodes to return after reranking. Defaults to 2. |
| model | str | No | Sentence transformer CrossEncoder model name. Defaults to "cross-encoder/stsb-distilroberta-base". |
| device | Optional[str] | No | Device for model inference (e.g. "cpu", "cuda"). If None, inferred with infer_torch_device(). |
| keep_retrieval_score | bool | No | If True, stores the original retrieval score in node metadata under "retrieval_score". Defaults to False. |
| trust_remote_code | bool | No | Whether to trust remote code when loading the model. Defaults to True. |
Outputs
| Name | Type | Description |
|---|---|---|
| nodes | List[NodeWithScore] | Top top_n nodes sorted by CrossEncoder relevance scores in descending order. |
Usage Examples
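```python
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
from llama_index.core.postprocessor.sbert_rerank import SentenceTransformerRerank

# Build an index (assumes ./data holds documents and default embedding/LLM settings are configured)
index = VectorStoreIndex.from_documents(SimpleDirectoryReader("./data").load_data())

# Basic usage: retrieve more candidates than top_n so the reranker has something to reorder
reranker = SentenceTransformerRerank(top_n=3)
query_engine = index.as_query_engine(
    similarity_top_k=10,
    node_postprocessors=[reranker],
)
response = query_engine.query("What is machine learning?")

# Custom model on GPU, keeping the original retrieval score in node metadata
reranker = SentenceTransformerRerank(
    model="cross-encoder/ms-marco-MiniLM-L-12-v2",
    top_n=5,
    device="cuda",
    keep_retrieval_score=True,
)
query_engine = index.as_query_engine(similarity_top_k=20, node_postprocessors=[reranker])
response = query_engine.query("What is machine learning?")
for node in response.source_nodes:
    print(node.score, node.node.metadata.get("retrieval_score"))
```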
Related Pages
- Environment:Run_llama_Llama_index_Python_LlamaIndex_Core
- Run_llama_Llama_index_BaseNodePostprocessor - Parent abstract base class