Implementation:Run llama Llama index SentenceEmbeddingOptimizer
| Knowledge Sources | |
|---|---|
| Domains | Postprocessing, Embeddings, Optimization |
| Last Updated | 2026-02-11 19:00 GMT |
Overview
SentenceEmbeddingOptimizer is a node postprocessor that shortens retrieved text chunks by selecting only the most relevant sentences based on embedding similarity to the query.
Description
SentenceEmbeddingOptimizer extends BaseNodePostprocessor. For each node it splits the text into sentences, computes the embedding similarity between each sentence and the query, and keeps only the top-scoring sentences. Two filters are supported: percentile_cutoff keeps the top fraction of sentences (for example, 0.5 keeps the top 50%), and threshold_cutoff keeps only sentences whose similarity is above the given score. The two cutoffs can be combined. The context_before and context_after parameters add neighboring sentences around each selected sentence for additional context. The embedding model defaults to the configured Settings.embed_model, falling back to OpenAIEmbedding. Sentences are split with the NLTK punkt tokenizer unless a custom tokenizer function is supplied.
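The scoring and cutoff steps described above can be sketched in plain Python. This is an illustration under stated assumptions, not the library's code: `toy_embed`, `cosine`, and `select_sentences` are hypothetical names, and a bag-of-words count vector stands in for a real embedding model.

```python
import math
import re
from collections import Counter
from typing import List, Optional, Tuple


def toy_embed(text: str) -> Counter:
    """Hypothetical stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def select_sentences(
    sentences: List[str],
    query: str,
    percentile_cutoff: Optional[float] = None,
    threshold_cutoff: Optional[float] = None,
) -> List[Tuple[int, float]]:
    """Return (sentence index, score) pairs that survive the cutoffs, best first."""
    query_vec = toy_embed(query)
    scored = [(i, cosine(toy_embed(s), query_vec)) for i, s in enumerate(sentences)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    if threshold_cutoff is not None:
        # Keep only sentences scoring above the threshold.
        scored = [(i, score) for i, score in scored if score > threshold_cutoff]
    if percentile_cutoff is not None:
        # Keep at most this fraction of the original sentence count.
        scored = scored[: int(len(sentences) * percentile_cutoff)]
    return scored


sentences = [
    "Paris is the capital of France.",
    "The weather was mild that year.",
    "France borders Spain and Italy.",
    "Bananas are rich in potassium.",
]
kept = select_sentences(sentences, "capital of France", percentile_cutoff=0.5)
print([sentences[i] for i, _ in kept])
```

In this sketch, setting both cutoffs means a sentence must pass the threshold and also fall within the top fraction.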
Usage
Use SentenceEmbeddingOptimizer when retrieved nodes contain long text passages and you want to reduce token consumption by sending only the most query-relevant sentences to the LLM. This is particularly useful in retrieval-augmented generation (RAG) pipelines where context window limits are a concern or where reducing noise in the context improves answer quality.
Code Reference
Source Location
- Repository: Run_llama_Llama_index
- File: llama-index-core/llama_index/core/postprocessor/optimizer.py
Signature
class SentenceEmbeddingOptimizer(BaseNodePostprocessor):
    def __init__(
        self,
        embed_model: Optional[BaseEmbedding] = None,
        percentile_cutoff: Optional[float] = None,
        threshold_cutoff: Optional[float] = None,
        tokenizer_fn: Optional[Callable[[str], List[str]]] = None,
        context_before: Optional[int] = None,
        context_after: Optional[int] = None,
    ):
Import
from llama_index.core.postprocessor.optimizer import SentenceEmbeddingOptimizer
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| embed_model | Optional[BaseEmbedding] | No | Embedding model for computing sentence similarities. Defaults to Settings.embed_model or OpenAIEmbedding. |
| percentile_cutoff | Optional[float] | No | Percentile cutoff (0.0-1.0) for the top k sentences to retain (e.g. 0.5 keeps top 50%). |
| threshold_cutoff | Optional[float] | No | Minimum similarity score threshold for a sentence to be retained. |
| tokenizer_fn | Optional[Callable[[str], List[str]]] | No | Custom tokenizer function to split text into sentences. Defaults to NLTK punkt tokenizer. |
| context_before | Optional[int] | No | Number of sentences before a selected sentence to include for additional context. If left as None, 1 is used. |
| context_after | Optional[int] | No | Number of sentences after a selected sentence to include for additional context. If left as None, 1 is used. |
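As an illustration of the tokenizer_fn contract, Callable[[str], List[str]], the following is a minimal sketch of a custom sentence splitter. The name `regex_sentence_tokenizer` is hypothetical, and the regex is a deliberately crude alternative to punkt: it will mis-split abbreviations such as "e.g. this".

```python
import re
from typing import List


def regex_sentence_tokenizer(text: str) -> List[str]:
    """Split after '.', '!' or '?' when followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    # Drop empty strings, which appear when the input is empty or all whitespace.
    return [part for part in parts if part]


print(regex_sentence_tokenizer("Paris is in France. Is it large? Yes!"))
```

A function with this shape could then be passed as `SentenceEmbeddingOptimizer(tokenizer_fn=regex_sentence_tokenizer)`, which avoids the dependency on the NLTK punkt data.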
Outputs
| Name | Type | Description |
|---|---|---|
| nodes | List[NodeWithScore] | The input nodes with their text content replaced by only the most relevant sentences based on embedding similarity. |
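To make the effect of context_before and context_after on the output text concrete, here is a rough sketch of how replacement text could be assembled from selected sentence indices. The function name `expand_with_context` is hypothetical, and the sketch assumes each window is clamped to the bounds of the sentence list and that the windows are joined with spaces.

```python
from typing import List


def expand_with_context(
    sentences: List[str],
    selected: List[int],
    context_before: int = 1,
    context_after: int = 1,
) -> str:
    """Join each selected sentence with its neighbors, clamped to the text bounds."""
    windows = []
    for idx in selected:
        start = max(idx - context_before, 0)
        end = min(idx + context_after + 1, len(sentences))
        windows.append(" ".join(sentences[start:end]))
    return " ".join(windows)


sentences = ["A.", "B.", "C.", "D.", "E."]
print(expand_with_context(sentences, [2]))
print(expand_with_context(sentences, [0], context_before=2, context_after=2))
```

This sketch makes no attempt to merge overlapping windows, so two nearby selected sentences can cause their shared neighbors to appear twice in the result.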
Usage Examples
from llama_index.core.postprocessor.optimizer import SentenceEmbeddingOptimizer

# Keep the top 50% of sentences by embedding similarity
optimizer = SentenceEmbeddingOptimizer(percentile_cutoff=0.5)

# Use with a query engine (`index` is assumed to be an index you have already built)
query_engine = index.as_query_engine(
    node_postprocessors=[optimizer]
)
response = query_engine.query("What is the capital of France?")

# Alternatively, use a similarity threshold with two sentences of context on each side
optimizer = SentenceEmbeddingOptimizer(
    threshold_cutoff=0.7,
    context_before=2,
    context_after=2,
)
Related Pages
- Environment:Run_llama_Llama_index_Python_LlamaIndex_Core
- Run_llama_Llama_index_BaseNodePostprocessor - Parent abstract base class