Implementation:Run llama Llama index SentenceEmbeddingOptimizer

Knowledge Sources	Run_llama_Llama_index
Domains	Postprocessing, Embeddings, Optimization
Last Updated	2026-02-11 19:00 GMT

Overview

SentenceEmbeddingOptimizer is a node postprocessor that shortens retrieved text chunks by selecting only the most relevant sentences based on embedding similarity to the query.

Description

SentenceEmbeddingOptimizer extends BaseNodePostprocessor. It splits each node's text into sentences, computes the embedding similarity between each sentence and the query, and retains only the top-scoring sentences. It supports two filtering modes: percentile_cutoff keeps the top fraction of sentences (for example, 0.5 keeps the top 50%), and threshold_cutoff keeps only sentences whose similarity score exceeds a specified value. Both cutoffs can be combined, in which case only sentences that satisfy both are kept. The context_before and context_after parameters include neighboring sentences around each selected sentence for additional context. The embedding model defaults to the configured Settings.embed_model, falling back to OpenAIEmbedding. Sentences are split with the NLTK punkt tokenizer by default, which can be overridden with a custom tokenizer function.
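
The selection procedure described above can be sketched in plain Python. This is an illustrative sketch, not the library's implementation: toy_embed is a hypothetical bag-of-words stand-in for a real embedding model, select_sentences is a made-up name, and the int() truncation of the percentile and the strict > comparison for the threshold are assumptions.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Optional


def toy_embed(text: str) -> Dict[str, int]:
    """Hypothetical stand-in for an embedding model: word-count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Dict[str, int], b: Dict[str, int]) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b[token] for token, count in a.items() if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def select_sentences(
    sentences: List[str],
    query: str,
    percentile_cutoff: Optional[float] = None,
    threshold_cutoff: Optional[float] = None,
    context_before: int = 1,
    context_after: int = 1,
) -> List[str]:
    """Return the most query-relevant sentences, each with its context window."""
    query_vec = toy_embed(query)
    scored = [(cosine(query_vec, toy_embed(s)), i) for i, s in enumerate(sentences)]

    # Threshold mode: drop sentences that do not exceed the similarity cutoff.
    if threshold_cutoff is not None:
        scored = [(score, i) for score, i in scored if score > threshold_cutoff]

    # Rank by similarity, highest first (ties broken by original position).
    scored.sort(key=lambda pair: (-pair[0], pair[1]))

    # Percentile mode: keep a fraction of the original sentence count.
    if percentile_cutoff is not None:
        top_k = int(len(sentences) * percentile_cutoff)  # assumed rounding
        scored = scored[:top_k]

    # Expand each selected sentence with its neighbors, clamped to the text bounds.
    return [
        " ".join(
            sentences[max(i - context_before, 0) : min(i + context_after + 1, len(sentences))]
        )
        for _, i in scored
    ]


sentences = [
    "Paris is the capital of France.",
    "It is famous for the Eiffel Tower.",
    "Bananas are rich in potassium.",
    "Berlin is the capital of Germany.",
]
selected = select_sentences(
    sentences, "What is the capital of France?", percentile_cutoff=0.5
)
print(" ".join(selected))
```

Setting context_before and context_after to 0 in this sketch returns the selected sentences on their own, which makes the effect of each cutoff easier to inspect.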

Usage

Use SentenceEmbeddingOptimizer when retrieved nodes contain long text passages and you want to reduce token consumption by sending only the most query-relevant sentences to the LLM. This is particularly useful in retrieval-augmented generation (RAG) pipelines where context window limits are a concern or where reducing noise in the context improves answer quality.

Code Reference

Source Location

Repository: Run_llama_Llama_index
File: llama-index-core/llama_index/core/postprocessor/optimizer.py

Signature

class SentenceEmbeddingOptimizer(BaseNodePostprocessor):
    def __init__(
        self,
        embed_model: Optional[BaseEmbedding] = None,
        percentile_cutoff: Optional[float] = None,
        threshold_cutoff: Optional[float] = None,
        tokenizer_fn: Optional[Callable[[str], List[str]]] = None,
        context_before: Optional[int] = None,
        context_after: Optional[int] = None,
    ):

Import

from llama_index.core.postprocessor.optimizer import SentenceEmbeddingOptimizer

I/O Contract

Inputs

Name	Type	Required	Description
embed_model	Optional[BaseEmbedding]	No	Embedding model for computing sentence similarities. Defaults to Settings.embed_model, falling back to OpenAIEmbedding.
percentile_cutoff	Optional[float]	No	Percentile cutoff (0.0-1.0) for the top k sentences to retain (e.g. 0.5 keeps top 50%).
threshold_cutoff	Optional[float]	No	Minimum similarity score threshold for a sentence to be retained.
tokenizer_fn	Optional[Callable[[str], List[str]]]	No	Custom tokenizer function to split text into sentences. Defaults to NLTK punkt tokenizer.
context_before	Optional[int]	No	Number of sentences before a selected sentence to include for additional context. When left as None, 1 is used.
context_after	Optional[int]	No	Number of sentences after a selected sentence to include for additional context. When left as None, 1 is used.
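
As an illustration of the tokenizer_fn contract (Callable[[str], List[str]]), the following is a minimal regex-based sentence splitter. The function name simple_sentence_tokenizer is hypothetical, and the splitting rule is deliberately naive (it will mis-split abbreviations such as "e.g."), so treat it as a sketch rather than a replacement for the NLTK punkt tokenizer.

```python
import re
from typing import List


def simple_sentence_tokenizer(text: str) -> List[str]:
    """Split text at whitespace that follows '.', '!' or '?' (naive rule)."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    # Drop empty fragments, e.g. when the input is an empty string.
    return [part for part in parts if part]


print(simple_sentence_tokenizer("Paris is in France. Is it big? Yes!"))
```

A function of this shape could then be passed as tokenizer_fn=simple_sentence_tokenizer when constructing SentenceEmbeddingOptimizer, which avoids the dependency on NLTK data.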

Outputs

Name	Type	Description
nodes	List[NodeWithScore]	The input nodes with their text content replaced by only the most relevant sentences based on embedding similarity.

Usage Examples

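from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
from llama_index.core.postprocessor.optimizer import SentenceEmbeddingOptimizer

# Assumption: build an index from documents in a local "data" directory;
# any existing LlamaIndex index can be used instead.
documents = SimpleDirectoryReader("data").load_data()
index = VectorStoreIndex.from_documents(documents)

# Keep the top 50% of sentences by embedding similarity
optimizer = SentenceEmbeddingOptimizer(percentile_cutoff=0.5)

# Use with a query engine
query_engine = index.as_query_engine(
    node_postprocessors=[optimizer]
)
response = query_engine.query("What is the capital of France?")

# Alternatively, keep sentences scoring above a similarity threshold,
# with two sentences of context on each side
threshold_optimizer = SentenceEmbeddingOptimizer(
    threshold_cutoff=0.7,
    context_before=2,
    context_after=2,
)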

Related Pages

Environment:Run_llama_Llama_index_Python_LlamaIndex_Core
Run_llama_Llama_index_BaseNodePostprocessor - Parent abstract base class
