Implementation:Run llama Llama index SentenceEmbeddingOptimizer

From Leeroopedia
Knowledge Sources
Domains Postprocessing, Embeddings, Optimization
Last Updated 2026-02-11 19:00 GMT

Overview

SentenceEmbeddingOptimizer is a node postprocessor that shortens retrieved text chunks by selecting only the most relevant sentences based on embedding similarity to the query.

Description

SentenceEmbeddingOptimizer extends BaseNodePostprocessor. For each node it splits the text into sentences, computes the embedding similarity between each sentence and the query, and retains only the top-scoring sentences. It supports two filters: percentile_cutoff, which keeps the top fraction of sentences (for example, 0.5 keeps the top 50%), and threshold_cutoff, which keeps only sentences whose similarity exceeds a specified score. The two can be combined, in which case a sentence must pass both. The context_before and context_after parameters add neighboring sentences around each selected sentence for additional context. The embedding model defaults to the configured Settings.embed_model and falls back to OpenAIEmbedding. Sentences are split with the NLTK punkt tokenizer unless a custom tokenizer function is supplied.
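The two cutoffs described above can be illustrated with a minimal, library-free sketch. It is not the library's implementation: the helper names `cosine` and `select_sentences` are hypothetical, the vectors are hand-made stand-ins for real embeddings, and cosine similarity is assumed as the scoring function.

```python
import math
from typing import List, Optional, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def select_sentences(
    query_vec: Sequence[float],
    sentence_vecs: Sequence[Sequence[float]],
    percentile_cutoff: Optional[float] = None,
    threshold_cutoff: Optional[float] = None,
) -> List[int]:
    """Return the indices of retained sentences, highest similarity first.

    Hypothetical helper: applies the threshold filter first, then keeps the
    top fraction of the original sentence count.
    """
    scored = [(cosine(query_vec, v), i) for i, v in enumerate(sentence_vecs)]
    if threshold_cutoff is not None:
        # Keep only sentences strictly above the similarity threshold.
        scored = [(s, i) for s, i in scored if s > threshold_cutoff]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    if percentile_cutoff is not None:
        # The fraction is taken of the total number of sentences.
        top_k = int(len(sentence_vecs) * percentile_cutoff)
        scored = scored[:top_k]
    return [i for _, i in scored]


# Toy data: four "sentence embeddings" and one "query embedding".
query = [1.0, 0.0]
sentences = [[1.0, 0.0], [1.0, 1.0], [0.0, 1.0], [3.0, 1.0]]

print(select_sentences(query, sentences, percentile_cutoff=0.5))  # [0, 3]
```

In this sketch, setting both cutoffs means a sentence must exceed the threshold and also rank within the top fraction.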

Usage

Use SentenceEmbeddingOptimizer when retrieved nodes contain long text passages and you want to reduce token consumption by sending only the most query-relevant sentences to the LLM. This is particularly useful in retrieval-augmented generation (RAG) pipelines where context window limits are a concern or where reducing noise in the context improves answer quality.

Code Reference

Source Location

Signature

class SentenceEmbeddingOptimizer(BaseNodePostprocessor):
    def __init__(
        self,
        embed_model: Optional[BaseEmbedding] = None,
        percentile_cutoff: Optional[float] = None,
        threshold_cutoff: Optional[float] = None,
        tokenizer_fn: Optional[Callable[[str], List[str]]] = None,
        context_before: Optional[int] = None,
        context_after: Optional[int] = None,
    ):

Import

from llama_index.core.postprocessor.optimizer import SentenceEmbeddingOptimizer

I/O Contract

Inputs

Name | Type | Required | Description
embed_model | Optional[BaseEmbedding] | No | Embedding model for computing sentence similarities. Defaults to Settings.embed_model or OpenAIEmbedding.
percentile_cutoff | Optional[float] | No | Fraction (0.0-1.0) of sentences to retain, ranked by similarity (e.g. 0.5 keeps the top 50%).
threshold_cutoff | Optional[float] | No | Minimum similarity score a sentence must exceed to be retained.
tokenizer_fn | Optional[Callable[[str], List[str]]] | No | Custom function that splits text into sentences. Defaults to the NLTK punkt tokenizer.
context_before | Optional[int] | No | Number of sentences before each selected sentence to include as context. A value of 1 is used when left unset.
context_after | Optional[int] | No | Number of sentences after each selected sentence to include as context. A value of 1 is used when left unset.
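As an illustration of the tokenizer_fn contract (a callable that takes a string and returns a list of sentence strings), here is a minimal regex-based splitter. The name `simple_sentence_tokenizer` is hypothetical, and the splitting rule is a deliberately naive assumption that will mis-split abbreviations such as "e.g."; it is not a replacement for punkt.

```python
import re
from typing import List

# Split on whitespace that directly follows '.', '!' or '?'.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def simple_sentence_tokenizer(text: str) -> List[str]:
    """Hypothetical tokenizer matching Callable[[str], List[str]]."""
    return [s for s in _SENTENCE_END.split(text.strip()) if s]


parts = simple_sentence_tokenizer("Paris is the capital. Is it big? Yes!")
print(parts)  # ['Paris is the capital.', 'Is it big?', 'Yes!']
```

A function of this shape could be passed as tokenizer_fn=simple_sentence_tokenizer when constructing the optimizer, for example to avoid the NLTK dependency in a constrained environment.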

Outputs

Name | Type | Description
nodes | List[NodeWithScore] | The input nodes with their text content replaced by only the most relevant sentences based on embedding similarity.
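How the replacement text might be assembled from the selected sentences and their context windows can be sketched as follows. This is an assumption-laden illustration, not the library's code: `expand_with_context` is a hypothetical helper, windows are clamped to the sentence list bounds, and overlapping windows are not deduplicated.

```python
from typing import List, Sequence


def expand_with_context(
    sentences: Sequence[str],
    selected: Sequence[int],
    context_before: int = 1,
    context_after: int = 1,
) -> str:
    """Join each selected sentence with its neighbors into one string."""
    windows: List[str] = []
    for idx in selected:
        start = max(idx - context_before, 0)             # clamp at the first sentence
        end = min(idx + context_after + 1, len(sentences))  # clamp at the last sentence
        windows.append(" ".join(sentences[start:end]))
    return " ".join(windows)


doc = ["A.", "B.", "C.", "D.", "E."]
print(expand_with_context(doc, [2]))  # B. C. D.
```

Larger context values keep more surrounding text and therefore save fewer tokens, so they trade compression for readability of the retained passages.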

Usage Examples

from llama_index.core.postprocessor.optimizer import SentenceEmbeddingOptimizer

# Keep top 50% of sentences by embedding similarity
optimizer = SentenceEmbeddingOptimizer(percentile_cutoff=0.5)

# Use with a query engine (assumes `index` is an existing index, e.g. a VectorStoreIndex)
query_engine = index.as_query_engine(
    node_postprocessors=[optimizer]
)
response = query_engine.query("What is the capital of France?")

# Alternatively, use a similarity threshold
optimizer = SentenceEmbeddingOptimizer(
    threshold_cutoff=0.7,
    context_before=2,
    context_after=2,
)
