Overview
Provides core node postprocessor classes that filter, reorder, and augment retrieved nodes based on similarity scores, keywords, document relationships, and context positioning strategies.
Description
The node.py module defines several BaseNodePostprocessor subclasses used to refine retrieval results after an initial retrieval step:
- KeywordNodePostprocessor filters nodes based on required and excluded keywords. It uses spaCy's PhraseMatcher to match keywords against node content. When required keywords are given, a node that matches none of them is removed; when excluded keywords are given, a node that matches at least one of them is removed. The lang parameter selects the spaCy language used for tokenization.
- SimilarityPostprocessor filters nodes by a similarity_cutoff threshold. Nodes with a score below the cutoff (or with no score at all) are excluded from the results.
- PrevNextNodePostprocessor expands the retrieved node set by traversing document relationships (NEXT and PREVIOUS) in the document store. It supports three modes: "next" (forward traversal only), "previous" (backward traversal only), and "both" (both directions). After expansion, nodes are sorted by their relationship ordering.
- AutoPrevNextNodePostprocessor automatically infers whether to fetch previous or next context by using an LLM to predict the appropriate traversal direction. It employs a response synthesizer with configurable prompt templates (infer_prev_next_tmpl and refine_prev_next_tmpl) to determine whether the answer lies in prior context, future context, or neither.
- LongContextReorder reorders nodes based on the research finding (from Liu et al., 2023) that LLMs perform best when important information is at the beginning or end of the context. It sorts nodes by ascending score, then walks the sorted list, inserting even-indexed nodes at the front and appending odd-indexed nodes at the back. The highest-scoring nodes therefore end up at the two ends of the list and the lowest-scoring nodes in the middle.
The module also provides two helper functions, get_forward_nodes and get_backward_nodes, which perform iterative traversal of node relationships in a BaseDocumentStore.
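The iterative traversal described above can be sketched without the library. This is a minimal illustration, not the module's implementation: ToyNode, walk, and the dictionary-based store are hypothetical stand-ins for real nodes, get_forward_nodes / get_backward_nodes, and a BaseDocumentStore.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class ToyNode:
    """Hypothetical stand-in for a stored node with NEXT/PREVIOUS links."""

    node_id: str
    text: str
    next_id: Optional[str] = None
    prev_id: Optional[str] = None


def walk(
    store: Dict[str, ToyNode], start_id: str, num_nodes: int, forward: bool
) -> List[ToyNode]:
    """Collect the start node plus up to num_nodes neighbours in one direction."""
    current = store[start_id]
    collected = [current]
    for _ in range(num_nodes):
        neighbour_id = current.next_id if forward else current.prev_id
        # Stop at the end of the chain or at a link that is not in the store.
        if neighbour_id is None or neighbour_id not in store:
            break
        current = store[neighbour_id]
        collected.append(current)
    return collected


# A three-node chain: a <-> b <-> c (assumed toy data).
store = {
    "a": ToyNode("a", "intro", next_id="b"),
    "b": ToyNode("b", "body", next_id="c", prev_id="a"),
    "c": ToyNode("c", "outro", prev_id="b"),
}

forward_ids = [n.node_id for n in walk(store, "b", num_nodes=2, forward=True)]
backward_ids = [n.node_id for n in walk(store, "b", num_nodes=1, forward=False)]
print(forward_ids)   # the walk stops early because "c" has no NEXT link
print(backward_ids)
```

A "both" mode would, under the same assumptions, run the walk in each direction and merge the results before sorting them into document order.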
Usage
Use these postprocessors in a retrieval pipeline to improve the quality and relevance of retrieved context before it is passed to a response synthesis step. SimilarityPostprocessor is the most commonly used for basic score filtering. KeywordNodePostprocessor is useful for enforcing domain-specific term requirements. PrevNextNodePostprocessor and AutoPrevNextNodePostprocessor are valuable when documents have sequential structure. LongContextReorder helps maximize LLM performance on long retrieved contexts.
Code Reference
Source Location
- Repository: Run_llama_Llama_index
- File: llama-index-core/llama_index/core/postprocessor/node.py
- Lines: 1-396
Signature
class KeywordNodePostprocessor(BaseNodePostprocessor):
    required_keywords: List[str] = Field(default_factory=list)
    exclude_keywords: List[str] = Field(default_factory=list)
    lang: str = Field(default="en")
    def _postprocess_nodes(
        self, nodes: List[NodeWithScore], query_bundle: Optional[QueryBundle] = None,
    ) -> List[NodeWithScore]: ...

class SimilarityPostprocessor(BaseNodePostprocessor):
    similarity_cutoff: float = Field(default=0.0)
    def _postprocess_nodes(
        self, nodes: List[NodeWithScore], query_bundle: Optional[QueryBundle] = None,
    ) -> List[NodeWithScore]: ...

class PrevNextNodePostprocessor(BaseNodePostprocessor):
    docstore: BaseDocumentStore
    num_nodes: int = Field(default=1)
    mode: str = Field(default="next")
    def _postprocess_nodes(
        self, nodes: List[NodeWithScore], query_bundle: Optional[QueryBundle] = None,
    ) -> List[NodeWithScore]: ...

class AutoPrevNextNodePostprocessor(BaseNodePostprocessor):
    docstore: BaseDocumentStore
    llm: Optional[SerializeAsAny[LLM]] = None
    num_nodes: int = Field(default=1)
    infer_prev_next_tmpl: str
    refine_prev_next_tmpl: str
    verbose: bool = Field(default=False)
    response_mode: ResponseMode = Field(default=ResponseMode.COMPACT)
    def _postprocess_nodes(
        self, nodes: List[NodeWithScore], query_bundle: Optional[QueryBundle] = None,
    ) -> List[NodeWithScore]: ...

class LongContextReorder(BaseNodePostprocessor):
    def _postprocess_nodes(
        self, nodes: List[NodeWithScore], query_bundle: Optional[QueryBundle] = None,
    ) -> List[NodeWithScore]: ...
Import
from llama_index.core.postprocessor.node import SimilarityPostprocessor
from llama_index.core.postprocessor.node import KeywordNodePostprocessor
from llama_index.core.postprocessor.node import PrevNextNodePostprocessor
from llama_index.core.postprocessor.node import AutoPrevNextNodePostprocessor
from llama_index.core.postprocessor.node import LongContextReorder
I/O Contract
Inputs (SimilarityPostprocessor)
| Name | Type | Required | Description |
|------|------|----------|-------------|
| similarity_cutoff | float | No | Minimum similarity score threshold (default: 0.0); nodes scoring below this are filtered out |
Inputs (KeywordNodePostprocessor)
| Name | Type | Required | Description |
|------|------|----------|-------------|
| required_keywords | List[str] | No | Keywords that must appear in the node content for it to be kept |
| exclude_keywords | List[str] | No | Keywords that cause a node to be excluded if present |
| lang | str | No | spaCy language code for tokenization (default: "en") |
Inputs (PrevNextNodePostprocessor)
| Name | Type | Required | Description |
|------|------|----------|-------------|
| docstore | BaseDocumentStore | Yes | Document store for retrieving related nodes |
| num_nodes | int | No | Number of adjacent nodes to fetch in each direction (default: 1) |
| mode | str | No | Direction of traversal: "next", "previous", or "both" (default: "next") |
Inputs (AutoPrevNextNodePostprocessor)
| Name | Type | Required | Description |
|------|------|----------|-------------|
| docstore | BaseDocumentStore | Yes | Document store for retrieving related nodes |
| llm | Optional[LLM] | No | LLM to use for direction inference; defaults to Settings.llm |
| num_nodes | int | No | Number of adjacent nodes to fetch (default: 1) |
| infer_prev_next_tmpl | str | No | Prompt template for inferring traversal direction |
| refine_prev_next_tmpl | str | No | Prompt template for refining the inference |
| verbose | bool | No | Whether to print debug information (default: False) |
| response_mode | ResponseMode | No | Response synthesis mode (default: COMPACT) |
Outputs
| Name | Type | Description |
|------|------|-------------|
| _postprocess_nodes() | List[NodeWithScore] | Filtered, expanded, or reordered list of nodes with scores |
Usage Examples
Basic Usage with SimilarityPostprocessor
from llama_index.core import VectorStoreIndex
from llama_index.core.postprocessor.node import SimilarityPostprocessor
# Drop retrieved nodes whose score is below 0.7 or missing
postprocessor = SimilarityPostprocessor(similarity_cutoff=0.7)
# Use in a query engine pipeline.
# `documents` is assumed to be a list of Document objects loaded elsewhere.
index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine(
    node_postprocessors=[postprocessor]
)
response = query_engine.query("What is the topic?")
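The cutoff rule itself is easy to check in isolation. The following is a minimal pure-Python sketch of the filtering described for SimilarityPostprocessor, not the library code: ScoredText and apply_cutoff are hypothetical names, and the sample scores are made up for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ScoredText:
    """Hypothetical stand-in for a retrieved node and its similarity score."""

    text: str
    score: Optional[float]


def apply_cutoff(nodes: List[ScoredText], similarity_cutoff: float) -> List[ScoredText]:
    """Keep nodes that have a score and whose score is not below the cutoff."""
    return [
        node
        for node in nodes
        if node.score is not None and node.score >= similarity_cutoff
    ]


candidates = [
    ScoredText("a", 0.91),
    ScoredText("b", 0.70),  # equal to the cutoff, so it is not "below" it
    ScoredText("c", 0.42),  # below the cutoff
    ScoredText("d", None),  # no score at all
]

kept_texts = [node.text for node in apply_cutoff(candidates, similarity_cutoff=0.7)]
print(kept_texts)
```

Note that under this reading a node whose score equals the cutoff is kept, while an unscored node is dropped even when the cutoff is 0.0.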
KeywordNodePostprocessor
from llama_index.core.postprocessor.node import KeywordNodePostprocessor
postprocessor = KeywordNodePostprocessor(
    required_keywords=["machine learning"],
    exclude_keywords=["deprecated"],
)
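To make the keep/drop rule concrete, here is a simplified sketch that replaces spaCy's PhraseMatcher with plain case-sensitive substring checks. keyword_filter is a hypothetical helper, and the sketch assumes that one matching required keyword is enough to keep a node; token-level matching in spaCy can behave differently from substring matching.

```python
from typing import List


def keyword_filter(
    texts: List[str], required: List[str], excluded: List[str]
) -> List[str]:
    """Keep texts that match a required keyword and no excluded keyword."""
    kept = []
    for text in texts:
        # Assumption: a single required keyword match is sufficient.
        if required and not any(keyword in text for keyword in required):
            continue
        # Any excluded keyword match removes the text.
        if excluded and any(keyword in text for keyword in excluded):
            continue
        kept.append(text)
    return kept


texts = [
    "machine learning improves retrieval quality",
    "this machine learning API is deprecated",
    "a note about databases",
]

kept = keyword_filter(
    texts, required=["machine learning"], excluded=["deprecated"]
)
print(kept)
```

With empty keyword lists, the sketch passes every text through unchanged, which mirrors the default_factory=list defaults shown in the signature.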
LongContextReorder
from llama_index.core.postprocessor.node import LongContextReorder
reorder = LongContextReorder()
# Reorders nodes so highest-relevance items are at the beginning and end
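The interleaving can be traced by hand on a small input. This is a minimal sketch of the reordering as described in this page, using (label, score) tuples instead of NodeWithScore objects; reorder is a hypothetical helper name and the scores are made up.

```python
from typing import List, Tuple


def reorder(scored: List[Tuple[str, float]]) -> List[Tuple[str, float]]:
    """Place high scores at both ends and low scores in the middle."""
    ordered = sorted(scored, key=lambda pair: pair[1])  # ascending by score
    result: List[Tuple[str, float]] = []
    for i, item in enumerate(ordered):
        if i % 2 == 0:
            result.insert(0, item)  # even index: push to the front
        else:
            result.append(item)     # odd index: push to the back
    return result


scored = [("A", 0.9), ("B", 0.8), ("C", 0.7), ("D", 0.6), ("E", 0.5)]
reordered_labels = [label for label, _ in reorder(scored)]
print(reordered_labels)  # ['A', 'C', 'E', 'D', 'B']
```

In this run the top-scoring item A lands first, the second-best item B lands last, and the lowest-scoring item E sits in the middle.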
Related Pages