Implementation:Run llama Llama index PII Postprocessors
| Knowledge Sources | |
|---|---|
| Domains | Postprocessing, Privacy, PII |
| Last Updated | 2026-02-11 19:00 GMT |
Overview
PIINodePostprocessor and NERPIINodePostprocessor are node postprocessors that mask personally identifiable information (PII) in retrieved text nodes using either an LLM or a HuggingFace NER pipeline.
Description
This module provides two complementary approaches to PII masking in retrieval pipelines:
- PIINodePostprocessor uses an LLM to identify and mask PII. It sends each node's text along with a few-shot prompt template (DEFAULT_PII_TMPL) that instructs the LLM to replace PII entities (names, credit card numbers, dates, etc.) with tagged placeholders such as [NAME1] or [CREDIT_CARD_NUMBER]. The LLM returns both the masked text and a JSON mapping from tags to original values. The postprocessor stores this mapping in the node's metadata under a configurable key (pii_node_info_key) and replaces the node content with the masked version.
- NERPIINodePostprocessor uses a HuggingFace Transformers NER (Named Entity Recognition) pipeline to detect PII entities locally, without an LLM call. It runs a pipeline("ner", grouped_entities=True) model, replaces each detected entity with a tag built from its entity group and starting character position (e.g., [PER_0] for a person entity that starts at offset 0), and stores the tag-to-original mapping in node metadata.
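The NER-based tagging scheme described above can be sketched without installing transformers. This is a minimal illustration, not the library's code: `mask_with_ner` and `fake_ner` are hypothetical names, and `fake_ner` stands in for a real pipeline by returning dicts with the `entity_group`, `word`, `start`, and `end` keys that a grouped NER pipeline is assumed to produce.

```python
from typing import Callable, Dict, List, Tuple


def mask_with_ner(
    ner: Callable[[str], List[dict]], text: str
) -> Tuple[str, Dict[str, str]]:
    """Replace each detected entity with a [GROUP_START] tag.

    Returns the masked text and a mapping from tag to original value.
    """
    masked = text
    mapping: Dict[str, str] = {}
    for entity in ner(text):
        # Tag combines the entity group with the start offset in the ORIGINAL text.
        tag = f"[{entity['entity_group']}_{entity['start']}]"
        masked = masked.replace(entity["word"], tag)
        mapping[tag] = entity["word"]
    return masked, mapping


def fake_ner(text: str) -> List[dict]:
    """Hypothetical stand-in for a NER pipeline; knows two hard-coded entities."""
    results = []
    for word, group in (("Alice", "PER"), ("Paris", "LOC")):
        start = text.find(word)
        if start != -1:
            results.append(
                {
                    "entity_group": group,
                    "word": word,
                    "start": start,
                    "end": start + len(word),
                }
            )
    return results


masked_text, tag_map = mask_with_ner(fake_ner, "Alice moved to Paris.")
print(masked_text)  # [PER_0] moved to [LOC_15].
print(tag_map)      # {'[PER_0]': 'Alice', '[LOC_15]': 'Paris'}
```

Because the sketch uses plain string replacement, every occurrence of a detected word is replaced by the tag of the detection being processed; a production implementation may want offset-based replacement instead.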
Both postprocessors create deep copies of the original nodes to avoid mutating inputs and exclude the PII metadata key from both embed and LLM metadata to prevent information leakage.
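The copy-and-exclude behavior can be illustrated with a small stand-in. `SketchNode` and `attach_masked_copy` are hypothetical names introduced only for this sketch; the two exclusion lists are assumed to mirror the per-node lists of metadata keys that are hidden from embedding and LLM input.

```python
import copy
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SketchNode:
    """Hypothetical stand-in for a text node; not the LlamaIndex class."""

    text: str
    metadata: Dict[str, object] = field(default_factory=dict)
    excluded_embed_metadata_keys: List[str] = field(default_factory=list)
    excluded_llm_metadata_keys: List[str] = field(default_factory=list)


def attach_masked_copy(
    node: SketchNode,
    masked_text: str,
    mapping: Dict[str, str],
    key: str = "__pii_node_info__",
) -> SketchNode:
    """Return a masked deep copy of `node`, leaving the input untouched."""
    new_node = copy.deepcopy(node)
    new_node.text = masked_text
    new_node.metadata[key] = mapping
    # Keep the mapping (which holds the raw PII) out of embedding and LLM input.
    new_node.excluded_embed_metadata_keys.append(key)
    new_node.excluded_llm_metadata_keys.append(key)
    return new_node


original = SketchNode(text="Call Alice.", metadata={"source": "crm"})
masked = attach_masked_copy(original, "Call [PER_5].", {"[PER_5]": "Alice"})
print(original.text)  # Call Alice.
print(masked.text)    # Call [PER_5].
```

Excluding the key matters because the mapping contains the original PII values; if it were rendered into the prompt or the embedding text, the masking would be undone.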
Usage
Use PIINodePostprocessor when you need flexible, LLM-powered PII detection that can handle diverse PII types based on prompt engineering. Use NERPIINodePostprocessor when you prefer a local, deterministic NER-based approach without LLM API calls. Both are suitable for compliance scenarios where retrieved context must be sanitized before being sent to downstream LLMs or presented to users.
Code Reference
Source Location
- Repository: Run_llama_Llama_index
- File: llama-index-core/llama_index/core/postprocessor/pii.py
Signature
class PIINodePostprocessor(BaseNodePostprocessor):
    llm: LLM
    pii_str_tmpl: str = DEFAULT_PII_TMPL
    pii_node_info_key: str = "__pii_node_info__"

class NERPIINodePostprocessor(BaseNodePostprocessor):
    pii_node_info_key: str = "__pii_node_info__"
Import
from llama_index.core.postprocessor.pii import PIINodePostprocessor, NERPIINodePostprocessor
I/O Contract
Inputs (PIINodePostprocessor)
| Name | Type | Required | Description |
|---|---|---|---|
| llm | LLM | Yes | The LLM used to identify and mask PII entities. |
| pii_str_tmpl | str | No | Prompt template for PII masking instructions. Defaults to DEFAULT_PII_TMPL with few-shot examples. |
| pii_node_info_key | str | No | Metadata key used to store the PII mapping. Defaults to "__pii_node_info__". |
Inputs (NERPIINodePostprocessor)
| Name | Type | Required | Description |
|---|---|---|---|
| pii_node_info_key | str | No | Metadata key used to store the PII mapping. Defaults to "__pii_node_info__". |
Outputs
| Name | Type | Description |
|---|---|---|
| nodes | List[NodeWithScore] | Deep copies of the input nodes with PII-masked text content and a metadata entry (under pii_node_info_key) that maps each tag to the original value it replaced. |
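As an illustration of how a downstream consumer might use the tag-to-original mapping stored under pii_node_info_key, the following sketch restores the original values in a masked string. `restore_pii` is a hypothetical helper, not part of the module, and the metadata dictionary is made-up example data.

```python
from typing import Dict


def restore_pii(masked_text: str, mapping: Dict[str, str]) -> str:
    """Substitute each tag in `masked_text` with its original value."""
    restored = masked_text
    for tag, original_value in mapping.items():
        restored = restored.replace(tag, original_value)
    return restored


# Example metadata shaped like the output described above (values are invented).
node_metadata = {
    "__pii_node_info__": {
        "[NAME1]": "Jane Doe",
        "[CREDIT_CARD_NUMBER]": "1234-5678-9012-3456",
    }
}

restored_text = restore_pii(
    "[NAME1] paid with [CREDIT_CARD_NUMBER].",
    node_metadata["__pii_node_info__"],
)
print(restored_text)  # Jane Doe paid with 1234-5678-9012-3456.
```

Restoring values like this should only happen in a trusted context, since it reintroduces the PII that the postprocessor removed.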
Usage Examples
from llama_index.core.postprocessor.pii import PIINodePostprocessor, NERPIINodePostprocessor
from llama_index.llms.openai import OpenAI  # provided by the llama-index-llms-openai package

# `index` is assumed to be an existing index (e.g., a VectorStoreIndex) built elsewhere.

# LLM-based PII masking
llm = OpenAI(model="gpt-4")
pii_processor = PIINodePostprocessor(llm=llm)
query_engine = index.as_query_engine(
    node_postprocessors=[pii_processor]
)
response = query_engine.query("Tell me about the customer.")

# NER-based PII masking (requires the transformers library)
ner_processor = NERPIINodePostprocessor()
query_engine = index.as_query_engine(
    node_postprocessors=[ner_processor]
)
response = query_engine.query("Tell me about the customer.")
Related Pages
- Environment:Run_llama_Llama_index_Python_LlamaIndex_Core
- Run_llama_Llama_index_BaseNodePostprocessor - Parent abstract base class