
Implementation:Protectai Llm guard Anonymize

From Leeroopedia
Knowledge Sources
Domains NLP, Data_Privacy, Named_Entity_Recognition
Last Updated 2026-02-14 12:00 GMT

Overview

A concrete tool, provided by the LLM Guard library, for detecting and anonymizing personally identifiable information (PII) in text using NER models, regex patterns, and Presidio integration.

Description

The Anonymize class is an input scanner that detects personally identifiable information using a combination of transformer-based NER (default: DeBERTa Ai4Privacy v2), regex patterns, and Presidio's AnalyzerEngine. Detected entities are replaced with indexed placeholders like [REDACTED_PERSON_1] and stored in a shared Vault instance for later deanonymization.

The scanner supports:

  • 12+ default entity types (CREDIT_CARD, EMAIL_ADDRESS, PERSON, PHONE_NUMBER, US_SSN, etc.)
  • Custom hidden names for forced anonymization
  • Allowed names for exemption from anonymization
  • Faker-based replacement for realistic pseudonymization
  • English and Chinese language support
  • ONNX runtime for optimized inference
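The regex-pattern detection path can be illustrated conceptually: match a pattern, replace each hit with an indexed placeholder, and record the (placeholder, original) mapping for later restoration. The sketch below is a simplified assumption, not the library's internal implementation; only the [REDACTED_TYPE_N] placeholder convention is taken from this page.

```python
import re

# Conceptual sketch of regex-based PII detection -- NOT llm_guard internals.
# Placeholder format mirrors the documented [REDACTED_<TYPE>_<N>] convention.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")

def redact_emails(text: str) -> tuple[str, list[tuple[str, str]]]:
    """Replace each email with an indexed placeholder; return the mappings."""
    mappings: list[tuple[str, str]] = []

    def _sub(match: re.Match) -> str:
        placeholder = f"[REDACTED_EMAIL_ADDRESS_{len(mappings) + 1}]"
        mappings.append((placeholder, match.group(0)))
        return placeholder

    return EMAIL_RE.sub(_sub, text), mappings

sanitized, vault_entries = redact_emails("Mail alice@example.com or bob@example.org")
print(sanitized)
# Mail [REDACTED_EMAIL_ADDRESS_1] or [REDACTED_EMAIL_ADDRESS_2]
```

The real scanner layers transformer-based NER and Presidio recognizers on top of this idea, with confidence scores and many more entity types.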

Usage

Use this scanner when user prompts may contain PII that must be removed before the text is sent to an LLM. Always pair it with a Deanonymize output scanner that shares the same Vault instance, so the anonymization is reversible.
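The deanonymization half of that round trip can be sketched in plain Python, modeling the vault as the list of (placeholder, original) pairs the scanner stores. This is a conceptual sketch of what a Deanonymize output scanner does, not the library's actual code.

```python
# Conceptual sketch of vault-based deanonymization -- NOT the library's code.
# The vault is modeled as the (placeholder, original) pairs Anonymize stores.
def deanonymize(text: str, vault_entries: list[tuple[str, str]]) -> str:
    """Replay the stored mappings to restore the original values."""
    for placeholder, original in vault_entries:
        text = text.replace(placeholder, original)
    return text

vault_entries = [
    ("[REDACTED_PERSON_1]", "John Smith"),
    ("[REDACTED_EMAIL_ADDRESS_1]", "john@example.com"),
]
llm_output = "Reply sent to [REDACTED_PERSON_1] at [REDACTED_EMAIL_ADDRESS_1]."
print(deanonymize(llm_output, vault_entries))
# Reply sent to John Smith at john@example.com.
```

Because the placeholders are indexed per entity, the same person mentioned twice maps back to one original value.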

Code Reference

Source Location

  • Repository: llm-guard
  • File: llm_guard/input_scanners/anonymize.py
  • Lines: L46-396

Signature

class Anonymize(Scanner):
    def __init__(
        self,
        vault: Vault,
        *,
        hidden_names: list[str] | None = None,
        allowed_names: list[str] | None = None,
        entity_types: list[str] | None = None,
        preamble: str = "",
        regex_patterns: list[DefaultRegexPatterns | RegexPatternsReuse] | None = None,
        use_faker: bool = False,
        recognizer_conf: NERConfig | None = None,
        threshold: float = 0.5,
        use_onnx: bool = False,
        language: str = "en",
    ) -> None:
        """
        Args:
            vault: Vault instance to store anonymized mappings.
            hidden_names: Names to always anonymize.
            allowed_names: Names to never anonymize.
            entity_types: PII entity types to detect. Default: all standard types.
            preamble: Text to prepend to sanitized prompt.
            regex_patterns: Custom regex patterns for detection.
            use_faker: Use fake data instead of [REDACTED_*] placeholders.
            recognizer_conf: NER model configuration. Default: DEBERTA_AI4PRIVACY_v2_CONF.
            threshold: Minimum confidence score. Default: 0.5.
            use_onnx: Use ONNX runtime for inference. Default: False.
            language: Detection language ("en" or "zh"). Default: "en".
        """

    def scan(self, prompt: str) -> tuple[str, bool, float]:
        """
        Scan prompt for PII and replace with placeholders.

        Returns:
            - Sanitized prompt with PII replaced
            - False if PII was found, True if clean
            - Risk score based on highest detection confidence
        """

Import

from llm_guard.input_scanners import Anonymize
from llm_guard.vault import Vault

I/O Contract

Inputs

Name Type Required Description
vault Vault Yes Shared vault for storing placeholder mappings
hidden_names list[str] No Custom names to always anonymize
allowed_names list[str] No Names exempt from anonymization
entity_types list[str] No PII types to detect (default: CREDIT_CARD, EMAIL_ADDRESS, PERSON, PHONE_NUMBER, US_SSN, etc.)
preamble str No Text to prepend to sanitized prompt (default: "")
use_faker bool No Use fake data instead of placeholders (default: False)
recognizer_conf NERConfig No NER model config (default: DEBERTA_AI4PRIVACY_v2_CONF)
threshold float No Minimum confidence score (default: 0.5)
use_onnx bool No Use ONNX runtime (default: False)
language str No Detection language: "en" or "zh" (default: "en")

Outputs

Name Type Description
sanitized_prompt str Prompt with PII replaced by [REDACTED_TYPE_N] placeholders
is_valid bool False if PII was found, True if prompt is clean
risk_score float Highest NER confidence score, normalized against threshold
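A typical way to act on this return tuple is to forward the sanitized prompt only when it passes a risk check. The gating policy below, including the 0.8 cutoff, is a hypothetical example built on the documented outputs, not part of the library.

```python
# Hypothetical gating policy over the documented (sanitized, is_valid,
# risk_score) return tuple; the max_risk cutoff is an assumption.
def gate_prompt(sanitized: str, is_valid: bool, risk_score: float,
                max_risk: float = 0.8) -> str:
    if is_valid:
        return sanitized  # no PII detected: pass through unchanged
    if risk_score > max_risk:
        raise ValueError(f"prompt blocked, risk={risk_score:.2f}")
    return sanitized  # PII was redacted, residual risk is acceptable

print(gate_prompt("My name is [REDACTED_PERSON_1]", False, 0.6))
# My name is [REDACTED_PERSON_1]
```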

Usage Examples

Basic PII Anonymization

from llm_guard.input_scanners import Anonymize
from llm_guard.vault import Vault

vault = Vault()
scanner = Anonymize(vault)

prompt = "My name is John Smith and my email is john@example.com"
sanitized, is_valid, score = scanner.scan(prompt)
# sanitized: "My name is [REDACTED_PERSON_1] and my email is [REDACTED_EMAIL_ADDRESS_1]"
# is_valid: False (PII was detected)
# vault.get(): [("[REDACTED_PERSON_1]", "John Smith"), ("[REDACTED_EMAIL_ADDRESS_1]", "john@example.com")]

With Faker Replacement

from llm_guard.input_scanners import Anonymize
from llm_guard.vault import Vault

vault = Vault()
scanner = Anonymize(vault, use_faker=True)

prompt = "Contact Jane Doe at jane.doe@company.com"
sanitized, is_valid, score = scanner.scan(prompt)
# sanitized: "Contact Emily Johnson at michael.brown@example.org"
# (faker generates realistic but fake replacements)

Custom Entity Types

from llm_guard.input_scanners import Anonymize
from llm_guard.vault import Vault

vault = Vault()
scanner = Anonymize(
    vault,
    entity_types=["PERSON", "EMAIL_ADDRESS"],  # Only detect these types
    threshold=0.7,
    use_onnx=True,
)

Related Pages

Implements Principle

Requires Environment

Uses Heuristic
