
Implementation:Protectai Llm guard Anonymize

From Leeroopedia
Knowledge Sources
Domains NLP, Data_Privacy, Named_Entity_Recognition
Last Updated 2026-02-14 12:00 GMT

Overview

A concrete tool, provided by the LLM Guard library, for detecting and anonymizing personally identifiable information (PII) in text using NER models, regex patterns, and Presidio integration.

Description

The Anonymize class is an input scanner that detects personally identifiable information using a combination of transformer-based NER (default: DeBERTa Ai4Privacy v2), regex patterns, and Presidio's AnalyzerEngine. Detected entities are replaced with indexed placeholders like [REDACTED_PERSON_1] and stored in a shared Vault instance for later deanonymization.

The scanner supports:

  • 12+ default entity types (CREDIT_CARD, EMAIL_ADDRESS, PERSON, PHONE_NUMBER, US_SSN, etc.)
  • Custom hidden names for forced anonymization
  • Allowed names for exemption from anonymization
  • Faker-based replacement for realistic pseudonymization
  • English and Chinese language support
  • ONNX runtime for optimized inference
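The regex-pattern detection path can be illustrated conceptually: match a pattern, replace each hit with an indexed placeholder, and record the (placeholder, original) mapping for later restoration. The sketch below is a simplified assumption, not the library's internal implementation; only the [REDACTED_TYPE_N] placeholder convention is taken from this page.

```python
import re

# Conceptual sketch of regex-based PII detection -- NOT llm_guard internals.
# Placeholder format mirrors the documented [REDACTED_<TYPE>_<N>] convention.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")

def redact_emails(text: str) -> tuple[str, list[tuple[str, str]]]:
    """Replace each email with an indexed placeholder; return the mappings."""
    mappings: list[tuple[str, str]] = []

    def _sub(match: re.Match) -> str:
        placeholder = f"[REDACTED_EMAIL_ADDRESS_{len(mappings) + 1}]"
        mappings.append((placeholder, match.group(0)))
        return placeholder

    return EMAIL_RE.sub(_sub, text), mappings

sanitized, vault_entries = redact_emails("Mail alice@example.com or bob@example.org")
print(sanitized)
# Mail [REDACTED_EMAIL_ADDRESS_1] or [REDACTED_EMAIL_ADDRESS_2]
```

The real scanner layers transformer-based NER and Presidio recognizers on top of this idea, with confidence scores and many more entity types.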

Usage

Use this scanner when user prompts may contain PII that must be removed before the text is sent to an LLM. Always pair it with a Deanonymize output scanner that shares the same Vault instance, so the anonymization is reversible.
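The deanonymization half of that round trip can be sketched in plain Python, modeling the vault as the list of (placeholder, original) pairs the scanner stores. This is a conceptual sketch of what a Deanonymize output scanner does, not the library's actual code.

```python
# Conceptual sketch of vault-based deanonymization -- NOT the library's code.
# The vault is modeled as the (placeholder, original) pairs Anonymize stores.
def deanonymize(text: str, vault_entries: list[tuple[str, str]]) -> str:
    """Replay the stored mappings to restore the original values."""
    for placeholder, original in vault_entries:
        text = text.replace(placeholder, original)
    return text

vault_entries = [
    ("[REDACTED_PERSON_1]", "John Smith"),
    ("[REDACTED_EMAIL_ADDRESS_1]", "john@example.com"),
]
llm_output = "Reply sent to [REDACTED_PERSON_1] at [REDACTED_EMAIL_ADDRESS_1]."
print(deanonymize(llm_output, vault_entries))
# Reply sent to John Smith at john@example.com.
```

Because the placeholders are indexed per entity, the same person mentioned twice maps back to one original value.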

Code Reference

Source Location

  • Repository: llm-guard
  • File: llm_guard/input_scanners/anonymize.py
  • Lines: L46-396

Signature

class Anonymize(Scanner):
    def __init__(
        self,
        vault: Vault,
        *,
        hidden_names: list[str] | None = None,
        allowed_names: list[str] | None = None,
        entity_types: list[str] | None = None,
        preamble: str = "",
        regex_patterns: list[DefaultRegexPatterns | RegexPatternsReuse] | None = None,
        use_faker: bool = False,
        recognizer_conf: NERConfig | None = None,
        threshold: float = 0.5,
        use_onnx: bool = False,
        language: str = "en",
    ) -> None:
        """
        Args:
            vault: Vault instance to store anonymized mappings.
            hidden_names: Names to always anonymize.
            allowed_names: Names to never anonymize.
            entity_types: PII entity types to detect. Default: all standard types.
            preamble: Text to prepend to sanitized prompt.
            regex_patterns: Custom regex patterns for detection.
            use_faker: Use fake data instead of [REDACTED_*] placeholders.
            recognizer_conf: NER model configuration. Default: DEBERTA_AI4PRIVACY_v2_CONF.
            threshold: Minimum confidence score. Default: 0.5.
            use_onnx: Use ONNX runtime for inference. Default: False.
            language: Detection language ("en" or "zh"). Default: "en".
        """

    def scan(self, prompt: str) -> tuple[str, bool, float]:
        """
        Scan prompt for PII and replace with placeholders.

        Returns:
            - Sanitized prompt with PII replaced
            - False if PII was found, True if clean
            - Risk score based on highest detection confidence
        """

Import

from llm_guard.input_scanners import Anonymize
from llm_guard.vault import Vault

I/O Contract

Inputs

Name Type Required Description
vault Vault Yes Shared vault for storing placeholder mappings
hidden_names list[str] No Custom names to always anonymize
allowed_names list[str] No Names exempt from anonymization
entity_types list[str] No PII types to detect (default: CREDIT_CARD, EMAIL_ADDRESS, PERSON, PHONE_NUMBER, US_SSN, etc.)
preamble str No Text to prepend to sanitized prompt (default: "")
use_faker bool No Use fake data instead of placeholders (default: False)
recognizer_conf NERConfig No NER model config (default: DEBERTA_AI4PRIVACY_v2_CONF)
threshold float No Minimum confidence score (default: 0.5)
use_onnx bool No Use ONNX runtime (default: False)
language str No Detection language: "en" or "zh" (default: "en")

Outputs

Name Type Description
sanitized_prompt str Prompt with PII replaced by [REDACTED_TYPE_N] placeholders
is_valid bool False if PII was found, True if prompt is clean
risk_score float Highest NER confidence score, normalized against threshold
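A typical way to act on this return tuple is to forward the sanitized prompt only when it passes a risk check. The gating policy below, including the 0.8 cutoff, is a hypothetical example built on the documented outputs, not part of the library.

```python
# Hypothetical gating policy over the documented (sanitized, is_valid,
# risk_score) return tuple; the max_risk cutoff is an assumption.
def gate_prompt(sanitized: str, is_valid: bool, risk_score: float,
                max_risk: float = 0.8) -> str:
    if is_valid:
        return sanitized  # no PII detected: pass through unchanged
    if risk_score > max_risk:
        raise ValueError(f"prompt blocked, risk={risk_score:.2f}")
    return sanitized  # PII was redacted, residual risk is acceptable

print(gate_prompt("My name is [REDACTED_PERSON_1]", False, 0.6))
# My name is [REDACTED_PERSON_1]
```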

Usage Examples

Basic PII Anonymization

from llm_guard.input_scanners import Anonymize
from llm_guard.vault import Vault

vault = Vault()
scanner = Anonymize(vault)

prompt = "My name is John Smith and my email is john@example.com"
sanitized, is_valid, score = scanner.scan(prompt)
# sanitized: "My name is [REDACTED_PERSON_1] and my email is [REDACTED_EMAIL_ADDRESS_1]"
# is_valid: False (PII was detected)
# vault.get(): [("[REDACTED_PERSON_1]", "John Smith"), ("[REDACTED_EMAIL_ADDRESS_1]", "john@example.com")]

With Faker Replacement

from llm_guard.input_scanners import Anonymize
from llm_guard.vault import Vault

vault = Vault()
scanner = Anonymize(vault, use_faker=True)

prompt = "Contact Jane Doe at jane.doe@company.com"
sanitized, is_valid, score = scanner.scan(prompt)
# sanitized: "Contact Emily Johnson at michael.brown@example.org"
# (faker generates realistic but fake replacements)

Custom Entity Types

from llm_guard.input_scanners import Anonymize
from llm_guard.vault import Vault

vault = Vault()
scanner = Anonymize(
    vault,
    entity_types=["PERSON", "EMAIL_ADDRESS"],  # Only detect these types
    threshold=0.7,
    use_onnx=True,
)

Related Pages

Implements Principle

Requires Environment

Uses Heuristic
