Implementation:Protectai_Llm_guard_Anonymize
| Knowledge Sources | |
|---|---|
| Domains | NLP, Data_Privacy, Named_Entity_Recognition |
| Last Updated | 2026-02-14 12:00 GMT |
Overview
Concrete tool for detecting and anonymizing PII in text using NER models, regex patterns, and Presidio integration, provided by the LLM Guard library.
Description
The Anonymize class is an input scanner that detects personally identifiable information using a combination of transformer-based NER (default: DeBERTa Ai4Privacy v2), regex patterns, and Presidio's AnalyzerEngine. Detected entities are replaced with indexed placeholders like [REDACTED_PERSON_1] and stored in a shared Vault instance for later deanonymization.
The scanner supports:
- 12+ default entity types (CREDIT_CARD, EMAIL_ADDRESS, PERSON, PHONE_NUMBER, US_SSN, etc.)
- Custom hidden names for forced anonymization
- Allowed names for exemption from anonymization
- Faker-based replacement for realistic pseudonymization
- English and Chinese language support
- ONNX runtime for optimized inference
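The indexed-placeholder scheme used by the scanner can be illustrated with a minimal, self-contained sketch. This is plain regex, not the library's actual detection pipeline; the `PATTERNS` table and `redact` helper are hypothetical stand-ins for llm_guard's internal recognizers:

```python
import re

# Hypothetical regex table; llm_guard's real patterns are far more thorough
# and are combined with NER and Presidio results.
PATTERNS = {
    "EMAIL_ADDRESS": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "PHONE_NUMBER": re.compile(r"\+?\d[\d\s-]{7,}\d"),
}

def redact(text: str) -> tuple[str, list[tuple[str, str]]]:
    """Replace each match with an indexed [REDACTED_TYPE_N] placeholder."""
    mappings: list[tuple[str, str]] = []
    for entity_type, pattern in PATTERNS.items():
        counter = 0  # placeholders are numbered per entity type
        def _sub(match: re.Match) -> str:
            nonlocal counter
            counter += 1
            placeholder = f"[REDACTED_{entity_type}_{counter}]"
            mappings.append((placeholder, match.group(0)))
            return placeholder
        text = pattern.sub(_sub, text)
    return text, mappings

sanitized, vault = redact("Mail bob@example.com or call +1 555-123-4567")
# sanitized: "Mail [REDACTED_EMAIL_ADDRESS_1] or call [REDACTED_PHONE_NUMBER_1]"
```

The per-type counter is what makes deanonymization unambiguous when the same entity type appears multiple times in one prompt.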
Usage
Use this scanner when user prompts may contain PII that must be removed before they are sent to an LLM. Always pair it with a Deanonymize output scanner that shares the same Vault instance, so the anonymization is reversible.
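The reason both scanners must share one Vault can be shown with a conceptual round-trip sketch. This is not llm_guard's code: the vault is modeled here as a plain list of (placeholder, original) pairs, and the `anonymize`/`deanonymize` helpers are simplified stand-ins for the two scanners:

```python
# Shared store: the anonymizer writes mappings, the deanonymizer reads them.
vault: list[tuple[str, str]] = []

def anonymize(prompt: str, pii: dict[str, str]) -> str:
    """Replace known PII values (entity_type -> value) with placeholders.

    Simplified: numbering here runs across all entities, whereas the real
    scanner numbers placeholders per entity type.
    """
    for i, (entity_type, value) in enumerate(pii.items(), start=1):
        placeholder = f"[REDACTED_{entity_type}_{i}]"
        vault.append((placeholder, value))
        prompt = prompt.replace(value, placeholder)
    return prompt

def deanonymize(output: str) -> str:
    """Restore originals using the same vault the anonymizer wrote to."""
    for placeholder, value in vault:
        output = output.replace(placeholder, value)
    return output

safe = anonymize("Write to John Smith", {"PERSON": "John Smith"})
# safe: "Write to [REDACTED_PERSON_1]"
llm_reply = "Reply addressed to [REDACTED_PERSON_1]."
restored = deanonymize(llm_reply)
# restored: "Reply addressed to John Smith."
```

If the output scanner were given a different Vault, the placeholders in the LLM's reply would never be resolved back to their originals.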
Code Reference
Source Location
- Repository: llm-guard
- File: llm_guard/input_scanners/anonymize.py
- Lines: L46-396
Signature
class Anonymize(Scanner):
    def __init__(
        self,
        vault: Vault,
        *,
        hidden_names: list[str] | None = None,
        allowed_names: list[str] | None = None,
        entity_types: list[str] | None = None,
        preamble: str = "",
        regex_patterns: list[DefaultRegexPatterns | RegexPatternsReuse] | None = None,
        use_faker: bool = False,
        recognizer_conf: NERConfig | None = None,
        threshold: float = 0.5,
        use_onnx: bool = False,
        language: str = "en",
    ) -> None:
        """
        Args:
            vault: Vault instance to store anonymized mappings.
            hidden_names: Names to always anonymize.
            allowed_names: Names to never anonymize.
            entity_types: PII entity types to detect. Default: all standard types.
            preamble: Text to prepend to sanitized prompt.
            regex_patterns: Custom regex patterns for detection.
            use_faker: Use fake data instead of [REDACTED_*] placeholders.
            recognizer_conf: NER model configuration. Default: DEBERTA_AI4PRIVACY_v2_CONF.
            threshold: Minimum confidence score. Default: 0.5.
            use_onnx: Use ONNX runtime for inference. Default: False.
            language: Detection language ("en" or "zh"). Default: "en".
        """

    def scan(self, prompt: str) -> tuple[str, bool, float]:
        """
        Scan prompt for PII and replace with placeholders.

        Returns:
            - Sanitized prompt with PII replaced
            - False if PII was found, True if clean
            - Risk score based on highest detection confidence
        """
Import
from llm_guard.input_scanners import Anonymize
from llm_guard.vault import Vault
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| vault | Vault | Yes | Shared vault for storing placeholder mappings |
| hidden_names | list[str] | No | Custom names to always anonymize |
| allowed_names | list[str] | No | Names exempt from anonymization |
| entity_types | list[str] | No | PII types to detect (default: CREDIT_CARD, EMAIL_ADDRESS, PERSON, PHONE_NUMBER, US_SSN, etc.) |
| preamble | str | No | Text to prepend to sanitized prompt (default: "") |
| use_faker | bool | No | Use fake data instead of placeholders (default: False) |
| recognizer_conf | NERConfig | No | NER model config (default: DEBERTA_AI4PRIVACY_v2_CONF) |
| threshold | float | No | Minimum confidence score (default: 0.5) |
| use_onnx | bool | No | Use ONNX runtime (default: False) |
| language | str | No | Detection language: "en" or "zh" (default: "en") |
Outputs
| Name | Type | Description |
|---|---|---|
| sanitized_prompt | str | Prompt with PII replaced by [REDACTED_TYPE_N] placeholders |
| is_valid | bool | False if PII was found, True if prompt is clean |
| risk_score | float | Highest NER confidence score, normalized against threshold |
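The `is_valid`/`risk_score` contract can be sketched in a few lines. The exact normalization llm_guard applies may differ; this sketch assumes detections below the threshold are ignored, `is_valid` is False whenever any detection survives, and the risk score is the highest surviving confidence:

```python
def score_detections(confidences: list[float], threshold: float = 0.5) -> tuple[bool, float]:
    """Assumed scoring: drop detections below threshold, flag the rest.

    Returns (is_valid, risk_score). Mirrors the documented output contract,
    not llm_guard's exact internal normalization.
    """
    hits = [c for c in confidences if c >= threshold]
    if not hits:
        return True, 0.0  # no PII above threshold: prompt is clean
    return False, max(hits)

is_valid, risk = score_detections([0.3, 0.92])
# is_valid: False, risk: 0.92 (the 0.3 detection falls below threshold)
```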
Usage Examples
Basic PII Anonymization
from llm_guard.input_scanners import Anonymize
from llm_guard.vault import Vault
vault = Vault()
scanner = Anonymize(vault)
prompt = "My name is John Smith and my email is john@example.com"
sanitized, is_valid, score = scanner.scan(prompt)
# sanitized: "My name is [REDACTED_PERSON_1] and my email is [REDACTED_EMAIL_ADDRESS_1]"
# is_valid: False (PII was detected)
# vault.get(): [("[REDACTED_PERSON_1]", "John Smith"), ("[REDACTED_EMAIL_ADDRESS_1]", "john@example.com")]
With Faker Replacement
from llm_guard.input_scanners import Anonymize
from llm_guard.vault import Vault
vault = Vault()
scanner = Anonymize(vault, use_faker=True)
prompt = "Contact Jane Doe at jane.doe@company.com"
sanitized, is_valid, score = scanner.scan(prompt)
# sanitized: e.g. "Contact Emily Johnson at michael.brown@example.org"
# (faker generates realistic but fake replacements; output varies per run)
Custom Entity Types
from llm_guard.input_scanners import Anonymize
from llm_guard.vault import Vault
vault = Vault()
scanner = Anonymize(
    vault,
    entity_types=["PERSON", "EMAIL_ADDRESS"],  # Only detect these types
    threshold=0.7,
    use_onnx=True,
)
sanitized, is_valid, score = scanner.scan("Reach Alice Brown at alice@corp.example")
Related Pages
Implements Principle
Requires Environment
- Environment:Protectai_Llm_guard_Python_Runtime_Dependencies
- Environment:Protectai_Llm_guard_ONNX_Runtime_Acceleration