Workflow:Protectai Llm guard PII Anonymization Deanonymization

Knowledge Sources	LLM Guard, LLM Guard Docs
Domains	LLM_Security, PII_Protection, Data_Privacy
Last Updated	2026-02-14 12:00 GMT

Overview

End-to-end process for protecting personally identifiable information (PII) in LLM interactions by anonymizing sensitive entities before the LLM call and restoring them in the output afterward.

Description

This workflow implements the Anonymize-Deanonymize pattern, a cross-cutting concern that spans the entire LLM call boundary. The Anonymize input scanner detects PII entities (names, emails, phone numbers, credit cards, IP addresses, and more) using NER models, regex patterns, and the Presidio framework. Detected entities are replaced with consistent placeholders or fake data. A shared Vault object stores the mapping between placeholders and original values. After the LLM produces a response using the anonymized prompt, the Deanonymize output scanner restores original values by reversing the placeholder mappings from the Vault.

Usage

Execute this workflow when your LLM application processes user data containing personally identifiable information that must not be exposed to the LLM provider. This is critical for compliance with privacy regulations (GDPR, HIPAA, CCPA) and for preventing PII leakage through model responses.

Execution Steps

Step 1: Initialize the Vault

Create a Vault instance that will store the bidirectional mapping between original PII entities and their placeholder replacements. This single Vault instance must be shared between the Anonymize input scanner and the Deanonymize output scanner.

Key considerations:

The Vault is an in-memory store with no persistence of its own; its mappings last only as long as the instance, so scope one instance to a single request
One Vault instance must be shared across the Anonymize and Deanonymize scanner pair
The Vault maps placeholder strings back to their original values for later restoration
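
To make the role of the Vault concrete, here is a minimal sketch of such a store. It is a toy stand-in written for illustration, not LLM Guard's own Vault class; the name SimpleVault and its methods are hypothetical.

```python
class SimpleVault:
    """Toy in-memory store of (placeholder, original) pairs.

    Hypothetical illustration only; not LLM Guard's Vault class.
    """

    def __init__(self):
        self._pairs = []

    def add(self, placeholder, original):
        # Skip duplicates so repeated scans do not grow the store.
        if (placeholder, original) not in self._pairs:
            self._pairs.append((placeholder, original))

    def placeholder_for(self, original):
        # Forward lookup: lets the anonymizer reuse an existing placeholder.
        return next((p for p, v in self._pairs if v == original), None)

    def original_for(self, placeholder):
        # Reverse lookup: lets the deanonymizer restore the original value.
        return next((v for p, v in self._pairs if p == placeholder), None)


# One instance per request, handed to both the input and the output scanner.
vault = SimpleVault()
vault.add("[PERSON_1]", "John Doe")
vault.add("[EMAIL_1]", "john@example.com")
```

Because both lookups go through the same list, the anonymizing side and the restoring side stay in agreement only if they hold the same instance, which is why the Vault must be shared.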

Step 2: Configure the Anonymize scanner

Instantiate the Anonymize input scanner with the shared Vault and configure its detection capabilities. The scanner supports multiple detection methods: transformer-based NER models, regex pattern matching, and Presidio-based entity recognition. Choose between placeholder replacement (e.g., replacing "John Doe" with "[PERSON_1]") and faker-based replacement (e.g., replacing with a realistic fake name).

Key considerations:

Set use_faker to True for realistic fake data replacement, False for bracket-style placeholders
Configure the NER model: the default model handles common entities; specialized models like the AI4Privacy DeBERTa model provide broader coverage
Set recognition thresholds to balance recall (catching all PII) against precision (avoiding false positives)
Supported entity types include PERSON, EMAIL, PHONE_NUMBER, CREDIT_CARD, IP_ADDRESS, LOCATION, and many more
Language-specific recognizers are available for Chinese text (phone, email, IP, crypto addresses)
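
The effect of the recognition threshold can be sketched without any model. The example below assumes a detector has already returned scored candidates; the Detection type, the filter_detections helper, and the scores are all made up for illustration and are not part of LLM Guard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    entity_type: str
    text: str
    score: float  # recognizer confidence in [0, 1]


def filter_detections(detections, threshold, entity_types=None):
    """Keep detections at or above the threshold, optionally limited to some types."""
    return [
        d
        for d in detections
        if d.score >= threshold
        and (entity_types is None or d.entity_type in entity_types)
    ]


# Hypothetical detector output for one prompt.
candidates = [
    Detection("PERSON", "John Doe", 0.98),
    Detection("EMAIL", "john@example.com", 1.0),
    Detection("LOCATION", "Paris", 0.62),
    Detection("PERSON", "May", 0.35),  # likely a false positive (the month)
]

strict = filter_detections(candidates, 0.9)   # favors precision
lenient = filter_detections(candidates, 0.3)  # favors recall
```

With the strict threshold only the two high-confidence entities are replaced; the lenient one also redacts "Paris" and the doubtful "May", which protects more data at the cost of over-redaction.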

Step 3: Configure the Deanonymize scanner

Instantiate the Deanonymize output scanner with the same Vault instance used by the Anonymize scanner. Configure the matching strategy that determines how placeholders in the LLM output are mapped back to original values.

Key considerations:

The matching_strategy parameter controls how aggressively the scanner searches for placeholders
exact matching looks for exact placeholder strings in the output
case_insensitive matching handles cases where the LLM changed the case of placeholders
fuzzy matching handles cases where the LLM slightly modified placeholder text
combined matching (combined_exact_fuzzy) tries exact matching first, then falls back to fuzzy matching
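
The difference between the strategies can be illustrated with a small standard-library sketch. It is an approximation of the idea, not LLM Guard's implementation: the restore function, the cutoff value, and the bracket-based candidate search are assumptions made for this example. A combined mode would simply chain these calls.

```python
import difflib
import re


def restore(output, pairs, strategy="exact", cutoff=0.8):
    """Replace placeholders in `output` using (placeholder, original) pairs."""
    if strategy == "exact":
        for placeholder, original in pairs:
            output = output.replace(placeholder, original)
        return output
    if strategy == "case_insensitive":
        for placeholder, original in pairs:
            # A function replacement avoids backslash handling in `original`.
            output = re.sub(
                re.escape(placeholder),
                lambda _m, o=original: o,
                output,
                flags=re.IGNORECASE,
            )
        return output
    if strategy == "fuzzy":
        by_lower = {p.lower(): o for p, o in pairs}

        def swap(match):
            # Pick the closest known placeholder, if any is similar enough.
            close = difflib.get_close_matches(
                match.group(0).lower(), list(by_lower), n=1, cutoff=cutoff
            )
            return by_lower[close[0]] if close else match.group(0)

        # Treat every bracketed token as a candidate placeholder.
        return re.sub(r"\[[^\[\]]+\]", swap, output)
    raise ValueError(f"unknown strategy: {strategy}")


pairs = [("[PERSON_1]", "John Doe"), ("[PERSON_2]", "Jane Roe")]
exact = restore("Hi [person_1]", pairs)
relaxed = restore("Hi [person_1]", pairs, "case_insensitive")
fuzzy = restore("Hi [Person-1] and [PERSON 2]", pairs, "fuzzy")
```

Exact matching leaves the lowercased placeholder untouched, while the other two recover it. Fuzzy matching is the most forgiving, but numbered placeholders differ by a single character, so a low cutoff can restore the wrong value.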

Step 4: Scan and anonymize the prompt

Run the user prompt through the Anonymize scanner. The scanner detects PII entities, stores the original-to-placeholder mappings in the Vault, and returns the anonymized prompt. The anonymized prompt can then be sent to the LLM because every detected entity has been replaced; any entity the detectors miss passes through unchanged.

What happens:

NER models identify named entities (persons, organizations, locations)
Regex patterns detect structured data (emails, phone numbers, credit cards, IP addresses)
Presidio analyzers provide additional entity recognition with configurable confidence scores
Each detected entity is mapped to a unique, consistent placeholder
Identical entity values receive the same placeholder for consistency
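
The consistency rule (identical values share one placeholder) can be sketched with regex detection alone. This is a simplified illustration, not the scanner's actual code: the two patterns are deliberately loose, and the anonymize function and PATTERNS table are hypothetical names.

```python
import re

# Deliberately simple patterns for illustration; real detectors are stricter.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "IP_ADDRESS": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
}


def anonymize(text, mapping):
    """Replace matches with numbered placeholders.

    `mapping` (original value -> placeholder) is updated in place, so a value
    seen before gets the placeholder it was given the first time.
    """
    for entity_type, pattern in PATTERNS.items():
        prefix = f"[{entity_type}_"

        def swap(match, prefix=prefix):
            value = match.group(0)
            if value not in mapping:
                # Number new placeholders per entity type, starting at 1.
                index = sum(1 for p in mapping.values() if p.startswith(prefix)) + 1
                mapping[value] = f"{prefix}{index}]"
            return mapping[value]

        text = pattern.sub(swap, text)
    return text


mapping = {}
text = "Mail a@x.com or b@y.org, then a@x.com again from 10.0.0.1"
anonymized = anonymize(text, mapping)
```

Both occurrences of a@x.com receive the same placeholder, so the model can still tell that the two mentions refer to the same address.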

Step 5: Send anonymized prompt to the LLM

Pass the anonymized prompt to the LLM API. The model processes the prompt without ever seeing the original PII, generating a response that may contain the placeholder tokens.

Key considerations:

The LLM sees placeholders or fake data in place of each detected entity, never the original values
The LLM's response quality may differ when working with placeholders versus real data
Faker-based replacement generally produces more natural model responses than bracket-style placeholders

Step 6: Deanonymize the output

Run the LLM's response through the Deanonymize output scanner. The scanner locates placeholder tokens in the response text and replaces them with the original PII values from the Vault, producing a response that contains the correct original information.

Key considerations:

Like other output scanners, Deanonymize is called with both the anonymized prompt and the model output; the placeholder mappings themselves come from the Vault
Placeholder-like strings that the LLM invents are left unchanged because they have no Vault entry
The restored output should be treated as sensitive and handled according to your data retention policies
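
The whole round trip can be sketched end to end with a stubbed model. Everything here is a stand-in: the entity list plays the role of detector output, fake_llm replaces a real API call, and the vault is a plain list of pairs rather than LLM Guard's Vault.

```python
def anonymize(prompt, entities, vault):
    """Replace each (entity_type, value) with a placeholder and record it in `vault`."""
    for entity_type, value in entities:
        placeholder = next((p for p, v in vault if v == value), None)
        if placeholder is None:
            index = sum(1 for p, _ in vault if p.startswith(f"[{entity_type}_")) + 1
            placeholder = f"[{entity_type}_{index}]"
            vault.append((placeholder, value))
        prompt = prompt.replace(value, placeholder)
    return prompt


def fake_llm(prompt):
    """Stand-in for a model call: returns one known and one invented placeholder."""
    return "Dear [PERSON_1], your colleague [PERSON_7] will reply soon."


def deanonymize(output, vault):
    """Exact-match restoration of every placeholder recorded in `vault`."""
    for placeholder, value in vault:
        output = output.replace(placeholder, value)
    return output


vault = []  # shared by both directions, scoped to this one request
prompt = "Write a reply to John Doe at john@example.com."
entities = [("PERSON", "John Doe"), ("EMAIL", "john@example.com")]  # hypothetical detector output

safe_prompt = anonymize(prompt, entities, vault)
response = fake_llm(safe_prompt)
restored = deanonymize(response, vault)
```

The invented [PERSON_7] survives in the restored text because the vault has no entry for it, which is a useful signal to log or strip before showing the response to a user.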
