Workflow:Protectai Llm guard PII Anonymization Deanonymization

From Leeroopedia
Knowledge Sources
Domains: LLM_Security, PII_Protection, Data_Privacy
Last Updated: 2026-02-14 12:00 GMT

Overview

End-to-end process for protecting personally identifiable information (PII) in LLM interactions by anonymizing sensitive entities before the LLM call and restoring them in the output afterward.

Description

This workflow implements the Anonymize-Deanonymize pattern, a cross-cutting concern that spans the entire LLM call boundary. The Anonymize input scanner detects PII entities (names, emails, phone numbers, credit cards, IP addresses, and more) using NER models, regex patterns, and the Presidio framework. Detected entities are replaced with consistent placeholders or fake data. A shared Vault object stores the mapping between placeholders and original values. After the LLM produces a response using the anonymized prompt, the Deanonymize output scanner restores original values by reversing the placeholder mappings from the Vault.

Usage

Execute this workflow when your LLM application processes user data containing personally identifiable information that must not be exposed to the LLM provider. This is critical for compliance with privacy regulations (GDPR, HIPAA, CCPA) and for preventing PII leakage through model responses.

Execution Steps

Step 1: Initialize the Vault

Create a Vault instance that will store the bidirectional mapping between original PII entities and their placeholder replacements. This single Vault instance must be shared between the Anonymize input scanner and the Deanonymize output scanner.

Key considerations:

  • The Vault is an in-memory store; its mappings last only as long as the instance, which is typically scoped to a single request
  • One Vault instance must be shared across the Anonymize and Deanonymize scanner pair
  • The Vault maps placeholder strings back to their original values for later restoration
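
The sharing requirement can be illustrated with a minimal sketch. `ToyVault` below is a hypothetical stand-in written for this page, not the library's Vault class, and the placeholder format is illustrative; it only shows why both scanners must hold the same object.

```python
class ToyVault:
    """Hypothetical stand-in for a shared vault (not the library's class).

    Stores (placeholder, original) pairs in memory only.
    """

    def __init__(self):
        self._pairs = []

    def append(self, placeholder, original):
        self._pairs.append((placeholder, original))

    def get(self):
        # Return a copy so callers cannot mutate the internal list.
        return list(self._pairs)


# One vault per request: the anonymizing side writes, the restoring side reads.
request_vault = ToyVault()
request_vault.append("[REDACTED_PERSON_1]", "John Doe")

# A second vault (for example, another request) sees none of those mappings.
other_vault = ToyVault()

lookup = dict(request_vault.get())
```

If the two scanners were given different vault objects, the restoring side would find an empty mapping, as `other_vault` does here.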

Step 2: Configure the Anonymize scanner

Instantiate the Anonymize input scanner with the shared Vault and configure its detection capabilities. The scanner supports multiple detection methods: transformer-based NER models, regex pattern matching, and Presidio-based entity recognition. Choose between placeholder replacement (e.g., replacing "John Doe" with a bracketed token such as "[REDACTED_PERSON_1]") and faker-based replacement (e.g., replacing it with a realistic fake name).

Key considerations:

  • Set use_faker to True for realistic fake data replacement, False for bracket-style placeholders
  • Configure the NER model: the default model handles common entities; specialized models like the AI4Privacy DeBERTa model provide broader coverage
  • Set recognition thresholds to balance recall (catching all PII) against precision (avoiding false positives)
  • Supported entity types include PERSON, EMAIL_ADDRESS, PHONE_NUMBER, CREDIT_CARD, IP_ADDRESS, LOCATION, and many more
  • Language-specific recognizers are available for Chinese text (phone, email, IP, crypto addresses)
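
The threshold and entity-type settings can be pictured as a filter over scored detections. The sketch below is an illustration under assumptions: `Detection`, `filter_detections`, and the scores are hypothetical, and the real scanner's internals may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    """Hypothetical record for one candidate PII span."""
    entity_type: str
    text: str
    score: float


def filter_detections(detections, threshold, entity_types):
    """Keep detections at or above the threshold whose type is enabled."""
    return [
        d for d in detections
        if d.score >= threshold and d.entity_type in entity_types
    ]


# Made-up scores, chosen only to show the trade-off.
candidates = [
    Detection("PERSON", "John Doe", 0.95),
    Detection("PERSON", "Paris", 0.40),  # likely a false positive
    Detection("EMAIL_ADDRESS", "john@example.com", 0.99),
    Detection("LOCATION", "Berlin", 0.80),  # type not enabled below
]
enabled = {"PERSON", "EMAIL_ADDRESS"}

strict = filter_detections(candidates, 0.9, enabled)   # favors precision
lenient = filter_detections(candidates, 0.3, enabled)  # favors recall
```

A higher threshold drops the doubtful "Paris" detection; a lower one keeps it and risks redacting text that is not PII.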

Step 3: Configure the Deanonymize scanner

Instantiate the Deanonymize output scanner with the same Vault instance used by the Anonymize scanner. Configure the matching strategy that determines how placeholders in the LLM output are mapped back to original values.

Key considerations:

  • The matching_strategy parameter controls how aggressively the scanner searches for placeholders
  • exact matching looks for exact placeholder strings in the output
  • case_insensitive matching handles cases where the LLM changed the case of placeholders
  • fuzzy matching handles cases where the LLM slightly modified placeholder text
  • combined matching (combined_exact_fuzzy) tries exact matching first and falls back to fuzzy matching
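
To make the difference between strategies concrete, here is a rough sketch using only the standard library. The function names, the 0.85 cutoff, and the assumption that the LLM keeps the square brackets are all illustrative choices for this page; the library's own matching code may work differently.

```python
import difflib
import re


def restore_exact(text, placeholder, original):
    """Replace only byte-for-byte occurrences of the placeholder."""
    return text.replace(placeholder, original)


def restore_case_insensitive(text, placeholder, original):
    """Replace the placeholder regardless of letter case."""
    return re.sub(re.escape(placeholder), lambda _m: original, text,
                  flags=re.IGNORECASE)


def restore_fuzzy(text, placeholder, original, cutoff=0.85):
    """Replace bracketed tokens that are close to the placeholder.

    Simplification: only tokens still wrapped in [ ] are considered.
    """
    def swap(match):
        token = match.group(0)
        ratio = difflib.SequenceMatcher(
            None, token.lower(), placeholder.lower()
        ).ratio()
        return original if ratio >= cutoff else token

    return re.sub(r"\[[^\[\]]*\]", swap, text)


placeholder, original = "[REDACTED_PERSON_1]", "John Doe"

exact = restore_exact("Hello [REDACTED_PERSON_1]", placeholder, original)
missed = restore_exact("Hello [redacted_person_1]", placeholder, original)
relaxed = restore_case_insensitive("Hello [redacted_person_1]", placeholder, original)
fuzzy = restore_fuzzy("Hello [REDACTED PERSON_1]", placeholder, original)
```

Exact matching leaves the lowercased token untouched, which is the gap the looser strategies are meant to close, at the cost of a higher chance of replacing something that was never a placeholder.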

Step 4: Scan and anonymize the prompt

Run the user prompt through the Anonymize scanner. The scanner detects PII entities, stores the original-to-placeholder mappings in the Vault, and returns the anonymized prompt. The anonymized prompt is safer to send to the LLM because every detected entity has been replaced; PII that the detectors miss passes through unchanged.

What happens:

  • NER models identify named entities (persons, organizations, locations)
  • Regex patterns detect structured data (emails, phone numbers, credit cards, IP addresses)
  • Presidio analyzers provide additional entity recognition with configurable confidence scores
  • Each detected entity is mapped to a unique, consistent placeholder
  • Identical entity values receive the same placeholder for consistency
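
The regex part of this step, together with the consistent-placeholder rule, can be sketched as follows. This is a simplified illustration: the two patterns, the `anonymize` helper, the placeholder format, and the plain list used as a vault are assumptions made for this example, and NER or Presidio detection is left out entirely.

```python
import re

# Deliberately simple patterns; production recognizers are stricter.
PATTERNS = {
    "EMAIL_ADDRESS": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "IP_ADDRESS": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
}


def anonymize(prompt, vault):
    """Replace matches with numbered placeholders.

    `vault` is a plain list of (placeholder, original) pairs, assumed
    to start empty for each request.
    """
    seen = {}      # original value -> placeholder already assigned
    counters = {}  # entity type -> last index used

    for entity_type, pattern in PATTERNS.items():
        def replace(match, entity_type=entity_type):
            value = match.group(0)
            if value not in seen:
                counters[entity_type] = counters.get(entity_type, 0) + 1
                seen[value] = f"[REDACTED_{entity_type}_{counters[entity_type]}]"
                vault.append((seen[value], value))
            # Repeated values reuse the placeholder assigned the first time.
            return seen[value]

        prompt = pattern.sub(replace, prompt)
    return prompt


vault = []
sanitized = anonymize(
    "Mail john@example.com or jane@example.org from 10.0.0.1; cc john@example.com.",
    vault,
)
```

Because the second occurrence of `john@example.com` reuses the first placeholder, the LLM can still tell that both mentions refer to the same address.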

Step 5: Send anonymized prompt to the LLM

Pass the anonymized prompt to the LLM API. The model processes the prompt without ever seeing the original PII, generating a response that may contain the placeholder tokens.

Key considerations:

  • For every entity the scanner detected, the LLM sees only a placeholder or fake value, never the real PII
  • The LLM's response quality may differ when working with placeholders versus real data
  • Faker-based replacement generally produces more natural model responses than bracket-style placeholders
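
One optional safeguard at this boundary is to confirm that no value recorded in the vault still appears in the text about to be sent. The wrapper below is a hypothetical sketch, not part of the library: `call_llm_safely` and `echo_llm` are made-up names, the vault is modeled as a list of (placeholder, original) pairs, and the "LLM" is a local stub so the example runs offline.

```python
def call_llm_safely(sanitized_prompt, vault_pairs, llm):
    """Call `llm` only if no recorded original value remains in the prompt."""
    leaked = [
        original for _placeholder, original in vault_pairs
        if original in sanitized_prompt
    ]
    if leaked:
        # Report a count only, so the error message itself does not leak PII.
        raise ValueError(
            f"{len(leaked)} original value(s) still present in the prompt"
        )
    return llm(sanitized_prompt)


def echo_llm(prompt):
    """Stub standing in for a real LLM client."""
    return f"Reply to: {prompt}"


vault_pairs = [("[REDACTED_PERSON_1]", "John Doe")]
reply = call_llm_safely("Write to [REDACTED_PERSON_1].", vault_pairs, echo_llm)
```

This check only covers values the scanner already detected; it cannot catch PII that was never recorded in the vault.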

Step 6: Deanonymize the output

Run the LLM's response through the Deanonymize output scanner. The scanner locates placeholder tokens in the response text and replaces them with the original PII values from the Vault, producing a response that contains the correct original information.

Key considerations:

  • The Deanonymize scanner is called with the sanitized prompt together with the model output, so keep the prompt available at this step
  • If the LLM hallucinated new placeholder-like strings, they will not be deanonymized (no Vault entry)
  • The restored output should be treated as sensitive and handled according to your data retention policies
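
The restoration step, including the case of a hallucinated placeholder, can be sketched with exact matching only. The `deanonymize` helper, the placeholder format, and the leftover-token regex are assumptions for illustration, and the vault is again modeled as a list of (placeholder, original) pairs.

```python
import re


def deanonymize(output, vault_pairs):
    """Swap each known placeholder back to its original value (exact match)."""
    for placeholder, original in vault_pairs:
        output = output.replace(placeholder, original)
    return output


vault_pairs = [("[REDACTED_PERSON_1]", "John Doe")]

# The second token was never issued by the anonymizer, so the vault
# has no entry for it.
response = "Dear [REDACTED_PERSON_1], your colleague [REDACTED_PERSON_2] was copied."
restored = deanonymize(response, vault_pairs)

# Optionally flag placeholder-like tokens that survived restoration.
leftovers = re.findall(r"\[REDACTED_[A-Z_]+_\d+\]", restored)
```

Logging the count of leftover tokens (not the restored text itself) is one way to notice hallucinated placeholders without writing PII to logs.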
