Jump to content

Connect SuperML | Leeroopedia MCP: Equip your AI agents with best practices, code verification, and debugging knowledge. Powered by Leeroo — building Organizational Superintelligence. Contact us at founders@leeroo.com.

Implementation:Protectai Llm guard Input InvisibleText

From Leeroopedia
Knowledge Sources
Domains Security, Unicode_Detection
Last Updated 2026-02-14 12:00 GMT

Overview

The InvisibleText scanner detects and removes invisible Unicode characters from prompts, preventing hidden text injection attacks.

Description

InvisibleText is a lightweight input scanner that identifies invisible Unicode characters belonging to the Cf (Format), Co (Private Use), and Cn (Unassigned) Unicode categories. These invisible characters can be used by attackers to embed hidden instructions or payloads within seemingly innocuous prompts -- a technique known as invisible text injection or Unicode smuggling. The scanner uses Python's built-in unicodedata module and does not require any ML model, making it extremely fast and lightweight. When invisible characters are detected, they are removed from the prompt and the prompt is flagged as invalid. The static method contains_unicode can be used independently to check for invisible characters without scanning.

Usage

Use the InvisibleText scanner as a first-line defense against Unicode-based prompt injection attacks. This scanner should be included in most scanning pipelines due to its minimal overhead and its ability to catch a class of attacks that other scanners might miss. It is particularly important in security-sensitive applications where adversarial inputs are a concern.

Code Reference

Source Location

Signature

class InvisibleText(Scanner):
    def __init__(self) -> None: ...

    def scan(self, prompt: str) -> tuple[str, bool, float]: ...

    @staticmethod
    def contains_unicode(text: str) -> bool: ...

Import

from llm_guard.input_scanners import InvisibleText

I/O Contract

Inputs

Name Type Required Description
No constructor parameters required.

scan() Inputs

Name Type Required Description
prompt str Yes The input text to scan for invisible Unicode characters.

Outputs

Name Type Description
prompt str The cleaned prompt with all invisible Unicode characters removed.
is_valid bool True if no invisible characters were found; False if invisible characters were detected and removed.
risk_score float 1.0 if invisible characters were found; 0.0 otherwise.

contains_unicode()

Name Type Description
text str The text to check for invisible Unicode characters.
return bool True if invisible Unicode characters are present; False otherwise.

Unicode Categories Detected

Category Code Category Name Description
Cf Format Invisible formatting characters such as zero-width spaces, zero-width joiners, and directional markers.
Co Private Use Characters in the Unicode Private Use Areas that have no standard visible representation.
Cn Unassigned Unicode code points that have not been assigned to any character.

Usage Examples

Basic Usage

from llm_guard.input_scanners import InvisibleText

scanner = InvisibleText()
# Prompt containing a zero-width space (U+200B)
prompt = "Hello\u200B World"
sanitized_prompt, is_valid, risk_score = scanner.scan(prompt)

print(sanitized_prompt)  # "Hello World" (invisible character removed)
print(is_valid)          # False (invisible characters were found)
print(risk_score)        # 1.0

Static Check Without Scanning

from llm_guard.input_scanners import InvisibleText

# Quick check without full scanning
text = "Normal text without hidden characters"
has_invisible = InvisibleText.contains_unicode(text)
print(has_invisible)  # False

text_with_hidden = "Hidden\u200Btext\u200Bhere"
has_invisible = InvisibleText.contains_unicode(text_with_hidden)
print(has_invisible)  # True

Pipeline Integration

from llm_guard.input_scanners import InvisibleText

# Use as a lightweight first-pass scanner
scanner = InvisibleText()
prompt = "Summarize this document for me"
sanitized_prompt, is_valid, risk_score = scanner.scan(prompt)

if is_valid:
    print("Prompt is clean, proceeding to LLM")
else:
    print("Invisible characters detected and removed")

Related Pages

Page Connections

Double-click a node to navigate. Hold to expand connections.
Principle
Implementation
Heuristic
Environment