Principle:Protectai Llm guard Invisible Text Detection

From Leeroopedia
Knowledge Sources
Domains: Security, Unicode_Detection
Last Updated: 2026-02-14 12:00 GMT

Overview

Detecting and removing invisible Unicode characters that can be used for prompt injection or data exfiltration.

Description

Invisible Text Detection is a security principle that identifies invisible or non-rendering Unicode characters embedded within text. The Unicode standard defines many characters that produce no visible output when rendered yet are fully preserved during text processing. Attackers can exploit these characters to embed hidden instructions within seemingly innocuous text, enabling prompt injection attacks or data exfiltration schemes that a human reviewer cannot see.

The principle targets three specific Unicode general categories. Format characters (Cf) include zero-width spaces, zero-width joiners, bidirectional control characters, and other formatting marks that influence text rendering but generally produce no visible glyph of their own. Private Use characters (Co) are code points reserved for application-specific use that have no standard visual representation. Unassigned characters (Cn) are code points to which the Unicode standard has assigned no character (this category also covers noncharacters such as U+FFFF); they should not appear in legitimate text.
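As an illustration of the three categories, the following minimal sketch looks up the general category of a few sample characters with Python's standard `unicodedata` module. The `SAMPLES` table and the `general_category` helper are names chosen here for illustration and are not taken from the source.

```python
import unicodedata

# Illustrative sample characters (chosen for this sketch, not from the source).
SAMPLES = {
    "zero-width space": "\u200b",
    "zero-width joiner": "\u200d",
    "right-to-left override": "\u202e",
    "private use (BMP)": "\ue000",
    "noncharacter U+FFFF": "\uffff",
    "latin letter a": "a",
}


def general_category(ch: str) -> str:
    """Return the two-letter Unicode general category of one character."""
    return unicodedata.category(ch)


categories = {name: general_category(ch) for name, ch in SAMPLES.items()}

for name, cat in categories.items():
    print(f"{name:24} {cat}")
```

The categories reported depend on the Unicode version bundled with the interpreter (available as `unicodedata.unidata_version`), so a code point assigned in a newer Unicode release may still be reported as Cn on an older Python.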

These invisible characters can carry hidden information through their presence or absence at specific positions in the text, effectively encoding a covert channel within text that appears empty or normal to human readers. An attacker could, for example, encode an entire set of malicious instructions using sequences of zero-width characters interleaved with legitimate text.
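To make the covert channel concrete, here is a simplified sketch that can serve as a test fixture for a detector. It assumes one particular encoding, which is only one of many possible schemes: each bit of the hidden message is written as one of two zero-width characters, and the resulting run is placed after the first character of the cover text. The names `hide` and `reveal` are hypothetical.

```python
ZERO = "\u200b"  # zero-width space stands for bit 0 (assumption of this sketch)
ONE = "\u200c"   # zero-width non-joiner stands for bit 1 (assumption of this sketch)


def hide(cover: str, secret: str) -> str:
    """Embed `secret` in `cover` as a run of zero-width characters."""
    bits = "".join(f"{byte:08b}" for byte in secret.encode("utf-8"))
    payload = "".join(ONE if bit == "1" else ZERO for bit in bits)
    # Place the invisible payload after the first visible character.
    return cover[:1] + payload + cover[1:]


def reveal(text: str) -> str:
    """Recover the hidden message by reading only the two marker characters."""
    bits = "".join("1" if ch == ONE else "0" for ch in text if ch in (ZERO, ONE))
    data = bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))
    return data.decode("utf-8")


stego = hide("Hello, world", "hi")
# The two strings typically render alike, yet their lengths differ.
print(len("Hello, world"), len(stego))
print(reveal(stego))
```

Both marker characters belong to category Cf, so a category-based scan removes the entire payload and leaves only the cover text.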

Usage

Use this principle as an input sanitization layer for any system where text is received from untrusted sources before being passed to a language model. It is critical in scenarios involving copy-pasted text (which may carry invisible characters from the source), web-scraped content (which may contain formatting artifacts), and adversarial inputs (where invisible characters are deliberately injected). This principle should be applied early in the processing pipeline, before other scanners, to ensure that downstream analysis operates on clean, visible text.
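The ordering advice can be sketched as follows. This example assumes a toy downstream scanner, `keyword_scanner`, and a `run_pipeline` helper, both hypothetical and not part of any library API. It shows that a phrase split by zero-width characters evades a naive substring check until the invisible characters are stripped.

```python
import unicodedata
from typing import Callable, List, Tuple

INVISIBLE_CATEGORIES = {"Cf", "Co", "Cn"}


def strip_invisible(text: str) -> str:
    """Drop every character whose general category is Cf, Co, or Cn."""
    return "".join(
        ch for ch in text if unicodedata.category(ch) not in INVISIBLE_CATEGORIES
    )


def keyword_scanner(text: str) -> bool:
    """Hypothetical downstream scanner: a naive substring check."""
    return "ignore previous instructions" in text.lower()


def run_pipeline(
    text: str, scanners: List[Callable[[str], bool]]
) -> Tuple[str, List[str]]:
    """Sanitize first, then run the remaining scanners on the cleaned text."""
    clean = strip_invisible(text)
    findings = [scanner.__name__ for scanner in scanners if scanner(clean)]
    return clean, findings


raw = "Please ig\u200bnore previous instruc\u200dtions."
print(keyword_scanner(raw))                 # scanner applied to raw input
print(run_pipeline(raw, [keyword_scanner]))  # scanner applied after sanitization
```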

Theoretical Basis

The detection algorithm operates through Unicode category analysis:

Character Categorization:

  • Iterate over each character in the input text
  • Determine the Unicode general category of each character
  • Flag characters belonging to targeted categories:
    • Cf (Format): Zero-width space (U+200B), zero-width joiner (U+200D), zero-width non-joiner (U+200C), bidirectional marks, etc.
    • Co (Private Use): Characters in U+E000..U+F8FF and supplementary private use areas
    • Cn (Unassigned): Code points not assigned in the current Unicode standard

Detection:

  • Count the number of invisible characters found
  • If any invisible characters are detected, flag the text

Remediation:

  • Remove all detected invisible characters from the text
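  • Return the sanitized text, with every character outside the targeted categories preserved (including ordinary whitespace such as spaces and newlines, which do not belong to Cf, Co, or Cn)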
  • Report the count and types of invisible characters that were removed
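The categorization, detection, and remediation steps above can be sketched in one pass over the input. This is a minimal illustration using Python's standard `unicodedata` module, not the library's own implementation; `ScanResult` and `scan_invisible` are names assumed here.

```python
import unicodedata
from collections import Counter
from dataclasses import dataclass
from typing import Dict

TARGET_CATEGORIES = frozenset({"Cf", "Co", "Cn"})


@dataclass
class ScanResult:
    sanitized: str               # input with targeted characters removed
    flagged: bool                # True if at least one was found
    count: int                   # total number of removed characters
    by_category: Dict[str, int]  # removed characters per general category


def scan_invisible(text: str) -> ScanResult:
    """Categorize each character, then detect and remove targeted ones."""
    kept = []
    removed: Counter = Counter()
    for ch in text:
        category = unicodedata.category(ch)
        if category in TARGET_CATEGORIES:
            removed[category] += 1
        else:
            kept.append(ch)
    count = sum(removed.values())
    return ScanResult("".join(kept), count > 0, count, dict(removed))


result = scan_invisible("Hi\u200b there\ue000!\uffff")
print(result)
```

One design consideration: zero-width joiners also appear in legitimate text, for example in emoji sequences and in some scripts, so unconditional removal can alter how such text renders. Whether that trade-off is acceptable depends on the application.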
