
Principle:Protectai Llm guard Regex Pattern Matching

From Leeroopedia
Knowledge Sources
Domains: Pattern_Matching, Content_Filtering
Last Updated: 2026-02-14 12:00 GMT

Overview

Detecting or validating text content using user-defined regular expression patterns with configurable matching strategies.

Description

Regex Pattern Matching is a content filtering principle that provides user-defined, pattern-based text scanning using regular expressions. Unlike fixed-purpose scanners, this principle offers a general-purpose mechanism for enforcing arbitrary text policies expressed as regex patterns. It serves as a flexible building block for custom validation rules that do not fit neatly into other specialized scanners.

The principle supports three matching modes that control how patterns are applied to text. Search mode finds the first occurrence of the pattern anywhere in the text, useful for detecting forbidden content. Fullmatch mode requires the entire text to match the pattern, useful for validating that input conforms to an expected format. All mode finds every occurrence of the pattern throughout the text, useful for comprehensive scanning and counting.

Two policy modes determine how matches are interpreted. In blocklist mode, a pattern match indicates that the text contains forbidden content and should be flagged. In allowlist mode, a pattern match indicates that the text contains required content, and the absence of a match triggers flagging. This duality allows regex patterns to serve both restrictive and permissive policy functions.

When violations are detected, the system supports optional redaction of matched content, replacing matched substrings with configurable placeholder text rather than rejecting the entire input.

Usage

Use this principle when you need to enforce custom text policies that can be expressed as regular expressions but are not covered by other specialized scanners. Common applications include detecting and redacting phone numbers, email addresses, or custom identifiers; validating that input follows a required format; blocking specific URL patterns or domains; enforcing naming conventions; and catching domain-specific patterns unique to your organization. This principle is ideal for rapid policy deployment since adding a new rule requires only defining a regex pattern, with no model training or data collection.

Theoretical Basis

The pattern matching algorithm operates as follows:

Pattern Compilation:

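  • Compile each user-defined regex pattern into a reusable compiled pattern object (in Python, re.compile(pattern)); an invalid pattern fails here rather than during a scan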
  • Patterns are compiled once at initialization and reused across scans
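The compile-once step can be sketched as follows. This is a minimal illustration using Python's standard re module, not the library's actual code: the class name PatternSet, the method first_hit, and the two sample patterns are all hypothetical.

```python
import re

# Hypothetical rule set; the patterns are illustrative, not exhaustive.
RAW_PATTERNS = [
    r"\b\d{3}-\d{3}-\d{4}\b",       # US-style phone number
    r"[\w.+-]+@[\w-]+\.[\w.-]+",    # rough email shape
]


class PatternSet:
    """Hypothetical helper: compiles patterns once, reuses them per scan."""

    def __init__(self, raw_patterns):
        # re.compile raises re.error on an invalid pattern, so a bad rule
        # fails at initialization instead of in the middle of a scan.
        self.compiled = [re.compile(p) for p in raw_patterns]

    def first_hit(self, text):
        # Try each precompiled pattern in order; return the first match
        # object found, or None if no pattern matches.
        for pattern in self.compiled:
            match = pattern.search(text)
            if match:
                return match
        return None


patterns = PatternSet(RAW_PATTERNS)
hit = patterns.first_hit("Call 555-123-4567 today")
```

Holding the compiled objects on the instance means each scan skips pattern parsing entirely, and configuration errors surface as soon as the rule set is constructed.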

Matching Modes:

  • Search: Apply re.search(pattern, text) to find the first match anywhere in the text
  • Fullmatch: Apply re.fullmatch(pattern, text) to check if the entire text matches the pattern
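  • All: Apply re.finditer(pattern, text) or re.findall(pattern, text) to locate every non-overlapping occurrence; note that re.findall returns the captured groups rather than the whole match when the pattern contains capture groups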
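The three modes can be compared side by side with a small dispatcher. This is a sketch under assumptions: the function name apply_mode and the mode strings "search", "fullmatch", and "all" are chosen here for illustration and may not match any particular implementation.

```python
import re


def apply_mode(pattern, text, mode):
    """Hypothetical helper: return the matched substrings for a given mode."""
    compiled = re.compile(pattern)
    if mode == "search":
        # First occurrence anywhere in the text.
        m = compiled.search(text)
        return [m.group(0)] if m else []
    if mode == "fullmatch":
        # The whole text must match from start to end.
        m = compiled.fullmatch(text)
        return [m.group(0)] if m else []
    if mode == "all":
        # finditer yields match objects, so group(0) is always the whole
        # match, even when the pattern contains capture groups.
        return [m.group(0) for m in compiled.finditer(text)]
    raise ValueError(f"unknown mode: {mode}")


text = "ids: AB-12, CD-34"
pattern = r"[A-Z]{2}-\d{2}"

first = apply_mode(pattern, text, "search")
whole = apply_mode(pattern, text, "fullmatch")
every = apply_mode(pattern, text, "all")
```

Here the search mode stops at the first identifier, the fullmatch mode finds nothing because the text contains more than a single identifier, and the all mode collects both identifiers.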

Policy Evaluation:

  • In blocklist mode: if any pattern produces a match, the text is flagged as containing forbidden content
  • In allowlist mode: if no pattern produces a match, the text is flagged as missing required content
  • Multiple patterns can be specified and are evaluated independently
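The policy rules above reduce to a single boolean over the per-pattern results. The sketch below assumes search-style matching and uses a hypothetical function name, is_flagged, with hypothetical policy strings "blocklist" and "allowlist".

```python
import re


def is_flagged(text, patterns, policy="blocklist"):
    """Hypothetical helper: True when the text violates the policy."""
    compiled = [re.compile(p) for p in patterns]
    # Each pattern is evaluated independently; any() short-circuits on
    # the first pattern that produces a match.
    any_match = any(p.search(text) for p in compiled)
    if policy == "blocklist":
        # A match means forbidden content is present.
        return any_match
    if policy == "allowlist":
        # No match at all means required content is missing.
        return not any_match
    raise ValueError(f"unknown policy: {policy}")


url_rule = [r"https?://"]
ticket_rule = [r"[A-Z]{3}-\d{3}"]

blocked = is_flagged("visit http://evil.example", url_rule, "blocklist")
missing = is_flagged("no ticket here", ticket_rule, "allowlist")
```

The same pattern list can therefore act as a restriction or a requirement; only the interpretation of "some pattern matched" is inverted.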

Redaction (optional):

  • For each match, replace the matched substring with a configurable placeholder
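  • Use re.sub(pattern, replacement, text) to produce the redacted output; backslash escapes and group references in a string replacement are interpreted, so a literal placeholder containing backslashes should be escaped or supplied through a function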
  • Return the redacted text alongside the validation result
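The redaction steps can be sketched with re.subn, which returns both the substituted text and the number of replacements. The function name redact, the default placeholder "[REDACTED]", and the sample patterns are assumptions made for this example.

```python
import re


def redact(text, patterns, placeholder="[REDACTED]"):
    """Hypothetical helper: return (redacted_text, found_any_match)."""
    redacted = text
    found = False
    for raw in patterns:
        # A function replacement inserts the placeholder literally, so
        # backslashes in it are not treated as escapes or group references.
        redacted, count = re.subn(raw, lambda _m: placeholder, redacted)
        found = found or count > 0
    return redacted, found


rules = [
    r"[\w.+-]+@[\w-]+\.[\w.-]+",    # rough email shape
    r"\b\d{3}-\d{3}-\d{4}\b",       # US-style phone number
]

clean_text, had_match = redact("mail bob@example.com or 555-123-4567", rules)
```

Because the patterns are applied one after another, a later pattern sees the output of the earlier ones; a placeholder that could itself match a later pattern would be redacted again, so the placeholder should be chosen to avoid that.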
