Overview
The Regex scanner performs pattern matching on prompts using user-defined regular expressions with optional redaction.
Description
Regex is an input scanner that checks prompts against a list of user-defined regular expression patterns. It supports three matching strategies via the MatchType enum: SEARCH (finds a match anywhere in the text using re.search), FULL_MATCH (requires the entire text to match using re.fullmatch), and ALL (all patterns must match). The scanner can either block prompts that match the patterns (is_blocked=True) or require prompts to match them (is_blocked=False). When redact is enabled (the default), the matched text is replaced with the string "[REDACTED]" in the output. Because the scanner uses no ML model, it is fast and deterministic.
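The difference between searching anywhere in the text and requiring a full match, and the effect of redaction, can be illustrated with Python's standard re module alone. This is a minimal sketch, not the scanner's implementation; the ticket pattern and the redact_first_match helper are hypothetical names introduced for illustration.

```python
import re

# Hypothetical pattern used only for this illustration.
pattern = re.compile(r"TICKET-\d{4,6}")

text = "Please look into TICKET-12345"

# re.search succeeds if the pattern occurs anywhere in the text.
search_hit = pattern.search(text) is not None

# re.fullmatch succeeds only if the entire text matches the pattern.
full_hit = pattern.fullmatch(text) is not None
full_hit_exact = pattern.fullmatch("TICKET-12345") is not None


def redact_first_match(compiled: re.Pattern, prompt: str) -> str:
    """Replace the first matched span with "[REDACTED]" (illustrative helper)."""
    match = compiled.search(prompt)
    if match is None:
        return prompt
    return prompt[: match.start()] + "[REDACTED]" + prompt[match.end():]


redacted = redact_first_match(pattern, text)
print(search_hit, full_hit, full_hit_exact)  # True False True
print(redacted)  # Please look into [REDACTED]
```

A full match is the stricter check: the same prompt that passes a search fails a full match as soon as it contains any text outside the pattern.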
Usage
Use the Regex scanner when you need deterministic, rule-based pattern matching on prompts. This is ideal for detecting or blocking specific patterns like email addresses, phone numbers, URLs, SQL injection patterns, or custom data formats. It complements ML-based scanners by providing exact, predictable matching behavior.
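Because the patterns are user-defined, it can help to confirm that each one compiles before handing the list to the scanner. The sketch below uses only Python's re module; find_invalid_patterns is a hypothetical helper and not part of llm_guard.

```python
import re


def find_invalid_patterns(patterns: list[str]) -> dict[str, str]:
    """Map each pattern that fails to compile to its error message."""
    errors: dict[str, str] = {}
    for pattern in patterns:
        try:
            re.compile(pattern)
        except re.error as exc:
            errors[pattern] = str(exc)
    return errors


candidate_patterns = [
    r"TICKET-\d{4,6}",  # valid
    r"(unclosed",       # invalid: missing closing parenthesis
]
invalid = find_invalid_patterns(candidate_patterns)
print(sorted(invalid))  # ['(unclosed']
```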
Code Reference
Source Location
Signature
class Regex(Scanner):
    def __init__(
        self,
        patterns: list[str],
        *,
        is_blocked: bool = True,
        match_type: MatchType | str = MatchType.ALL,
        redact: bool = True,
    ) -> None: ...

    def scan(self, prompt: str) -> tuple[str, bool, float]: ...
MatchType Enum
class MatchType(str, Enum):
    SEARCH = "search"
    FULL_MATCH = "full_match"
    ALL = "all"
Import
from llm_guard.input_scanners import Regex
I/O Contract
Inputs
| Name | Type | Required | Description |
|------|------|----------|-------------|
| patterns | list[str] | Yes | List of regular expression patterns to match against prompts. |
| is_blocked | bool | No | If True, matching patterns cause the prompt to be flagged as invalid; if False, prompts must match to be valid. Defaults to True. |
| match_type | MatchType or str | No | Matching strategy: "search" (match anywhere), "full_match" (entire text must match), "all" (all patterns must match). Defaults to MatchType.ALL. |
| redact | bool | No | Whether to replace matched patterns with "[REDACTED]". Defaults to True. |
scan() Inputs
| Name | Type | Required | Description |
|------|------|----------|-------------|
| prompt | str | Yes | The input text to match against the defined regex patterns. |
Outputs
| Name | Type | Description |
|------|------|-------------|
| prompt | str | The prompt with matched patterns redacted (if redact=True), or the original prompt. |
| is_valid | bool | True if the prompt passes the regex check (based on is_blocked setting); False otherwise. |
| risk_score | float | 1.0 if patterns matched (and is_blocked=True); 0.0 otherwise. |
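A caller typically unpacks the returned tuple and decides whether to forward the prompt. The sketch below shows one way to do that; handle_scan_result and its rejection message are hypothetical, and the tuples passed in are hand-written stand-ins shaped like scan() results rather than real scanner output.

```python
def handle_scan_result(result: tuple[str, bool, float]) -> str:
    """Return the text to forward downstream, or a rejection notice (illustrative)."""
    sanitized_prompt, is_valid, risk_score = result
    if not is_valid:
        # The prompt failed the regex check, so it is not forwarded.
        return f"rejected (risk_score={risk_score})"
    return sanitized_prompt


# Hand-written stand-ins for the documented (prompt, is_valid, risk_score) tuple.
blocked = handle_scan_result(("Send the report to [REDACTED] please", False, 1.0))
allowed = handle_scan_result(("Please look into TICKET-12345", True, 0.0))
print(blocked)  # rejected (risk_score=1.0)
print(allowed)  # Please look into TICKET-12345
```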
Usage Examples
Block Email Addresses
from llm_guard.input_scanners import Regex

scanner = Regex(
    patterns=[r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}"],
    is_blocked=True,
    redact=True,
)

prompt = "Send the report to john.doe@example.com please"
sanitized_prompt, is_valid, risk_score = scanner.scan(prompt)

print(sanitized_prompt)  # "Send the report to [REDACTED] please"
print(is_valid)          # False (email pattern matched)
print(risk_score)        # 1.0
Require Specific Format
from llm_guard.input_scanners import Regex

# Require prompts to contain a ticket number
scanner = Regex(
    patterns=[r"TICKET-\d{4,6}"],
    is_blocked=False,
    match_type="search",
)

prompt = "Please look into TICKET-12345"
sanitized_prompt, is_valid, risk_score = scanner.scan(prompt)

print(is_valid)  # True (required pattern found)
Block SQL Injection Patterns
from llm_guard.input_scanners import Regex

scanner = Regex(
    patterns=[
        r"(?i)\b(SELECT|INSERT|UPDATE|DELETE|DROP|UNION)\b.*\b(FROM|INTO|SET|TABLE|ALL)\b",
        r"(?i);\s*--",
        r"(?i)'\s*(OR|AND)\s+'",
    ],
    is_blocked=True,
    match_type="search",
    redact=False,
)

prompt = "'; DROP TABLE users; --"
sanitized_prompt, is_valid, risk_score = scanner.scan(prompt)

print(is_valid)  # False (SQL injection pattern detected)
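Before deploying patterns like these, it can be useful to check which of them actually fire on a sample input. The sketch below does so with Python's re module directly, independent of the scanner; the variable names are illustrative.

```python
import re

sql_patterns = [
    r"(?i)\b(SELECT|INSERT|UPDATE|DELETE|DROP|UNION)\b.*\b(FROM|INTO|SET|TABLE|ALL)\b",
    r"(?i);\s*--",
    r"(?i)'\s*(OR|AND)\s+'",
]
sample = "'; DROP TABLE users; --"

# Indices of the patterns that match somewhere in the sample.
hits = [i for i, p in enumerate(sql_patterns) if re.search(p, sample)]
print(hits)  # [0, 1]
```

Here the keyword pattern and the trailing-comment pattern both match, while the quoted OR/AND pattern does not, since the sample contains no such clause.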
Related Pages