Implementation:Protectai Llm guard Input Regex

Knowledge Sources	Protectai_Llm_guard
Domains	Pattern_Matching, Content_Filtering
Last Updated	2026-02-14 12:00 GMT

Overview

The Regex scanner performs pattern matching on prompts using user-defined regular expressions with optional redaction.

Description

Regex is an input scanner that checks prompts against a list of user-defined regular expression patterns. It supports three matching strategies via the MatchType enum: SEARCH (finds a match anywhere in the text using re.search), FULL_MATCH (requires the entire text to match using re.fullmatch), and ALL (all patterns must match). The scanner can be configured to either block prompts matching the patterns (is_blocked=True) or require prompts to match the patterns (is_blocked=False). When redact is enabled (default True), matched patterns are replaced with the string "[REDACTED]" in the output. This scanner does not use any ML model, making it fast and deterministic.
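The difference between the SEARCH and FULL_MATCH strategies comes straight from Python's `re` module. The snippet below calls `re.search` and `re.fullmatch` directly instead of going through the scanner, so it only illustrates the matching rule each strategy is described as using. The phone-number pattern and the helper names are arbitrary examples.

```python
import re

# Arbitrary example pattern: a local phone number in NNN-NNNN form, e.g. 555-1234.
PATTERN = re.compile(r"\d{3}-\d{4}")

def search_hit(text: str) -> bool:
    # SEARCH rule: the pattern may appear anywhere in the text.
    return PATTERN.search(text) is not None

def full_match_hit(text: str) -> bool:
    # FULL_MATCH rule: the pattern must cover the entire text.
    return PATTERN.fullmatch(text) is not None

for text in ["555-1234", "Call 555-1234 today"]:
    print(f"{text!r}: search={search_hit(text)}, full_match={full_match_hit(text)}")
```

FULL_MATCH therefore suits inputs that should consist of nothing but the pattern, while SEARCH suits detecting a pattern embedded in free text.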

Usage

Use the Regex scanner when you need deterministic, rule-based pattern matching on prompts. This is ideal for detecting or blocking specific patterns like email addresses, phone numbers, URLs, SQL injection patterns, or custom data formats. It complements ML-based scanners by providing exact, predictable matching behavior.
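Because the patterns are ordinary Python regular expressions, a malformed one (for example, an unbalanced parenthesis) raises `re.error` when it is compiled. A small pre-flight check can catch such mistakes before the list is passed to `Regex(patterns=...)`. The `check_patterns` helper below is hypothetical and not part of llm_guard. The email, URL, and phone patterns are illustrative and deliberately simple.

```python
import re

# Illustrative candidate patterns; tune them for your own data.
CANDIDATE_PATTERNS = {
    "email": r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}",
    "url": r"https?://\S+",
    "us_phone": r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b",
    "broken": r"(unclosed",  # deliberately invalid: missing ')'
}

def check_patterns(patterns: dict[str, str]) -> dict[str, str]:
    """Hypothetical helper: map each pattern name that fails to compile to its error message."""
    errors = {}
    for name, pattern in patterns.items():
        try:
            re.compile(pattern)
        except re.error as exc:
            errors[name] = str(exc)
    return errors

problems = check_patterns(CANDIDATE_PATTERNS)
print(problems)  # only the "broken" entry is reported
# Keep only the patterns that compiled cleanly.
valid = [p for name, p in CANDIDATE_PATTERNS.items() if name not in problems]
```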

Code Reference

Source Location

Repository: Protectai_Llm_guard
File: llm_guard/input_scanners/regex.py
Lines: 1-103

Signature

class Regex(Scanner):
    def __init__(
        self,
        patterns: list[str],
        *,
        is_blocked: bool = True,
        match_type: MatchType | str = MatchType.ALL,
        redact: bool = True,
    ) -> None: ...

    def scan(self, prompt: str) -> tuple[str, bool, float]: ...

MatchType Enum

class MatchType(str, Enum):
    SEARCH = "search"
    FULL_MATCH = "full_match"
    ALL = "all"
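Because `MatchType` is declared with a `str` mixin, each member compares equal to its string value and can be looked up from that value. This is presumably why `match_type` accepts either form. The snippet re-declares the enum exactly as documented above so that it runs without llm_guard installed. How the scanner itself normalizes a string argument is an assumption here.

```python
from enum import Enum

# Local copy of the enum as documented above (not imported from llm_guard).
class MatchType(str, Enum):
    SEARCH = "search"
    FULL_MATCH = "full_match"
    ALL = "all"

# Value lookup turns a plain string into the corresponding member.
mt = MatchType("search")
print(mt is MatchType.SEARCH)                # True
# The str mixin makes members compare equal to their raw values.
print(MatchType.FULL_MATCH == "full_match")  # True
# Strings that are not member values are rejected with ValueError.
try:
    MatchType("exact")
except ValueError as exc:
    print("rejected:", exc)
```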

Import

from llm_guard.input_scanners import Regex

I/O Contract

Inputs

Name	Type	Required	Description
patterns	list[str]	Yes	List of regular expression patterns to match against prompts.
is_blocked	bool	No	If True, matching patterns cause the prompt to be flagged as invalid; if False, prompts must match to be valid. Defaults to True.
match_type	MatchType or str	No	Matching strategy: "search" (match anywhere), "full_match" (entire text must match), "all" (all patterns must match). Defaults to MatchType.ALL.
redact	bool	No	Whether to replace matched patterns with "[REDACTED]". Defaults to True.

scan() Inputs

Name	Type	Required	Description
prompt	str	Yes	The input text to match against the defined regex patterns.

Outputs

Name	Type	Description
prompt	str	The prompt with matched patterns redacted (if redact=True), or the original prompt.
is_valid	bool	True if the prompt passes the regex check (based on is_blocked setting); False otherwise.
risk_score	float	1.0 when the prompt fails the check (a pattern matched with is_blocked=True, or no pattern matched with is_blocked=False); 0.0 otherwise.
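The decision rules in the tables above can be condensed into a small pure-Python function. `regex_scan_sketch` is hypothetical and is not llm_guard's implementation. It uses SEARCH semantics only and does not model ALL or FULL_MATCH. It treats require mode as "at least one pattern must be found" and assumes that a failed check in either mode scores 1.0. It redacts with `re.sub`, and only in block mode, because in require mode the matched text is what the caller wants to keep.

```python
import re

REDACTION = "[REDACTED]"

def regex_scan_sketch(
    prompt: str,
    patterns: list[str],
    *,
    is_blocked: bool = True,
    redact: bool = True,
) -> tuple[str, bool, float]:
    """Hypothetical re-creation of the (prompt, is_valid, risk_score) contract.

    Every pattern is tested with re.search (SEARCH semantics only).
    """
    compiled = [re.compile(p) for p in patterns]
    matched = [c for c in compiled if c.search(prompt)]

    if is_blocked:
        if not matched:
            return prompt, True, 0.0  # nothing forbidden found
        if redact:
            # Sketch choice: replace every occurrence of every matching pattern.
            for c in matched:
                prompt = c.sub(REDACTION, prompt)
        return prompt, False, 1.0  # forbidden pattern present

    # is_blocked=False: at least one pattern must be present for the prompt to pass.
    if matched:
        return prompt, True, 0.0
    return prompt, False, 1.0


EMAIL_PATTERN = r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}"
print(regex_scan_sketch("Send the report to john.doe@example.com please", [EMAIL_PATTERN]))
# ('Send the report to [REDACTED] please', False, 1.0)
```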

Usage Examples

Block Email Addresses

from llm_guard.input_scanners import Regex

scanner = Regex(
    patterns=[r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}"],
    is_blocked=True,
    redact=True,
)
prompt = "Send the report to john.doe@example.com please"
sanitized_prompt, is_valid, risk_score = scanner.scan(prompt)

print(sanitized_prompt)  # "Send the report to [REDACTED] please"
print(is_valid)          # False (email pattern matched)
print(risk_score)        # 1.0
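Before relying on the email pattern above, it helps to check what it does and does not catch. The snippet applies the same pattern with plain `re.search`, which is the matching rule behind SEARCH. The sample strings are invented. An address on a dotless host such as `user@localhost` slips through, because the pattern requires a dot followed by at least two letters.

```python
import re

# Same pattern as in the email example above.
EMAIL = re.compile(r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}")

samples = [
    "john.doe@example.com",
    "ops+alerts@mail.example.org",
    "not-an-email@",
    "user@localhost",
]
for s in samples:
    hit = EMAIL.search(s)
    # Print the matched substring, or None when the pattern misses.
    print(f"{s!r:32} -> {hit.group() if hit else None}")
```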

Require Specific Format

from llm_guard.input_scanners import Regex

# Require prompts to contain a ticket number
scanner = Regex(
    patterns=[r"TICKET-\d{4,6}"],
    is_blocked=False,
    match_type="search",
)
prompt = "Please look into TICKET-12345"
sanitized_prompt, is_valid, risk_score = scanner.scan(prompt)

print(is_valid)    # True (required pattern found)
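With SEARCH, a pattern only has to occur somewhere in the prompt. As a result, `TICKET-\d{4,6}` also accepts a longer identifier such as `TICKET-1234567` by matching its first six digits. Wrapping the pattern in `\b` word boundaries is one way to require a standalone 4 to 6 digit ticket number. This is a suggestion checked with plain `re`, not something the scanner does automatically.

```python
import re

loose = re.compile(r"TICKET-\d{4,6}")
strict = re.compile(r"\bTICKET-\d{4,6}\b")  # word boundaries on both ends

for text in ["Please look into TICKET-12345", "Please look into TICKET-1234567"]:
    print(f"{text!r}: loose={bool(loose.search(text))}, strict={bool(strict.search(text))}")
# The 7-digit ID passes the loose pattern but not the strict one.
```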

Block SQL Injection Patterns

from llm_guard.input_scanners import Regex

scanner = Regex(
    patterns=[
        r"(?i)\b(SELECT|INSERT|UPDATE|DELETE|DROP|UNION)\b.*\b(FROM|INTO|SET|TABLE|ALL)\b",
        r"(?i);\s*--",
        r"(?i)'\s*(OR|AND)\s+'",
    ],
    is_blocked=True,
    match_type="search",
    redact=False,
)
prompt = "'; DROP TABLE users; --"
sanitized_prompt, is_valid, risk_score = scanner.scan(prompt)

print(is_valid)    # False (SQL injection pattern detected)
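To see which of the three SQL patterns fired, and how broad the first one is, the snippet runs each pattern through `re.search` individually. The benign sentence is an invented example. The first pattern only needs a keyword such as SELECT followed later by one such as FROM, so ordinary English can trip it. Expect false positives with `is_blocked=True`.

```python
import re

# The three patterns from the SQL injection example above.
SQL_PATTERNS = [
    r"(?i)\b(SELECT|INSERT|UPDATE|DELETE|DROP|UNION)\b.*\b(FROM|INTO|SET|TABLE|ALL)\b",
    r"(?i);\s*--",
    r"(?i)'\s*(OR|AND)\s+'",
]

def which_patterns_match(text: str) -> list[bool]:
    # One flag per pattern, in the same order as SQL_PATTERNS.
    return [re.search(p, text) is not None for p in SQL_PATTERNS]

print(which_patterns_match("'; DROP TABLE users; --"))                # [True, True, False]
print(which_patterns_match("' OR '1'='1"))                             # [False, False, True]
print(which_patterns_match("Can you select a book from the shelf?"))  # [True, False, False]
```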

Related Pages

Principle:Protectai_Llm_guard_Regex_Pattern_Matching
