

Implementation:Protectai Llm guard Input Regex

From Leeroopedia
Domains: Pattern_Matching, Content_Filtering
Last Updated: 2026-02-14 12:00 GMT

Overview

The Regex scanner performs pattern matching on prompts using user-defined regular expressions with optional redaction.

Description

Regex is an input scanner that checks prompts against a list of user-defined regular expressions. It supports three matching strategies through the MatchType enum: SEARCH (a pattern may match anywhere in the text, via re.search), FULL_MATCH (a pattern must match the entire text, via re.fullmatch), and ALL (all patterns must match). The scanner runs in one of two modes: with is_blocked=True it flags prompts that match the patterns, and with is_blocked=False it requires prompts to match them. When redact is enabled (the default), matched text is replaced with the string "[REDACTED]" in the returned prompt. Because the scanner uses no ML model, it is fast and deterministic.
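The three strategies can be illustrated with Python's standard re module alone. This is a minimal sketch, not llm_guard's code: Strategy and patterns_match are hypothetical names, and two behaviors are assumptions made for the illustration: with SEARCH or FULL_MATCH a prompt counts as matching if any single pattern matches, and with ALL every pattern must be found somewhere in the text.

```python
import re
from enum import Enum


class Strategy(str, Enum):
    # Hypothetical stand-in for MatchType; values mirror the enum on this page.
    SEARCH = "search"
    FULL_MATCH = "full_match"
    ALL = "all"


def patterns_match(patterns: list[str], text: str, strategy: Strategy) -> bool:
    """Return True if text satisfies patterns under strategy (illustrative only)."""
    compiled = [re.compile(p) for p in patterns]
    if strategy is Strategy.SEARCH:
        # Assumption: any single pattern found anywhere in the text is enough.
        return any(p.search(text) for p in compiled)
    if strategy is Strategy.FULL_MATCH:
        # Assumption: any single pattern that covers the whole text is enough.
        return any(p.fullmatch(text) for p in compiled)
    # ALL (assumption): every pattern must be found somewhere in the text.
    return all(p.search(text) for p in compiled)


text = "Order 42 shipped to Berlin"
print(patterns_match([r"\d+"], text, Strategy.SEARCH))            # True
print(patterns_match([r"\d+"], text, Strategy.FULL_MATCH))        # False
print(patterns_match([r"\d+", r"Paris"], text, Strategy.ALL))     # False
print(patterns_match([r"\d+", r"Berlin"], text, Strategy.ALL))    # True
```

The FULL_MATCH case fails here because re.fullmatch needs the pattern to span the entire string, while re.search only needs one occurrence.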

Usage

Use the Regex scanner when you need deterministic, rule-based pattern matching on prompts. This is ideal for detecting or blocking specific patterns like email addresses, phone numbers, URLs, SQL injection patterns, or custom data formats. It complements ML-based scanners by providing exact, predictable matching behavior.
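As a rough illustration of the kinds of patterns mentioned above, the sketch below checks a prompt against a small dictionary of expressions using only the standard library. The phone and URL patterns and the detect helper are hypothetical examples chosen for this page; they are not shipped with llm_guard, and real deployments would tune them to their own data.

```python
import re

# Illustrative patterns only (assumptions, not part of llm_guard).
PATTERNS = {
    "email": r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}",
    "us_phone": r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b",
    "url": r"https?://[^\s]+",
}

# Compile once so repeated checks do not re-parse the expressions.
COMPILED = {name: re.compile(p) for name, p in PATTERNS.items()}


def detect(text: str) -> list[str]:
    """Return the names of all patterns found in text, in dictionary order."""
    return [name for name, rx in COMPILED.items() if rx.search(text)]


print(detect("Call 555-123-4567 or visit https://example.com"))  # ['us_phone', 'url']
print(detect("Nothing sensitive here"))                          # []
```

Each list entry could be passed directly in the patterns argument; the same input always yields the same result, which is the predictability the paragraph above refers to.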

Code Reference

Source Location

Signature

class Regex(Scanner):
    def __init__(
        self,
        patterns: list[str],
        *,
        is_blocked: bool = True,
        match_type: MatchType | str = MatchType.ALL,
        redact: bool = True,
    ) -> None: ...

    def scan(self, prompt: str) -> tuple[str, bool, float]: ...
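One plausible reason to accept patterns as plain strings is to compile them once in the constructor, so a malformed expression fails immediately instead of on the first scan. The class below is a hypothetical sketch (MiniRegex is an invented name, and rejecting an empty list is an assumption); it is not the llm_guard source.

```python
import re


class MiniRegex:
    """Hypothetical constructor sketch; not the llm_guard implementation."""

    def __init__(self, patterns: list[str], *, is_blocked: bool = True, redact: bool = True) -> None:
        if not patterns:
            # Assumption: an empty pattern list is treated as a configuration error.
            raise ValueError("No patterns provided")
        # re.compile raises re.error here for malformed expressions.
        self._patterns = [re.compile(p) for p in patterns]
        self._is_blocked = is_blocked
        self._redact = redact


try:
    MiniRegex(["[unclosed"])
except re.error as exc:
    print("invalid pattern:", exc)
```

Compiling up front also means each scan call reuses the compiled objects rather than relying on the re module's internal cache.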

MatchType Enum

from enum import Enum


class MatchType(str, Enum):
    SEARCH = "search"
    FULL_MATCH = "full_match"
    ALL = "all"
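Because MatchType mixes in str, a plain string such as "search" can be converted to a member by value, which is presumably why the constructor accepts MatchType | str. The normalize helper below is hypothetical; it only demonstrates standard Enum lookup behavior.

```python
from enum import Enum


class MatchType(str, Enum):
    # Copied from the enum definition above.
    SEARCH = "search"
    FULL_MATCH = "full_match"
    ALL = "all"


def normalize(match_type: "MatchType | str") -> MatchType:
    # Calling the enum with a value returns the matching member;
    # an unknown string raises ValueError.
    return MatchType(match_type)


print(normalize("search") is MatchType.SEARCH)     # True
print(normalize(MatchType.ALL) is MatchType.ALL)   # True
print(MatchType.FULL_MATCH == "full_match")        # True (str mixin)
```

Passing a member back through MatchType(...) returns the same member, so the helper is safe to call on either input type.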

Import

from llm_guard.input_scanners import Regex

I/O Contract

Inputs

Name | Type | Required | Description
patterns | list[str] | Yes | List of regular expression patterns to match against prompts.
is_blocked | bool | No | If True, a prompt that matches the patterns is flagged as invalid; if False, a prompt must match the patterns to be valid. Defaults to True.
match_type | MatchType or str | No | Matching strategy: "search" (match anywhere), "full_match" (the entire text must match), or "all" (all patterns must match). Defaults to MatchType.ALL.
redact | bool | No | Whether to replace matched text with "[REDACTED]". Defaults to True.
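The redact option amounts to substituting a placeholder for each match. A minimal stdlib sketch, assuming every non-overlapping match of every pattern is replaced in turn (the redact function is a hypothetical helper, and llm_guard's own replacement logic may differ), looks like this:

```python
import re

# Same email expression as the usage example further down this page.
EMAIL_PATTERN = r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}"


def redact(text: str, patterns: list[str], placeholder: str = "[REDACTED]") -> str:
    """Replace every non-overlapping match of each pattern with placeholder."""
    for pattern in patterns:
        text = re.sub(pattern, placeholder, text)
    return text


print(redact("Mail a@x.io and b@y.io", [EMAIL_PATTERN]))
# Mail [REDACTED] and [REDACTED]
```

Applying patterns one after another means a later pattern sees the already-redacted text, so a pattern broad enough to match "[REDACTED]" itself would rewrite it again.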

scan() Inputs

Name | Type | Required | Description
prompt | str | Yes | The input text to match against the defined regex patterns.

Outputs

Name | Type | Description
prompt | str | The prompt with matched text redacted (if redact=True), or the original prompt.
is_valid | bool | True if the prompt passes the regex check (according to the is_blocked setting); False otherwise.
risk_score | float | 1.0 if patterns matched (and is_blocked=True); 0.0 otherwise.
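The is_valid rule in the table reduces to a two-by-two truth table over "did a pattern match" and is_blocked. The helper below is hypothetical and models only is_valid; risk_score is deliberately left out.

```python
def is_valid(matched: bool, is_blocked: bool) -> bool:
    """Validity rule from the outputs table (hypothetical helper)."""
    # Blocklist mode (is_blocked=True): any match makes the prompt invalid.
    # Allowlist mode (is_blocked=False): the prompt is valid only if a pattern matched.
    return not matched if is_blocked else matched


for blocked in (True, False):
    for hit in (True, False):
        print(f"is_blocked={blocked} matched={hit} -> is_valid={is_valid(hit, blocked)}")
```

In other words, is_blocked simply flips whether a match counts for or against the prompt.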

Usage Examples

Block Email Addresses

from llm_guard.input_scanners import Regex

scanner = Regex(
    patterns=[r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}"],
    is_blocked=True,
    redact=True,
)
prompt = "Send the report to john.doe@example.com please"
sanitized_prompt, is_valid, risk_score = scanner.scan(prompt)

print(sanitized_prompt)  # "Send the report to [REDACTED] please"
print(is_valid)          # False (email pattern matched)
print(risk_score)        # 1.0
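The email expression in this example is a pragmatic approximation, not a full RFC 5322 validator. Running the same expression through Python's re module (independently of llm_guard) shows what it does and does not catch:

```python
import re

# Same expression as the scanner example above.
EMAIL = re.compile(r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}")

samples = [
    "john.doe@example.com",   # ordinary address
    "admin@localhost",        # no dot-separated top-level domain
    "user@example.c",         # one-letter top-level domain
    "JOHN@EXAMPLE.COM",       # uppercase
]
for s in samples:
    print(f"{s!r}: {bool(EMAIL.search(s))}")
```

Only the first and last samples match: the expression requires a dot followed by at least two letters after the "@", so bare hostnames and one-letter endings slip through unredacted.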

Require Specific Format

from llm_guard.input_scanners import Regex

# Require prompts to contain a ticket number
scanner = Regex(
    patterns=[r"TICKET-\d{4,6}"],
    is_blocked=False,
    match_type="search",
)
prompt = "Please look into TICKET-12345"
sanitized_prompt, is_valid, risk_score = scanner.scan(prompt)

print(is_valid)    # True (required pattern found)
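The choice of match_type="search" matters here. The plain re calls below (which mirror the SEARCH and FULL_MATCH descriptions on this page but do not call llm_guard) show that a full match would reject this prompt, and that search also accepts longer digit runs because the pattern has no trailing word boundary:

```python
import re

TICKET = re.compile(r"TICKET-\d{4,6}")
prompt = "Please look into TICKET-12345"

print(bool(TICKET.search(prompt)))                 # True: the pattern occurs in the prompt
print(bool(TICKET.fullmatch(prompt)))              # False: other text surrounds the ticket
print(bool(TICKET.fullmatch("TICKET-12345")))      # True: the whole string is a ticket
print(bool(TICKET.search("See TICKET-1234567")))   # True: six of the seven digits match
```

If seven-digit IDs should not count, appending \b to the pattern (r"TICKET-\d{4,6}\b") is one possible tightening.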

Block SQL Injection Patterns

from llm_guard.input_scanners import Regex

scanner = Regex(
    patterns=[
        r"(?i)\b(SELECT|INSERT|UPDATE|DELETE|DROP|UNION)\b.*\b(FROM|INTO|SET|TABLE|ALL)\b",
        r"(?i);\s*--",
        r"(?i)'\s*(OR|AND)\s+'",
    ],
    is_blocked=True,
    match_type="search",
    redact=False,
)
prompt = "'; DROP TABLE users; --"
sanitized_prompt, is_valid, risk_score = scanner.scan(prompt)

print(is_valid)    # False (SQL injection pattern detected)
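Keyword patterns like the first one are broad. Replaying the example's patterns with plain re.search (the flagged helper is hypothetical and assumes a prompt is flagged when any pattern occurs anywhere, as with match_type="search"; it does not call llm_guard) shows that ordinary English can trip them:

```python
import re

SQL_PATTERNS = [
    r"(?i)\b(SELECT|INSERT|UPDATE|DELETE|DROP|UNION)\b.*\b(FROM|INTO|SET|TABLE|ALL)\b",
    r"(?i);\s*--",
    r"(?i)'\s*(OR|AND)\s+'",
]


def flagged(text: str) -> bool:
    # A prompt is flagged if any pattern occurs anywhere in it.
    return any(re.search(p, text) for p in SQL_PATTERNS)


print(flagged("'; DROP TABLE users; --"))                # True
print(flagged("Please select all items from the list"))  # True (false positive)
print(flagged("What is the capital of France?"))         # False
```

Because (?i) makes the keywords case-insensitive, "select ... from" in a normal sentence matches, so with is_blocked=True this configuration may reject benign prompts. Testing patterns against representative traffic before deploying them helps surface such cases.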
