Principle:Protectai Llm guard Substring Filtering
| Knowledge Sources | |
|---|---|
| Domains | Content_Filtering, Security |
| Last Updated | 2026-02-14 12:00 GMT |
Overview
Block text that contains specific forbidden substrings, using either exact string containment or word-boundary matching.
Description
Substring Filtering is a content filtering principle that enforces text policies by scanning for the presence of forbidden substrings. It provides a lightweight, deterministic, and highly interpretable mechanism for blocking undesirable content without requiring machine learning models or external services.
The principle supports two matching strategies. The first, string containment (the "str" mode), performs a simple substring search to detect whether the forbidden text appears anywhere within the input, regardless of word boundaries. The second, word-boundary matching (the "word" mode), uses regular expression word-boundary anchors so that only whole-word occurrences are matched, which prevents false positives from partial matches inside longer words.
Additional configuration options include case-insensitive matching for catching variations in capitalization, contains-all versus contains-any logic for controlling whether all substrings must be present or just one, and optional redaction to replace matched substrings with sanitized placeholders rather than rejecting the entire text.
Usage
Use this principle for straightforward blocklist enforcement where the forbidden terms are known in advance and can be expressed as literal strings. Common applications include profanity filtering, blocking specific product names or URLs, preventing disclosure of internal codenames, and enforcing compliance with content policies that prohibit certain phrases. This approach is ideal when speed and simplicity are more important than semantic understanding, and when the set of blocked terms is well-defined and stable.
Theoretical Basis
The matching algorithm operates as follows:
Strategy 1: String Containment ("str" mode)
- For each banned substring, check if it appears in the input text
- Uses native string containment (the "in" operator)
- Optionally convert both input and substring to lowercase for case-insensitive matching
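The containment check above can be sketched in a few lines of Python. The function name `contains_substring` and its `case_sensitive` parameter are hypothetical, chosen for illustration; they are not taken from the library's API.

```python
def contains_substring(text: str, substring: str, case_sensitive: bool = False) -> bool:
    """Return True if `substring` occurs anywhere in `text` ("str" mode).

    Hypothetical helper: the name and signature are assumptions for this sketch.
    """
    if not case_sensitive:
        # Lowercase both sides so "Banana" and "banana" compare equal.
        text, substring = text.lower(), substring.lower()
    # Native containment: matches regardless of word boundaries.
    return substring in text
```

Because this mode ignores word boundaries, a banned term such as "ban" also matches inside "banana"; that is the false-positive case the word-boundary strategy is meant to avoid.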
Strategy 2: Word Boundary ("word" mode)
- For each banned substring, construct a regex pattern: \b{substring}\b
- Apply the regex against the input text
- Word boundaries ensure that "ban" does not match "banana"
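A minimal sketch of the word-boundary check, using Python's standard `re` module. The helper name `contains_word` is hypothetical, and escaping the substring with `re.escape` is an assumption added here so that regex metacharacters in a banned term are treated literally; the description above does not say whether escaping is applied.

```python
import re


def contains_word(text: str, substring: str, case_sensitive: bool = False) -> bool:
    """Return True if `substring` occurs in `text` as a whole word ("word" mode).

    Hypothetical helper: the name and signature are assumptions for this sketch.
    """
    flags = 0 if case_sensitive else re.IGNORECASE
    # Assumption: escape the term so characters like "." or "+" match literally.
    pattern = r"\b" + re.escape(substring) + r"\b"
    return re.search(pattern, text, flags) is not None
```

Note that `\b` only marks a transition between a word character (letter, digit, underscore) and a non-word character, so this pattern behaves as expected only for terms that begin and end with word characters. A term such as "c++" followed by a space would not match, because there is no boundary between "+" and the space.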
Aggregation Logic:
- In "contains-any" mode, flag the text if any banned substring is found
- In "contains-all" mode, flag the text only if every banned substring is found
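The two aggregation modes map directly onto Python's built-in `any` and `all`. The sketch below uses plain case-insensitive containment for the per-term check; the function name `is_flagged` and the `contains_all` parameter are hypothetical.

```python
from typing import Iterable


def is_flagged(text: str, banned: Iterable[str], contains_all: bool = False) -> bool:
    """Combine per-term matches using contains-any or contains-all logic.

    Hypothetical helper: the name and signature are assumptions for this sketch.
    """
    lowered = text.lower()
    # One boolean per banned term: does it appear in the text?
    hits = [term.lower() in lowered for term in banned]
    if contains_all:
        # Flag only when every banned term is present.
        return all(hits)
    # Flag as soon as at least one banned term is present.
    return any(hits)
```

One edge case worth deciding explicitly: with an empty blocklist, `all([])` is `True` in Python, so a contains-all check would flag every input unless the caller guards against it.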
Redaction (optional):
- Replace each matched substring with a configurable placeholder string
- Return the sanitized text instead of rejecting it outright
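Redaction can be sketched as a single regex substitution over all banned terms. The function name `redact`, the default placeholder `"[REDACTED]"`, and the single-pass alternation are assumptions made for illustration, not a description of the library's implementation.

```python
import re
from typing import Iterable


def redact(text: str, banned: Iterable[str], placeholder: str = "[REDACTED]",
           case_sensitive: bool = False) -> str:
    """Replace every occurrence of a banned term with `placeholder`.

    Hypothetical helper: the name, signature, and default placeholder are
    assumptions for this sketch.
    """
    # Longest terms first so a longer term wins over a shorter prefix of it.
    terms = sorted((t for t in banned if t), key=len, reverse=True)
    if not terms:
        return text  # Nothing to redact; avoid building an empty pattern.
    pattern = "|".join(re.escape(t) for t in terms)
    flags = 0 if case_sensitive else re.IGNORECASE
    # A function replacement keeps the placeholder literal (no backslash processing).
    return re.sub(pattern, lambda _match: placeholder, text, flags=flags)
```

For example, `redact("Project Falcon ships soon", ["falcon"])` returns `"Project [REDACTED] ships soon"`. Doing the replacement in one pass avoids re-scanning text that has already been replaced, which matters if a banned term could also occur inside the placeholder itself.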