Principle:Protectai Llm guard Substring Filtering

From Leeroopedia
Knowledge Sources
Domains: Content_Filtering, Security
Last Updated: 2026-02-14 12:00 GMT

Overview

Blocking text containing specific substrings using exact string matching or word-boundary matching.

Description

Substring Filtering is a content filtering principle that enforces text policies by scanning for the presence of forbidden substrings. It provides a lightweight, deterministic, and highly interpretable mechanism for blocking undesirable content without requiring machine learning models or external services.

The principle supports two matching strategies. The first, string containment (referred to as the "str" mode), performs simple substring search to detect if the forbidden text appears anywhere within the input, regardless of word boundaries. The second, word-boundary matching (the "word" mode), uses regular expression word-boundary anchors to ensure that only whole-word occurrences are matched, preventing false positives from partial matches within longer words.

Additional configuration options include case-insensitive matching, which catches variations in capitalization; contains-all versus contains-any logic, which controls whether every listed substring must be present or only one; and optional redaction, which replaces matched substrings with a placeholder instead of rejecting the entire text.

Usage

Use this principle for straightforward blocklist enforcement where the forbidden terms are known in advance and can be expressed as literal strings. Common applications include profanity filtering, blocking specific product names or URLs, preventing disclosure of internal codenames, and enforcing compliance with content policies that prohibit certain phrases. This approach is ideal when speed and simplicity are more important than semantic understanding, and when the set of blocked terms is well-defined and stable.

Theoretical Basis

The matching algorithm operates as follows:

Strategy 1: String Containment ("str" mode)

  • For each banned substring, check if it appears in the input text
  • Use native string containment (Python's "in" operator)
  • Optionally convert both input and substring to lowercase for case-insensitive matching
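
As a minimal sketch of the containment check described above: the function name contains_substring and its case_sensitive parameter are hypothetical, chosen for illustration, and are not taken from any library's API.

```python
def contains_substring(text: str, banned: str, case_sensitive: bool = False) -> bool:
    """Return True if `banned` occurs anywhere in `text` ("str" mode).

    Hypothetical helper for illustration only.
    """
    if not case_sensitive:
        # Lowercase both sides so "Ban" and "ban" compare equal.
        text, banned = text.lower(), banned.lower()
    # Plain containment: matches inside longer words as well.
    return banned in text


print(contains_substring("I ate a Banana", "ban"))                       # True
print(contains_substring("I ate a Banana", "ban", case_sensitive=True))  # False
```

Because containment ignores word boundaries, "ban" is found inside "Banana" here; that is the behavior the word-boundary strategy is meant to avoid.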

Strategy 2: Word Boundary ("word" mode)

  • For each banned substring, construct a regex pattern: \b{substring}\b
  • Apply the regex against the input text
  • Word boundaries ensure that "ban" does not match "banana"
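
The word-boundary check might look like the sketch below. The helper name contains_word is hypothetical, and escaping the term with re.escape is an assumption added here so that regex metacharacters in a banned term are treated literally; the source only specifies the \b{substring}\b pattern.

```python
import re


def contains_word(text: str, banned: str, case_sensitive: bool = False) -> bool:
    """Return True if `banned` occurs in `text` as a whole word ("word" mode).

    Hypothetical helper for illustration only.
    """
    flags = 0 if case_sensitive else re.IGNORECASE
    # Assumption: escape the term so characters such as "." or "+" match literally.
    pattern = r"\b" + re.escape(banned) + r"\b"
    return re.search(pattern, text, flags) is not None


print(contains_word("The ban was lifted", "ban"))  # True
print(contains_word("I ate a banana", "ban"))      # False
```

Note that \b is defined relative to word characters (letters, digits, underscore), so a banned term that begins or ends with punctuation may not match the way a whole-word reading would suggest.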

Aggregation Logic:

  • In "contains-any" mode, flag the text if any banned substring is found
  • In "contains-all" mode, flag the text only if every banned substring is found
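
The two aggregation modes map naturally onto Python's any() and all(). The sketch below uses a hypothetical is_flagged helper with case-insensitive containment; returning False for an empty blocklist is an assumption, made because all() over an empty list would otherwise flag every input.

```python
from typing import Iterable


def is_flagged(text: str, banned: Iterable[str], contains_all: bool = False) -> bool:
    """Combine per-term containment results (hypothetical helper)."""
    lowered = text.lower()
    hits = [term.lower() in lowered for term in banned]
    if not hits:
        # Assumption: an empty blocklist never flags anything.
        return False
    # contains-all: every term must be present; contains-any: one is enough.
    return all(hits) if contains_all else any(hits)


terms = ["alpha", "beta"]
print(is_flagged("alpha only", terms))                         # True
print(is_flagged("alpha only", terms, contains_all=True))      # False
print(is_flagged("alpha and beta", terms, contains_all=True))  # True
```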

Redaction (optional):

  • Replace each matched substring with a configurable placeholder string
  • Return the sanitized text instead of rejecting it outright
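
Redaction could be sketched as a single regex substitution. The redact function, its whole_word flag, and the "[REDACTED]" default placeholder are all hypothetical names chosen for illustration. Combining the terms into one alternation is a design choice made here so that a later term cannot match text inside a placeholder that an earlier replacement inserted.

```python
import re
from typing import Sequence


def redact(text: str, banned: Sequence[str],
           placeholder: str = "[REDACTED]", whole_word: bool = False) -> str:
    """Replace every occurrence of any banned term with `placeholder`.

    Hypothetical helper for illustration only; matching is case-insensitive.
    """
    if not banned:
        return text
    # Longest terms first so "foobar" is preferred over its prefix "foo".
    ordered = sorted(banned, key=len, reverse=True)
    alternation = "|".join(re.escape(term) for term in ordered)
    pattern = rf"\b(?:{alternation})\b" if whole_word else f"(?:{alternation})"
    # A function replacement keeps backslashes in the placeholder literal.
    return re.sub(pattern, lambda _match: placeholder, text, flags=re.IGNORECASE)


print(redact("Project Falcon ships with falcon-core", ["falcon"]))
# Project [REDACTED] ships with [REDACTED]-core
print(redact("ban the banana", ["ban"], whole_word=True))
# [REDACTED] the banana
```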
