Implementation:Protectai Llm guard Input BanTopics

From Leeroopedia
Knowledge Sources
Domains: Content_Filtering, Zero_Shot_Classification
Last Updated: 2026-02-14 12:00 GMT

Overview

The BanTopics scanner flags prompts that match a user-defined list of banned topics, using a zero-shot text classification model.

Description

BanTopics is an input scanner that uses zero-shot classification to decide whether a prompt relates to any topic in a user-specified ban list. It relies on NLI-based (Natural Language Inference) models, which can classify text against arbitrary labels without task-specific fine-tuning. The default model is MoritzLaurer/roberta-base-zeroshot-v2.0-c; alternatives include deberta-large, deberta-base, bge-m3, and roberta-large-c variants, which trade accuracy against inference speed. The threshold parameter (default 0.6) sets the minimum classification confidence required to flag a prompt. ONNX Runtime can be enabled for optimized inference.
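The threshold decision described above can be illustrated with a minimal sketch. This is not the library's code: the function name evaluate_topic_scores, the example scores, and the strict "greater than" comparison are all assumptions made for illustration. It only shows how a set of per-topic confidences could be reduced to a single valid/invalid decision.

```python
from __future__ import annotations


def evaluate_topic_scores(
    scores: dict[str, float],
    threshold: float = 0.6,
) -> tuple[bool, str | None, float]:
    """Reduce per-topic confidences to (is_valid, top_topic, top_score).

    Hypothetical helper, not part of llm_guard.
    """
    if not scores:
        # Nothing to compare against, so nothing can be flagged.
        return True, None, 0.0

    # Pick the banned topic with the highest classification confidence.
    top_topic, top_score = max(scores.items(), key=lambda item: item[1])

    # Assumption: a prompt is flagged only when the best score is strictly
    # above the threshold; the real scanner may handle ties differently.
    is_valid = top_score <= threshold
    return is_valid, top_topic, top_score


# Made-up scores, as a zero-shot classifier might return for three topics.
example_scores = {"violence": 0.05, "politics": 0.91, "religion": 0.04}
print(evaluate_topic_scores(example_scores))  # (False, 'politics', 0.91)
```

Lowering the threshold in this sketch flags more prompts, while raising it flags fewer; that is the trade-off the threshold parameter exposes.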

Usage

Use the BanTopics scanner when you need to prevent prompts about specific subject areas such as violence, politics, religion, adult content, or any custom topic. This is ideal for content moderation where you want flexible, configurable topic restrictions without training a custom classifier.

Code Reference

Source Location

Signature

class BanTopics(Scanner):
    def __init__(
        self,
        topics: list[str],
        *,
        threshold: float = 0.6,
        model: Model | None = None,  # default: MoritzLaurer/roberta-base-zeroshot-v2.0-c
        use_onnx: bool = False,
    ) -> None: ...

    def scan(self, prompt: str) -> tuple[str, bool, float]: ...

Import

from llm_guard.input_scanners import BanTopics

I/O Contract

Inputs

| Name | Type | Required | Description |
|------|------|----------|-------------|
| topics | list[str] | Yes | List of topic labels to ban (e.g., "violence", "politics", "adult content"). |
| threshold | float | No | Minimum classification confidence to flag a topic match. Defaults to 0.6. |
| model | Model or None | No | The zero-shot classification model to use. Defaults to MoritzLaurer/roberta-base-zeroshot-v2.0-c. |
| use_onnx | bool | No | Whether to use ONNX runtime for inference. Defaults to False. |

scan() Inputs

| Name | Type | Required | Description |
|------|------|----------|-------------|
| prompt | str | Yes | The input text to classify against banned topics. |

Outputs

| Name | Type | Description |
|------|------|-------------|
| prompt | str | The original prompt (unchanged). |
| is_valid | bool | True if the prompt does not match any banned topic above the threshold; False otherwise. |
| risk_score | float | The highest classification confidence score across all banned topics. |
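One way to consume the (prompt, is_valid, risk_score) tuple is to wrap the scan in a small gate that raises when a prompt is rejected. The sketch below is illustrative only: PromptScanner, BlockedPromptError, require_allowed, and KeywordStubScanner are hypothetical names, and the stub stands in for BanTopics so the example runs without downloading a model.

```python
from __future__ import annotations

from typing import Protocol


class PromptScanner(Protocol):
    """Anything exposing the scan() contract documented above."""

    def scan(self, prompt: str) -> tuple[str, bool, float]: ...


class BlockedPromptError(ValueError):
    """Hypothetical error raised when a scanner rejects a prompt."""

    def __init__(self, risk_score: float) -> None:
        super().__init__(f"prompt rejected (risk_score={risk_score:.2f})")
        self.risk_score = risk_score


def require_allowed(scanner: PromptScanner, prompt: str) -> str:
    """Return the scanned prompt, or raise if the scanner marks it invalid."""
    sanitized_prompt, is_valid, risk_score = scanner.scan(prompt)
    if not is_valid:
        raise BlockedPromptError(risk_score)
    return sanitized_prompt


class KeywordStubScanner:
    """Stand-in for BanTopics; matches a keyword instead of running a model."""

    def __init__(self, keyword: str) -> None:
        self._keyword = keyword.lower()

    def scan(self, prompt: str) -> tuple[str, bool, float]:
        hit = self._keyword in prompt.lower()
        return prompt, not hit, 1.0 if hit else 0.0


stub = KeywordStubScanner("election")
print(require_allowed(stub, "What is the weather like today?"))
```

Because require_allowed depends only on the scan() signature, a BanTopics instance could be passed in place of the stub.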

Available Models

| Model | Description |
|-------|-------------|
| MoritzLaurer/roberta-base-zeroshot-v2.0-c | Default model (the roberta-base-c variant). Base RoBERTa with zero-shot capability, fastest inference. |
| deberta-large variant | Larger DeBERTa model for higher accuracy. |
| deberta-base variant | Base DeBERTa model, balanced accuracy and speed. |
| bge-m3 variant | Multilingual model for cross-language topic detection. |
| roberta-large-c variant | Larger RoBERTa model for improved classification. |

Usage Examples

Basic Usage

from llm_guard.input_scanners import BanTopics

scanner = BanTopics(
    topics=["violence", "politics", "religion"],
    threshold=0.6,
)
prompt = "Tell me about the latest election results"
sanitized_prompt, is_valid, risk_score = scanner.scan(prompt)

print(is_valid)    # Expected False if "politics" scores above the 0.6 threshold
print(risk_score)  # Classification confidence of the best-matching topic

Custom Threshold with ONNX

from llm_guard.input_scanners import BanTopics

scanner = BanTopics(
    topics=["adult content", "gambling", "drugs"],
    threshold=0.5,   # lower than the 0.6 default, so more prompts are flagged
    use_onnx=True,   # needs the ONNX runtime dependencies to be installed
)
prompt = "What is the weather like today?"
sanitized_prompt, is_valid, risk_score = scanner.scan(prompt)

print(is_valid)    # Expected True if no banned topic scores above 0.5
print(risk_score)  # Expected to be low for an unrelated prompt
