Implementation:Protectai Llm guard Input BanTopics

Knowledge Sources	Protectai_Llm_guard
Domains	Content_Filtering, Zero_Shot_Classification
Last Updated	2026-02-14 12:00 GMT

Overview

The BanTopics scanner blocks prompts about banned topics using zero-shot text classification models.

Description

BanTopics is an input scanner that uses zero-shot classification to decide whether a prompt relates to any topic in a user-specified ban list. It relies on NLI-based (Natural Language Inference) models, which can score text against arbitrary labels without task-specific fine-tuning. The default model is MoritzLaurer/roberta-base-zeroshot-v2.0-c (the roberta-base-c variant); alternatives include deberta-large, deberta-base, bge-m3, and roberta-large-c variants, which trade accuracy against inference cost. The threshold parameter (default 0.6) sets the minimum classification confidence required to flag a prompt. ONNX runtime support is available for optimized inference.
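
To make the NLI mechanism more concrete, the sketch below shows how topic labels can be turned into hypotheses and how per-topic entailment logits can become confidence scores. It is an illustration only: the hypothesis template, the logit values, and the helper names build_hypotheses and softmax are assumptions made for this example and are not taken from the llm_guard source.

```python
import math

# Hypothetical template; zero-shot NLI pipelines pair the prompt (premise)
# with one hypothesis sentence per candidate label.
HYPOTHESIS_TEMPLATE = "This example is about {}."


def build_hypotheses(topics: list[str]) -> list[str]:
    """Turn each banned topic into an NLI hypothesis sentence."""
    return [HYPOTHESIS_TEMPLATE.format(topic) for topic in topics]


def softmax(logits: list[float]) -> list[float]:
    """Normalize entailment logits into scores that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


topics = ["violence", "politics", "religion"]
hypotheses = build_hypotheses(topics)

# Made-up entailment logits, one per hypothesis, standing in for model output.
entailment_logits = [-1.2, 3.1, -0.4]
scores = dict(zip(topics, softmax(entailment_logits)))

top_topic = max(scores, key=scores.get)
is_flagged = scores[top_topic] > 0.6  # 0.6 is the documented default threshold
print(hypotheses[1], top_topic, round(scores[top_topic], 2), is_flagged)
```

Because no task-specific training is involved, changing the ban list only changes the hypotheses, which is why arbitrary topic labels can be used.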

Usage

Use the BanTopics scanner when you need to prevent prompts about specific subject areas such as violence, politics, religion, adult content, or any custom topic. This is ideal for content moderation where you want flexible, configurable topic restrictions without training a custom classifier.

Code Reference

Source Location

Repository: Protectai_Llm_guard
File: llm_guard/input_scanners/ban_topics.py
Lines: 1-159

Signature

class BanTopics(Scanner):
    def __init__(
        self,
        topics: list[str],
        *,
        threshold: float = 0.6,
        model: Model | None = None,  # default: MoritzLaurer/roberta-base-zeroshot-v2.0-c
        use_onnx: bool = False,
    ) -> None: ...

    def scan(self, prompt: str) -> tuple[str, bool, float]: ...

Import

from llm_guard.input_scanners import BanTopics

I/O Contract

Inputs

Name	Type	Required	Description
topics	list[str]	Yes	List of topic labels to ban (e.g., "violence", "politics", "adult content").
threshold	float	No	Minimum classification confidence to flag a topic match. Defaults to 0.6.
model	Model or None	No	The zero-shot classification model to use. Defaults to MoritzLaurer/roberta-base-zeroshot-v2.0-c.
use_onnx	bool	No	Whether to use ONNX runtime for inference. Defaults to False.

scan() Inputs

Name	Type	Required	Description
prompt	str	Yes	The input text to classify against banned topics.

Outputs

Name	Type	Description
prompt	str	The original prompt (unchanged).
is_valid	bool	True if the prompt does not match any banned topic above the threshold; False otherwise.
risk_score	float	The highest classification confidence score across all banned topics.
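
One possible way to consume the (prompt, is_valid, risk_score) tuple is to wrap the scanner in a small gate that rejects invalid prompts before they reach the model. The sketch below is hedged: guard_prompt, PromptScanner, and KeywordStubScanner are hypothetical names introduced for illustration, and the stub replaces BanTopics only so the example runs without downloading a classifier.

```python
from typing import Protocol


class PromptScanner(Protocol):
    """Anything exposing scan(prompt) -> (prompt, is_valid, risk_score)."""

    def scan(self, prompt: str) -> tuple[str, bool, float]: ...


def guard_prompt(scanner: PromptScanner, prompt: str) -> str:
    """Return the prompt if it passes the scanner, otherwise raise ValueError."""
    checked_prompt, is_valid, risk_score = scanner.scan(prompt)
    if not is_valid:
        raise ValueError(f"Prompt rejected (risk score {risk_score:.2f})")
    return checked_prompt


class KeywordStubScanner:
    """Hypothetical stand-in for BanTopics so the sketch runs without a model."""

    def __init__(self, banned_word: str) -> None:
        self._banned_word = banned_word

    def scan(self, prompt: str) -> tuple[str, bool, float]:
        hit = self._banned_word in prompt.lower()
        return prompt, not hit, 1.0 if hit else 0.0


stub = KeywordStubScanner("election")
print(guard_prompt(stub, "What is the weather like today?"))
```

Since the gate depends only on the scan() contract described above, a BanTopics instance could be passed in place of the stub.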

Available Models

Model	Description
MoritzLaurer/roberta-base-zeroshot-v2.0-c	Default model (the roberta-base-c variant): RoBERTa base with zero-shot capability, fastest inference of the listed models.
deberta-large variant	Larger DeBERTa model for higher accuracy.
deberta-base variant	Base DeBERTa model, balanced accuracy and speed.
bge-m3 variant	Multilingual model for cross-language topic detection.
roberta-large-c variant	Larger RoBERTa model for improved classification.
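
To compare checkpoints outside the scanner, a zero-shot pipeline can be built directly with the Hugging Face transformers library. The sketch below is not llm_guard code: classify_topics and DEFAULT_MODEL_NAME are hypothetical names, and whether BanTopics configures its pipeline the same way (for example with multi_label=False) is an assumption.

```python
DEFAULT_MODEL_NAME = "MoritzLaurer/roberta-base-zeroshot-v2.0-c"


def classify_topics(
    prompt: str,
    topics: list[str],
    model_name: str = DEFAULT_MODEL_NAME,
) -> dict[str, float]:
    """Score a prompt against topics with a Hugging Face zero-shot pipeline."""
    # Imported lazily so this module loads even where transformers is absent.
    from transformers import pipeline

    # Building the pipeline on every call keeps the sketch short; real code
    # would normally create it once and reuse it.
    classifier = pipeline("zero-shot-classification", model=model_name)
    result = classifier(prompt, candidate_labels=topics, multi_label=False)
    return dict(zip(result["labels"], result["scores"]))
```

Calling classify_topics downloads the checkpoint on first use. Passing a different model_name swaps in another zero-shot checkpoint, which allows accuracy and latency to be compared on representative prompts.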

Usage Examples

Basic Usage

from llm_guard.input_scanners import BanTopics

scanner = BanTopics(
    topics=["violence", "politics", "religion"],
    threshold=0.6,
)
prompt = "Tell me about the latest election results"
sanitized_prompt, is_valid, risk_score = scanner.scan(prompt)

print(is_valid)    # Expected False if "politics" scores above the threshold
print(risk_score)  # Classification confidence

Custom Threshold with ONNX Runtime

from llm_guard.input_scanners import BanTopics

scanner = BanTopics(
    topics=["adult content", "gambling", "drugs"],
    threshold=0.5,
    use_onnx=True,
)
prompt = "What is the weather like today?"
sanitized_prompt, is_valid, risk_score = scanner.scan(prompt)

print(is_valid)    # Expected True when no banned topic scores above 0.5
print(risk_score)  # Classification confidence
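
The effect of the threshold can be illustrated without a model by applying two thresholds to the same set of scores. This is a sketch under assumptions: flagged_topics is a hypothetical helper, the scores are made up, and the strict "greater than" comparison follows the "above the threshold" wording in the I/O contract rather than a reading of the source code.

```python
def flagged_topics(scores: dict[str, float], threshold: float) -> list[str]:
    """Return the topics whose score is strictly above the threshold."""
    return sorted(topic for topic, score in scores.items() if score > threshold)


# Made-up scores for a single prompt; real values come from the classifier.
example_scores = {"gambling": 0.55, "drugs": 0.30, "adult content": 0.15}

for threshold in (0.5, 0.6):
    hits = flagged_topics(example_scores, threshold)
    print(f"threshold={threshold}: valid={not hits}, flagged={hits}")
```

With these illustrative scores, the prompt is flagged at 0.5 but passes at the default 0.6, so a lower threshold trades more false positives for fewer missed topics.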

Related Pages

Principle:Protectai_Llm_guard_Topic_Filtering
