Overview
The BanTopics scanner blocks prompts about banned topics using zero-shot text classification models.
Description
BanTopics is an input scanner that uses zero-shot classification to decide whether a prompt relates to any topic in a user-specified list of banned topics. It relies on NLI-based (Natural Language Inference) models, which can classify text against arbitrary labels without task-specific fine-tuning. The default model is MoritzLaurer/roberta-base-zeroshot-v2.0-c. Several models are available, including deberta-large, deberta-base, bge-m3, roberta-large-c, and roberta-base-c variants, which offer different trade-offs between accuracy and inference speed. The threshold parameter (default 0.6) sets the minimum classification confidence required to flag a prompt. ONNX runtime support is available for optimized inference.
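The thresholding step described above can be sketched in a few lines. This is a minimal illustration, not the library's implementation: the helper name `evaluate_topic_scores` and the example scores are hypothetical, and it assumes the classifier has already produced one confidence score per banned topic and that a prompt is flagged only when its highest score is strictly above the threshold.

```python
def evaluate_topic_scores(
    scores: dict[str, float], threshold: float = 0.6
) -> tuple[bool, float]:
    """Return (is_valid, max_score) for per-topic confidence scores.

    Hypothetical helper: assumes a prompt is flagged when its highest
    topic score is strictly greater than the threshold.
    """
    # With no topics scored, nothing can be flagged.
    max_score = max(scores.values(), default=0.0)
    is_valid = max_score <= threshold
    return is_valid, max_score


# Illustrative scores only; real values come from the zero-shot model.
print(evaluate_topic_scores({"politics": 0.91, "religion": 0.05}))  # (False, 0.91)
print(evaluate_topic_scores({"politics": 0.12, "religion": 0.08}))  # (True, 0.12)
```

Lowering the threshold makes the check stricter (more prompts are flagged), while raising it makes the check more permissive.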
Usage
Use the BanTopics scanner when you need to prevent prompts about specific subject areas such as violence, politics, religion, adult content, or any custom topic. This is ideal for content moderation where you want flexible, configurable topic restrictions without training a custom classifier.
Code Reference
Signature
class BanTopics(Scanner):
    def __init__(
        self,
        topics: list[str],
        *,
        threshold: float = 0.6,
        model: Model | None = None,  # default: MoritzLaurer/roberta-base-zeroshot-v2.0-c
        use_onnx: bool = False,
    ) -> None: ...

    def scan(self, prompt: str) -> tuple[str, bool, float]: ...
Import
from llm_guard.input_scanners import BanTopics
I/O Contract
Inputs
| Name | Type | Required | Description |
|------|------|----------|-------------|
| topics | list[str] | Yes | List of topic labels to ban (e.g., "violence", "politics", "adult content"). |
| threshold | float | No | Minimum classification confidence to flag a topic match. Defaults to 0.6. |
| model | Model or None | No | The zero-shot classification model to use. Defaults to MoritzLaurer/roberta-base-zeroshot-v2.0-c. |
| use_onnx | bool | No | Whether to use ONNX runtime for inference. Defaults to False. |
scan() Inputs
| Name | Type | Required | Description |
|------|------|----------|-------------|
| prompt | str | Yes | The input text to classify against banned topics. |
Outputs
| Name | Type | Description |
|------|------|-------------|
| prompt | str | The original prompt (unchanged). |
| is_valid | bool | True if the prompt does not match any banned topic above the threshold; False otherwise. |
| risk_score | float | The highest classification confidence score across all banned topics. |
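As a rough sketch of how a caller might act on the `(prompt, is_valid, risk_score)` tuple, the snippet below wraps a scanner in a guard function that raises when a prompt is rejected. The names `PromptScanner`, `BlockedPromptError`, `guard_prompt`, and `KeywordStubScanner` are hypothetical and not part of llm_guard. The stub only mimics the return shape of `scan()` with a keyword match so the example runs without downloading a model.

```python
from typing import Protocol


class PromptScanner(Protocol):
    """Anything exposing scan(prompt) -> (prompt, is_valid, risk_score)."""

    def scan(self, prompt: str) -> tuple[str, bool, float]: ...


class BlockedPromptError(Exception):
    """Hypothetical exception raised when a prompt fails the scan."""


def guard_prompt(scanner: PromptScanner, prompt: str) -> str:
    """Return the prompt if it passes the scan, otherwise raise."""
    checked_prompt, is_valid, risk_score = scanner.scan(prompt)
    if not is_valid:
        raise BlockedPromptError(f"prompt blocked (risk_score={risk_score:.2f})")
    return checked_prompt


class KeywordStubScanner:
    """Stand-in for BanTopics: keyword matching, not zero-shot classification."""

    def __init__(self, banned_words: list[str]) -> None:
        self._banned = [word.lower() for word in banned_words]

    def scan(self, prompt: str) -> tuple[str, bool, float]:
        hit = any(word in prompt.lower() for word in self._banned)
        return prompt, not hit, 1.0 if hit else 0.0


stub = KeywordStubScanner(["election"])
print(guard_prompt(stub, "What is the weather like today?"))
try:
    guard_prompt(stub, "Tell me about the latest election results")
except BlockedPromptError as err:
    print(err)  # prompt blocked (risk_score=1.00)
```

In real use, a `BanTopics` instance would be passed in place of the stub, since it exposes the same `scan()` signature shown in the Signature section.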
Available Models
| Model | Description |
|-------|-------------|
| MoritzLaurer/roberta-base-zeroshot-v2.0-c | Default model, RoBERTa base with zero-shot capability. |
| deberta-large variant | Larger DeBERTa model for higher accuracy. |
| deberta-base variant | Base DeBERTa model, balanced accuracy and speed. |
| bge-m3 variant | Multilingual model for cross-language topic detection. |
| roberta-large-c variant | Larger RoBERTa model for improved classification. |
| roberta-base-c variant | Base RoBERTa model, fastest inference. |
Usage Examples
Basic Usage
from llm_guard.input_scanners import BanTopics

scanner = BanTopics(
    topics=["violence", "politics", "religion"],
    threshold=0.6,
)

prompt = "Tell me about the latest election results"
sanitized_prompt, is_valid, risk_score = scanner.scan(prompt)

print(is_valid)    # False (politics topic detected)
print(risk_score)  # Classification confidence
Custom Threshold and ONNX Runtime
from llm_guard.input_scanners import BanTopics

scanner = BanTopics(
    topics=["adult content", "gambling", "drugs"],
    threshold=0.5,   # lower threshold: flags topics at lower confidence
    use_onnx=True,   # run inference through ONNX runtime
)

prompt = "What is the weather like today?"
sanitized_prompt, is_valid, risk_score = scanner.scan(prompt)

print(is_valid)    # True (no banned topic matched)
print(risk_score)  # Low confidence score
Related Pages