Overview
BanTopics is an output scanner that uses zero-shot classification to detect LLM responses covering banned topics and flag them as invalid. It delegates the work to the input-side InputBanTopics scanner.
Description
The BanTopics output scanner is a thin wrapper around the corresponding input scanner InputBanTopics. It uses a zero-shot classification model to determine whether the LLM output falls under any of the specified banned topics. The scanner accepts a list of topic labels and a threshold parameter that controls the minimum confidence score required for a topic to be considered a match. When the model classifies the output as belonging to a banned topic with confidence above the threshold, the output is flagged as invalid. This approach does not require topic-specific training data, making it flexible for a wide range of content policies.
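The wrapper-plus-threshold behaviour described above can be sketched in a few lines. This is an illustrative sketch only, not the library's implementation: the class names `InputTopicScannerSketch` and `OutputTopicScannerSketch`, the `Classifier` callable, and the toy `keyword_classifier` are all hypothetical stand-ins, and the third return value here is simply the top classification score, which may differ from how the real scanner computes its risk score. The sketch also assumes the prompt plays no part in the decision.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical stand-in for a zero-shot classifier: maps (text, labels) to one
# score per label. The real scanner runs a model; this signature is assumed.
Classifier = Callable[[str, List[str]], Dict[str, float]]


class InputTopicScannerSketch:
    """Illustrative input-side scanner (not the library's code)."""

    def __init__(
        self, topics: List[str], classify: Classifier, threshold: float = 0.75
    ) -> None:
        self.topics = topics
        self.classify = classify
        self.threshold = threshold

    def scan(self, text: str) -> Tuple[str, bool, float]:
        scores = self.classify(text, self.topics)
        top_score = max(scores.values(), default=0.0)
        # The text is invalid when the best-matching banned topic scores
        # above the threshold.
        is_valid = top_score <= self.threshold
        return text, is_valid, top_score


class OutputTopicScannerSketch:
    """Thin wrapper: scans the output by delegating to the input-side scanner."""

    def __init__(
        self, topics: List[str], classify: Classifier, threshold: float = 0.75
    ) -> None:
        self._scanner = InputTopicScannerSketch(topics, classify, threshold)

    def scan(self, prompt: str, output: str) -> Tuple[str, bool, float]:
        # Assumption: the prompt is ignored and only the output is classified.
        return self._scanner.scan(output)


def keyword_classifier(text: str, labels: List[str]) -> Dict[str, float]:
    """Toy classifier for demonstration: 0.9 if the label occurs in the text, else 0.1."""
    lowered = text.lower()
    return {label: (0.9 if label.lower() in lowered else 0.1) for label in labels}
```

Injecting the classifier as a callable keeps the sketch runnable without downloading a model; in practice the scores would come from the zero-shot classification model.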
Usage
Use this scanner when you need to enforce topic-based content policies on LLM outputs. Common scenarios include preventing discussions of violence, politics, religion, adult content, or any other topics that violate your application's content guidelines. The zero-shot approach allows you to add new banned topics without retraining any models.
Code Reference
Source Location
Signature
class BanTopics(Scanner):
    def __init__(
        self,
        topics: list[str],
        *,
        threshold: float = 0.75,
        model: Model | None = None,
        use_onnx: bool = False,
    ) -> None: ...

    def scan(self, prompt: str, output: str) -> tuple[str, bool, float]: ...
Import
from llm_guard.output_scanners import BanTopics
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| prompt | str | Yes | The input prompt |
| output | str | Yes | The LLM output to scan for banned topics |
Constructor Parameters
| Name | Type | Required | Default | Description |
|---|---|---|---|---|
| topics | list[str] | Yes | N/A | List of topic labels to ban |
| threshold | float | No | 0.75 | Minimum classification confidence to trigger detection |
| model | Model or None | No | None | Custom zero-shot classification model |
| use_onnx | bool | No | False | Whether to use ONNX runtime for inference |
Outputs
| Name | Type | Description |
|---|---|---|
| sanitized_output | str | The output (potentially modified) |
| is_valid | bool | Whether the output passed the scan (True if no banned topics detected) |
| risk_score | float | Risk score (-1.0 to 1.0) |
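One way to consume the returned tuple is to replace a flagged response with a fallback message. The sketch below is a minimal illustration under stated assumptions: `guard_output`, the `OutputScanner` protocol, and the two stub scanners are hypothetical names introduced here, not part of the library. Any object whose `scan(prompt, output)` returns the `(sanitized_output, is_valid, risk_score)` tuple described above would fit.

```python
from typing import Protocol, Tuple


class OutputScanner(Protocol):
    """Structural type for any scanner exposing the scan() contract above."""

    def scan(self, prompt: str, output: str) -> Tuple[str, bool, float]: ...


def guard_output(
    scanner: OutputScanner,
    prompt: str,
    output: str,
    fallback: str = "Sorry, I can't discuss that topic.",
) -> str:
    """Return the sanitized output, or the fallback if the scan fails."""
    sanitized_output, is_valid, _risk_score = scanner.scan(prompt, output)
    return sanitized_output if is_valid else fallback


class PassThroughStub:
    """Hypothetical stub that accepts every output (for demonstration)."""

    def scan(self, prompt: str, output: str) -> Tuple[str, bool, float]:
        return output, True, 0.0


class AlwaysBlockStub:
    """Hypothetical stub that rejects every output (for demonstration)."""

    def scan(self, prompt: str, output: str) -> Tuple[str, bool, float]:
        return output, False, 1.0
```

The stubs exist only so the helper can be exercised without loading a model; a real BanTopics instance could be passed in their place, assuming it follows the same contract.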
Usage Examples
Basic Usage
from llm_guard.output_scanners import BanTopics

# Ban three topics; outputs scoring above 0.75 for any of them are flagged.
scanner = BanTopics(
    topics=["violence", "politics", "religion"],
    threshold=0.75,
)

prompt = "Tell me about world history"
output = "The French Revolution was a period of major political upheaval in France."

# scan() returns the (possibly modified) output, a validity flag, and a risk score.
sanitized_output, is_valid, risk_score = scanner.scan(prompt, output)

if not is_valid:
    print(f"Banned topic detected (risk: {risk_score})")
else:
    print("Output topic is acceptable")