Overview
The EmotionDetection scanner detects emotions in text using a RoBERTa GoEmotions model and blocks prompts containing configurable negative emotions.
Description
EmotionDetection is an input scanner that analyzes the emotional content of prompts using the SamLowe/roberta-base-go_emotions model, which is trained on Google's GoEmotions dataset. The model classifies text across 28 labels (27 emotions plus neutral): admiration, amusement, anger, annoyance, approval, caring, confusion, curiosity, desire, disappointment, disapproval, disgust, embarrassment, excitement, fear, gratitude, grief, joy, love, nervousness, optimism, pride, realization, relief, remorse, sadness, surprise, and neutral.
By default, the scanner blocks 11 negative emotions: anger, annoyance, disappointment, disapproval, disgust, embarrassment, fear, grief, nervousness, remorse, and sadness. The blocked_emotions list is fully customizable. The match_type parameter selects between analyzing the full text (MatchType.FULL, the default) and analyzing individual sentences. When return_full_output is enabled, the scan_with_full_output method also returns the detailed per-emotion scores.
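Since blocked_emotions accepts arbitrary strings, it can help to check a custom list against the 28 labels before constructing the scanner. The following is a minimal caller-side sketch; `GO_EMOTIONS_LABELS` and `check_blocked_emotions` are hypothetical names introduced here for illustration and are not part of llm_guard. Whether the scanner performs a similar check internally is not stated on this page.

```python
from typing import Iterable, List

# The 28 labels listed in the description above (27 emotions plus neutral).
GO_EMOTIONS_LABELS = frozenset({
    "admiration", "amusement", "anger", "annoyance", "approval", "caring",
    "confusion", "curiosity", "desire", "disappointment", "disapproval",
    "disgust", "embarrassment", "excitement", "fear", "gratitude", "grief",
    "joy", "love", "nervousness", "optimism", "pride", "realization",
    "relief", "remorse", "sadness", "surprise", "neutral",
})


def check_blocked_emotions(labels: Iterable[str]) -> List[str]:
    """Hypothetical helper: normalize labels and reject unknown ones."""
    normalized = [label.strip().lower() for label in labels]
    unknown = sorted(set(normalized) - GO_EMOTIONS_LABELS)
    if unknown:
        raise ValueError(f"Unknown emotion labels: {unknown}")
    return normalized
```

The returned list could then be passed as the blocked_emotions argument.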
Usage
Use the EmotionDetection scanner when you need to filter prompts based on emotional tone. This is useful for customer-facing chatbots where you want to detect and handle negative emotions proactively, for mental health applications requiring emotional awareness, or for content moderation to prevent hostile or distressed interactions.
Code Reference
Source Location
Signature
class EmotionDetection(Scanner):
    def __init__(
        self,
        *,
        model: Model | None = None,  # default: SamLowe/roberta-base-go_emotions
        threshold: float = 0.5,
        blocked_emotions: List[str] | None = None,  # default: 11 negative emotions
        match_type: MatchType | str = MatchType.FULL,
        use_onnx: bool = False,
        return_full_output: bool = False,
    ) -> None: ...

    def scan(self, prompt: str) -> tuple[str, bool, float]: ...

    def get_emotion_analysis(self, prompt: str) -> Dict[str, float]: ...

    def scan_with_full_output(self, prompt: str) -> tuple[str, bool, float, Dict[str, float]]: ...
Import
from llm_guard.input_scanners.emotion_detection import EmotionDetection
I/O Contract
Inputs
| Name | Type | Required | Description |
|------|------|----------|-------------|
| model | Model or None | No | The emotion classification model. Defaults to SamLowe/roberta-base-go_emotions. |
| threshold | float | No | Minimum confidence score for an emotion to be considered detected. Defaults to 0.5. |
| blocked_emotions | List[str] or None | No | List of emotion labels to block. Defaults to 11 negative emotions: anger, annoyance, disappointment, disapproval, disgust, embarrassment, fear, grief, nervousness, remorse, sadness. |
| match_type | MatchType or str | No | Whether to analyze the full text or individual sentences. Defaults to MatchType.FULL. |
| use_onnx | bool | No | Whether to use ONNX runtime for inference. Defaults to False. |
| return_full_output | bool | No | Whether scan_with_full_output returns detailed emotion scores. Defaults to False. |
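To illustrate what sentence-level analysis could look like, here is a rough sketch that splits text into sentences, scores each one, and keeps the highest score per label. This is one plausible approach, not necessarily how the scanner implements match_type. `split_sentences`, `max_scores_per_label`, and `toy_classify` are hypothetical names; `toy_classify` is a keyword stub standing in for the real model.

```python
import re
from typing import Callable, Dict, List

Scores = Dict[str, float]


def split_sentences(text: str) -> List[str]:
    """Naive splitter: break on whitespace that follows '.', '!' or '?'."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part for part in parts if part]


def max_scores_per_label(chunks: List[str], classify: Callable[[str], Scores]) -> Scores:
    """Score every chunk and keep the highest score seen for each label."""
    merged: Scores = {}
    for chunk in chunks:
        for label, score in classify(chunk).items():
            merged[label] = max(score, merged.get(label, 0.0))
    return merged


def toy_classify(text: str) -> Scores:
    """Stand-in for the emotion model (assumption): simple keyword lookup."""
    lowered = text.lower()
    return {
        "anger": 0.9 if "furious" in lowered else 0.05,
        "joy": 0.8 if "great" in lowered else 0.05,
    }


text = "The product is great. I am furious about the delay!"
sentences = split_sentences(text)
sentence_scores = max_scores_per_label(sentences, toy_classify)
```

With a real classifier, scoring sentences separately can surface a strong emotion in one sentence that would otherwise be diluted by the rest of a long prompt.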
scan() Inputs
| Name | Type | Required | Description |
|------|------|----------|-------------|
| prompt | str | Yes | The input text to analyze for emotional content. |
Outputs
| Name | Type | Description |
|------|------|-------------|
| prompt | str | The original prompt (unchanged). |
| is_valid | bool | True if no blocked emotions were detected above the threshold; False otherwise. |
| risk_score | float | The highest confidence score among detected blocked emotions. |
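The relationship between per-emotion scores, the threshold, and the two outputs above can be sketched in plain Python. This is an illustration of the contract as described, not the library's code. `decide` and `DEFAULT_BLOCKED` are hypothetical names, and two details are assumptions: the comparison is taken to be strictly greater than the threshold, and the risk score is taken to be 0.0 when nothing is detected.

```python
from typing import Dict, Iterable, Tuple

# The 11 default blocked emotions listed in this page.
DEFAULT_BLOCKED = (
    "anger", "annoyance", "disappointment", "disapproval", "disgust",
    "embarrassment", "fear", "grief", "nervousness", "remorse", "sadness",
)


def decide(
    scores: Dict[str, float],
    blocked: Iterable[str] = DEFAULT_BLOCKED,
    threshold: float = 0.5,
) -> Tuple[bool, float]:
    """Return (is_valid, risk_score) for a mapping of emotion scores."""
    blocked_set = set(blocked)
    detected = [
        score
        for label, score in scores.items()
        if label in blocked_set and score > threshold  # assumed strict comparison
    ]
    if not detected:
        return True, 0.0  # assumed value when no blocked emotion is detected
    return False, max(detected)
```

For example, a score of 0.7 for disappointment is ignored when the blocked list is restricted to anger, disgust, and fear, so the prompt stays valid.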
scan_with_full_output() Additional Output
| Name | Type | Description |
|------|------|-------------|
| emotion_scores | Dict[str, float] | Dictionary mapping all 28 emotion labels to their confidence scores. |
Emotion Labels
The model classifies text across 28 emotions:
| Positive | Negative | Neutral/Ambiguous |
|----------|----------|-------------------|
| admiration, amusement, approval, caring, curiosity, desire, excitement, gratitude, joy, love, optimism, pride, relief | anger, annoyance, disappointment, disapproval, disgust, embarrassment, fear, grief, nervousness, remorse, sadness | confusion, realization, surprise, neutral |
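The grouping above can be applied to a score dictionary, such as the one returned by get_emotion_analysis, to summarize the strongest emotion in each category. A minimal sketch follows; `CATEGORIES` and `strongest_per_category` are hypothetical names for illustration, and labels missing from the input are assumed to score 0.0.

```python
from typing import Dict, Tuple

# Category membership copied from the table above.
CATEGORIES: Dict[str, Tuple[str, ...]] = {
    "positive": (
        "admiration", "amusement", "approval", "caring", "curiosity", "desire",
        "excitement", "gratitude", "joy", "love", "optimism", "pride", "relief",
    ),
    "negative": (
        "anger", "annoyance", "disappointment", "disapproval", "disgust",
        "embarrassment", "fear", "grief", "nervousness", "remorse", "sadness",
    ),
    "neutral_or_ambiguous": ("confusion", "realization", "surprise", "neutral"),
}


def strongest_per_category(scores: Dict[str, float]) -> Dict[str, Tuple[str, float]]:
    """Return the highest-scoring (label, score) pair within each category."""
    summary: Dict[str, Tuple[str, float]] = {}
    for category, labels in CATEGORIES.items():
        # Missing labels count as 0.0; ties resolve to the first label listed.
        best = max(labels, key=lambda label: scores.get(label, 0.0))
        summary[category] = (best, scores.get(best, 0.0))
    return summary
```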
Usage Examples
Basic Usage
from llm_guard.input_scanners.emotion_detection import EmotionDetection
scanner = EmotionDetection()
prompt = "I am absolutely furious about this terrible service!"
sanitized_prompt, is_valid, risk_score = scanner.scan(prompt)
print(is_valid) # False (anger detected)
print(risk_score) # Confidence score for the detected emotion
Custom Blocked Emotions
from llm_guard.input_scanners.emotion_detection import EmotionDetection
# Only block specific emotions
scanner = EmotionDetection(
    blocked_emotions=["anger", "disgust", "fear"],
    threshold=0.6,
)
prompt = "I'm a bit disappointed but overall okay"
sanitized_prompt, is_valid, risk_score = scanner.scan(prompt)
print(is_valid) # True (disappointment not in blocked list)
Full Emotion Analysis
from llm_guard.input_scanners.emotion_detection import EmotionDetection
scanner = EmotionDetection(return_full_output=True)
prompt = "This is amazing and I love it!"
# Get detailed emotion scores
emotion_scores = scanner.get_emotion_analysis(prompt)
for emotion, score in sorted(emotion_scores.items(), key=lambda x: x[1], reverse=True):
    if score > 0.1:
        print(f"{emotion}: {score:.3f}")
# Or use scan_with_full_output
sanitized_prompt, is_valid, risk_score, scores = scanner.scan_with_full_output(prompt)