Implementation:Protectai Llm guard Input EmotionDetection

Knowledge Sources	Protectai_Llm_guard
Domains	Emotion_Detection, NLP
Last Updated	2026-02-14 12:00 GMT

Overview

The EmotionDetection scanner classifies the emotions expressed in a prompt with a RoBERTa model trained on GoEmotions and rejects prompts in which any emotion from a configurable block list (negative emotions by default) scores above a threshold.

Description

EmotionDetection is an input scanner that analyzes the emotional content of prompts using the SamLowe/roberta-base-go_emotions model, which is trained on Google's GoEmotions dataset. The model scores text against 28 emotion labels: admiration, amusement, anger, annoyance, approval, caring, confusion, curiosity, desire, disappointment, disapproval, disgust, embarrassment, excitement, fear, gratitude, grief, joy, love, nervousness, optimism, pride, realization, relief, remorse, sadness, surprise, and neutral.

By default, the scanner blocks 11 negative emotions: anger, annoyance, disappointment, disapproval, disgust, embarrassment, fear, grief, nervousness, remorse, and sadness. The list can be replaced entirely through the blocked_emotions parameter.

The match_type parameter selects between scoring the full text at once (FULL) and sentence-level analysis. For callers that need the per-label scores and not only the verdict, the scan_with_full_output method returns them alongside the usual result; this is controlled by the optional return_full_output flag.
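The blocking rule described above can be sketched in plain Python. This is an illustration under stated assumptions, not the library's code: the helper name decide is hypothetical, the comparison is taken to be strictly greater than the threshold, and the risk score reported when nothing is blocked (0.0 here) is an assumption, since this page does not specify it.

```python
from typing import Dict, Iterable, Tuple

# Default blocked labels, copied from the description on this page.
DEFAULT_BLOCKED = (
    "anger", "annoyance", "disappointment", "disapproval", "disgust",
    "embarrassment", "fear", "grief", "nervousness", "remorse", "sadness",
)


def decide(
    scores: Dict[str, float],
    blocked: Iterable[str] = DEFAULT_BLOCKED,
    threshold: float = 0.5,
) -> Tuple[bool, float]:
    """Return (is_valid, risk_score) for one dict of emotion scores.

    Hypothetical helper for illustration; it is not part of llm_guard.
    """
    blocked_set = set(blocked)
    # Keep only blocked labels whose score is above the threshold.
    hits = [
        score
        for label, score in scores.items()
        if label in blocked_set and score > threshold
    ]
    if not hits:
        # Assumption: report 0.0 when no blocked emotion is detected.
        return True, 0.0
    # The risk score is the highest score among the detected blocked emotions.
    return False, max(hits)


example = {"anger": 0.91, "annoyance": 0.62, "neutral": 0.03}
print(decide(example))                    # (False, 0.91)
print(decide(example, blocked=["fear"]))  # (True, 0.0)
```

Passing a custom blocked list changes the verdict without touching the scores, which mirrors how blocked_emotions is meant to be used.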

Usage

Use the EmotionDetection scanner when you need to filter prompts based on emotional tone. This is useful for customer-facing chatbots where you want to detect and handle negative emotions proactively, for mental health applications requiring emotional awareness, or for content moderation to prevent hostile or distressed interactions.
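As a rough sketch of the customer-facing case, the scan result can drive a routing decision such as escalating to a human agent. Everything named here is hypothetical: route_prompt and KeywordStubScanner are not part of llm_guard, and the stub only imitates the (prompt, is_valid, risk_score) tuple so the example runs without downloading a model. A real EmotionDetection instance could presumably be passed in its place.

```python
from typing import Protocol, Tuple


class PromptScanner(Protocol):
    """Anything exposing scan(prompt) -> (prompt, is_valid, risk_score)."""

    def scan(self, prompt: str) -> Tuple[str, bool, float]: ...


def route_prompt(scanner: PromptScanner, prompt: str) -> str:
    """Return "escalate" when the scanner flags the prompt, else "answer"."""
    _, is_valid, _ = scanner.scan(prompt)
    return "answer" if is_valid else "escalate"


class KeywordStubScanner:
    """Stand-in for a real scanner; flags prompts containing 'furious'."""

    def scan(self, prompt: str) -> Tuple[str, bool, float]:
        flagged = "furious" in prompt.lower()
        return prompt, not flagged, 0.9 if flagged else 0.0


stub = KeywordStubScanner()
print(route_prompt(stub, "I am furious about this!"))      # escalate
print(route_prompt(stub, "What are your opening hours?"))  # answer
```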

Code Reference

Source Location

Repository: Protectai_Llm_guard
File: llm_guard/input_scanners/emotion_detection.py
Lines: 1-288

Signature

class EmotionDetection(Scanner):
    def __init__(
        self,
        *,
        model: Model | None = None,              # default: SamLowe/roberta-base-go_emotions
        threshold: float = 0.5,
        blocked_emotions: List[str] | None = None,  # default: 11 negative emotions
        match_type: MatchType | str = MatchType.FULL,
        use_onnx: bool = False,
        return_full_output: bool = False,
    ) -> None: ...

    def scan(self, prompt: str) -> tuple[str, bool, float]: ...

    def get_emotion_analysis(self, prompt: str) -> Dict[str, float]: ...

    def scan_with_full_output(self, prompt: str) -> tuple[str, bool, float, Dict[str, float]]: ...

Import

from llm_guard.input_scanners.emotion_detection import EmotionDetection

I/O Contract

Inputs

Name	Type	Required	Description
model	Model or None	No	The emotion classification model. Defaults to SamLowe/roberta-base-go_emotions.
threshold	float	No	Minimum confidence score for an emotion to be considered detected. Defaults to 0.5.
blocked_emotions	List[str] or None	No	List of emotion labels to block. Defaults to 11 negative emotions: anger, annoyance, disappointment, disapproval, disgust, embarrassment, fear, grief, nervousness, remorse, sadness.
match_type	MatchType or str	No	Whether to analyze the full text or individual sentences. Defaults to MatchType.FULL.
use_onnx	bool	No	Whether to use ONNX runtime for inference. Defaults to False.
return_full_output	bool	No	Whether scan_with_full_output returns detailed emotion scores. Defaults to False.
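To make the difference between full-text and sentence-level matching concrete, the sketch below scores each sentence separately and keeps the highest score per label. This page does not say how the scanner splits sentences or combines their scores, so the regex splitter and the max aggregation are both assumptions, and stub_score is a hypothetical stand-in for the real classifier.

```python
import re
from typing import Callable, Dict, List

# A scoring function maps one piece of text to {label: score}.
ScoreFn = Callable[[str], Dict[str, float]]


def split_sentences(text: str) -> List[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part for part in parts if part]


def max_scores_per_sentence(text: str, score_fn: ScoreFn) -> Dict[str, float]:
    """Score every sentence and keep the maximum score seen for each label."""
    merged: Dict[str, float] = {}
    for sentence in split_sentences(text):
        for label, score in score_fn(sentence).items():
            merged[label] = max(merged.get(label, 0.0), score)
    return merged


def stub_score(sentence: str) -> Dict[str, float]:
    """Hypothetical classifier: high anger only when 'hate' appears."""
    return {"anger": 0.8 if "hate" in sentence.lower() else 0.1}


text = "Thanks for the quick reply. I hate waiting on hold."
print(split_sentences(text))
print(max_scores_per_sentence(text, stub_score))  # {'anger': 0.8}
```

One motivation for sentence-level analysis is that a single hostile sentence inside an otherwise neutral message can be diluted when the whole text is scored at once.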

scan() Inputs

Name	Type	Required	Description
prompt	str	Yes	The input text to analyze for emotional content.

Outputs

Name	Type	Description
prompt	str	The original prompt (unchanged).
is_valid	bool	True if no blocked emotions were detected above the threshold; False otherwise.
risk_score	float	The highest confidence score among detected blocked emotions.

scan_with_full_output() Additional Output

Name	Type	Description
emotion_scores	Dict[str, float]	Dictionary mapping all 28 emotion labels to their confidence scores.

Emotion Labels

The model classifies text across 28 emotions, grouped below by sentiment. The Negative column matches the default blocked_emotions list.

Positive	Negative	Neutral/Ambiguous
admiration, amusement, approval, caring, curiosity, desire, excitement, gratitude, joy, love, optimism, pride, relief	anger, annoyance, disappointment, disapproval, disgust, embarrassment, fear, grief, nervousness, remorse, sadness	confusion, realization, surprise, neutral
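For code that needs these groups, the table can be transcribed into constants. The grouping is the one used on this page, not something exposed by the model or by llm_guard, and the names POSITIVE, NEGATIVE, NEUTRAL_OR_AMBIGUOUS and group_of are hypothetical.

```python
POSITIVE = frozenset({
    "admiration", "amusement", "approval", "caring", "curiosity", "desire",
    "excitement", "gratitude", "joy", "love", "optimism", "pride", "relief",
})
NEGATIVE = frozenset({
    "anger", "annoyance", "disappointment", "disapproval", "disgust",
    "embarrassment", "fear", "grief", "nervousness", "remorse", "sadness",
})
NEUTRAL_OR_AMBIGUOUS = frozenset({"confusion", "realization", "surprise", "neutral"})

ALL_LABELS = POSITIVE | NEGATIVE | NEUTRAL_OR_AMBIGUOUS


def group_of(label: str) -> str:
    """Return the group name for a label, per the table on this page."""
    if label in POSITIVE:
        return "positive"
    if label in NEGATIVE:
        return "negative"
    if label in NEUTRAL_OR_AMBIGUOUS:
        return "neutral/ambiguous"
    raise ValueError(f"unknown emotion label: {label!r}")


print(len(ALL_LABELS))    # 28
print(group_of("grief"))  # negative
```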

Usage Examples

Basic Usage

from llm_guard.input_scanners.emotion_detection import EmotionDetection

scanner = EmotionDetection()
prompt = "I am absolutely furious about this terrible service!"
sanitized_prompt, is_valid, risk_score = scanner.scan(prompt)

print(is_valid)    # Expected: False, assuming anger scores above the default 0.5 threshold
print(risk_score)  # Highest confidence score among the detected blocked emotions

Custom Blocked Emotions

from llm_guard.input_scanners.emotion_detection import EmotionDetection

# Only block specific emotions
scanner = EmotionDetection(
    blocked_emotions=["anger", "disgust", "fear"],
    threshold=0.6,
)
prompt = "I'm a bit disappointed but overall okay"
sanitized_prompt, is_valid, risk_score = scanner.scan(prompt)

print(is_valid)    # Expected: True, since disappointment is not in the blocked list

Full Emotion Analysis

from llm_guard.input_scanners.emotion_detection import EmotionDetection

scanner = EmotionDetection(return_full_output=True)
prompt = "This is amazing and I love it!"

# Get detailed emotion scores
emotion_scores = scanner.get_emotion_analysis(prompt)
for emotion, score in sorted(emotion_scores.items(), key=lambda x: x[1], reverse=True):
    if score > 0.1:
        print(f"{emotion}: {score:.3f}")

# Or use scan_with_full_output
sanitized_prompt, is_valid, risk_score, scores = scanner.scan_with_full_output(prompt)

Related Pages

Principle:Protectai_Llm_guard_Emotion_Detection
