Principle:Protectai Llm guard Sentiment Analysis
| Knowledge Sources | |
|---|---|
| Domains | Sentiment_Analysis, NLP |
| Last Updated | 2026-02-14 12:00 GMT |
Overview
Measuring text sentiment polarity to detect and block negative emotional content.
Description
Sentiment analysis in the context of LLM guardrails involves evaluating the emotional valence of both input prompts and generated outputs. The goal is to identify text that carries strongly negative sentiment -- such as hostility, despair, or aggression -- and either flag or block it before it reaches the end user.
This principle relies on lexicon-based sentiment scoring, specifically the VADER (Valence Aware Dictionary and sEntiment Reasoner) approach. VADER maintains a curated dictionary of words, each annotated with a human-rated valence score. These individual word scores are then combined using a set of grammatical heuristics that account for:
- Negation -- reversing the polarity, and scaling it by a negation constant, when negation words precede a sentiment word (e.g., "not good" shifts positive to negative).
- Intensification -- amplifying or dampening scores based on degree modifiers (e.g., "very", "slightly").
- Conjunctions -- handling contrastive conjunctions like "but" that shift emphasis toward the latter clause.
The final output is a compound score ranging from -1 (most negative) to +1 (most positive). A configurable threshold determines the cutoff below which text is considered unacceptably negative.
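The three heuristics listed above can be sketched on a toy example. This is a deliberately simplified illustration, not the VADER implementation: the lexicon entries, the word lists, the three-token look-back window, and the function name `adjusted_valences` are all hypothetical. The constants (0.293 for boosters, -0.74 for negation, 0.5 and 1.5 around "but") are assumptions based on values commonly reported for the reference VADER implementation.

```python
from typing import List

# Hypothetical toy lexicon; the real VADER lexicon holds thousands of rated entries.
TOY_LEXICON = {"good": 1.9, "great": 3.1, "bad": -2.5, "terrible": -2.1}
NEGATIONS = {"not", "never", "no"}
# Positive increments amplify, negative increments dampen (assumed values).
BOOSTERS = {"very": 0.293, "extremely": 0.293, "slightly": -0.293}
NEGATION_SCALAR = -0.74            # assumed negation constant
BEFORE_BUT, AFTER_BUT = 0.5, 1.5   # assumed clause weights around "but"


def adjusted_valences(text: str) -> List[float]:
    """Return one adjusted valence per whitespace-separated token."""
    tokens = text.lower().split()
    scores: List[float] = []
    for i, token in enumerate(tokens):
        valence = TOY_LEXICON.get(token, 0.0)
        if valence != 0.0:
            # Inspect up to three preceding tokens (simplified context window).
            context = tokens[max(0, i - 3):i]
            for prev in context:
                if prev in BOOSTERS:
                    step = BOOSTERS[prev]
                    # Push the score away from zero (boost) or toward it (dampen).
                    valence += step if valence > 0 else -step
            if any(prev in NEGATIONS for prev in context):
                valence *= NEGATION_SCALAR
        scores.append(valence)

    # Contrastive "but": de-emphasise what precedes it, emphasise what follows.
    if "but" in tokens:
        pivot = tokens.index("but")
        for i in range(len(scores)):
            if i < pivot:
                scores[i] *= BEFORE_BUT
            elif i > pivot:
                scores[i] *= AFTER_BUT
    return scores


if __name__ == "__main__":
    for sample in ("good", "not good", "very good", "good but terrible"):
        print(sample, "->", sum(adjusted_valences(sample)))
```

In this sketch "not good" ends up with a negative total, "very good" scores higher than "good", and "good but terrible" is pulled further negative than the plain sum of its two words would suggest.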
Usage
Apply this principle when user-facing text must not fall below a minimum level of emotional tone. It is particularly useful for:
- Preventing LLM outputs that contain hostile, abusive, or deeply pessimistic language.
- Screening user inputs for aggressive or harmful sentiment before they reach the model.
- Maintaining brand-safe communication standards in customer-facing applications.
Theoretical Basis
The VADER algorithm computes sentiment as follows:
1. Tokenize the input text into individual words and punctuation.
2. Look up each token in the sentiment lexicon to obtain its base valence score.
3. Apply grammatical heuristics:
   a. Check for negation words in the preceding context; if found, multiply valence by a negation constant.
   b. Check for booster/dampener words; adjust valence by the intensification increment.
   c. Handle conjunctions by weighting clause-level sentiment toward the final clause.
4. Sum all adjusted valence scores.
5. Normalize the sum into the range [-1, +1] using the formula:
   compound = sum / sqrt(sum^2 + alpha)
   where alpha is a normalization constant.
6. Compare the compound score against the configured threshold to produce a pass/fail decision.
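Steps 5 and 6 can be made concrete with a short sketch. The function names `normalize` and `passes` are hypothetical. The value alpha = 15 is an assumption based on the constant commonly reported for the reference VADER implementation, and the threshold of -0.3 is only an illustrative setting. Treating a score exactly equal to the threshold as a pass is also an assumption; the text only says that scores below the cutoff are rejected.

```python
import math

ALPHA = 15.0               # assumed normalization constant
EXAMPLE_THRESHOLD = -0.3   # illustrative cutoff, not a documented default


def normalize(total: float, alpha: float = ALPHA) -> float:
    """Map an unbounded valence sum into [-1, +1] (step 5)."""
    compound = total / math.sqrt(total * total + alpha)
    # Clamp defensively against floating-point drift at the extremes.
    return max(-1.0, min(1.0, compound))


def passes(compound: float, threshold: float = EXAMPLE_THRESHOLD) -> bool:
    """Pass/fail decision (step 6): fail only when the score is below the cutoff."""
    return compound >= threshold


if __name__ == "__main__":
    for total in (-4.0, -1.0, 0.0, 2.5):
        compound = normalize(total)
        print(f"sum={total:+.1f} compound={compound:+.3f} pass={passes(compound)}")
```

Because alpha sits under the square root, small sums stay close to zero while large sums approach -1 or +1 asymptotically. Under these assumed values a valence sum of -1.0 still passes a -0.3 threshold, whereas a sum of -4.0 does not.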