Implementation:Evidentlyai Evidently LLM Judge Descriptors
| Knowledge Sources | |
|---|---|
| Domains | LLM_Evaluation, NLP, AI_Safety |
| Last Updated | 2026-02-14 12:00 GMT |
Overview
Concrete LLM-as-judge descriptor classes, provided by the Evidently library, that evaluate text quality by calling external LLM APIs.
Description
Evidently provides built-in LLM judge descriptors that evaluate text using external LLM APIs:
- NegativityLLMEval: Detects negativity, hostility, or toxic sentiment
- DeclineLLMEval: Detects when an LLM refuses or declines to answer
These descriptors send each text row to an LLM API (OpenAI by default) with a structured prompt and parse the response. They can optionally return category labels, numerical scores, and reasoning.
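The row-by-row judge flow described above can be sketched in a few lines. This is a minimal illustration of the general LLM-as-judge pattern, not Evidently's internal implementation: the prompt text, the `judge_rows` helper, the `fake_llm` stand-in, and the `NEGATIVE`/`POSITIVE`/`UNKNOWN` labels are all assumptions made for the example.

```python
import json
from typing import Callable, Dict, List

# Hypothetical prompt; Evidently's real templates differ.
JUDGE_PROMPT = (
    "Classify the text as NEGATIVE or POSITIVE.\n"
    'Reply with JSON: {{"category": "...", "reasoning": "..."}}\n'
    "Text: {text}"
)


def judge_rows(rows: List[str], call_llm: Callable[[str], str]) -> List[Dict[str, str]]:
    """Send each row to the judge and parse its structured reply."""
    results = []
    for text in rows:
        raw = call_llm(JUDGE_PROMPT.format(text=text))
        try:
            parsed = json.loads(raw)
        except json.JSONDecodeError:
            parsed = None
        if not isinstance(parsed, dict):
            # Fall back to an explicit label when the reply is not a JSON object.
            parsed = {"category": "UNKNOWN", "reasoning": "unparseable reply"}
        results.append(
            {
                "category": str(parsed.get("category", "UNKNOWN")),
                "reasoning": str(parsed.get("reasoning", "")),
            }
        )
    return results


def fake_llm(prompt: str) -> str:
    """Stand-in for a real provider call; keys off one word for the demo."""
    text = prompt.rsplit("Text: ", 1)[1]
    label = "NEGATIVE" if "terrible" in text.lower() else "POSITIVE"
    return json.dumps({"category": label, "reasoning": "keyword stub"})


labels = [r["category"] for r in judge_rows(["This is terrible.", "Thanks a lot!"], fake_llm)]
```

Passing the provider call in as `call_llm` keeps the parsing logic testable without network access; a real judge would replace `fake_llm` with an API client.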
Usage
Import the LLM judge descriptors and pass them to Dataset.from_pandas(descriptors=[...]). An API key for the chosen LLM provider must be available (e.g., in the OPENAI_API_KEY environment variable).
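Because every row triggers a provider call, it can help to check for the key before building the dataset. The helper below is a hypothetical convenience, not part of Evidently; `require_api_key` and its `env` parameter are names introduced only for this sketch.

```python
import os
from typing import Mapping, Optional


def require_api_key(name: str = "OPENAI_API_KEY", env: Optional[Mapping[str, str]] = None) -> str:
    """Return the key from the environment or raise a clear error."""
    # Allow a custom mapping so the check can be exercised without touching os.environ.
    source = os.environ if env is None else env
    key = source.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; LLM judge descriptors need a provider API key.")
    return key
```

Calling `require_api_key()` at the top of an evaluation script surfaces a missing key immediately instead of partway through a run.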
Code Reference
Source Location
- Repository: evidently
- File: src/evidently/descriptors/generated_descriptors.py
- Lines: L1030-1080 (DeclineLLMEval), L1095-1133 (NegativityLLMEval)
- File: src/evidently/legacy/descriptors/llm_judges.py
- Lines: L92-114 (NegativityLLMEval legacy), L141-160 (DeclineLLMEval legacy)
Signature
class NegativityLLMEval(Descriptor):
    def __init__(
        self,
        column_name: str,
        provider: str = "openai",
        model: str = "gpt-4o-mini",
        additional_columns: Optional[Dict[str, str]] = None,
        include_category: Optional[bool] = None,
        include_score: Optional[bool] = None,
        include_reasoning: Optional[bool] = None,
        uncertainty: Optional[Uncertainty] = None,
        alias: Optional[str] = None,
        tests: Optional[List] = None,
    ):
        """Evaluate text for negativity using an LLM judge."""

class DeclineLLMEval(Descriptor):
    def __init__(
        self,
        column_name: str,
        provider: str = "openai",
        model: str = "gpt-4o-mini",
        additional_columns: Optional[Dict[str, str]] = None,
        include_category: Optional[bool] = None,
        include_score: Optional[bool] = None,
        include_reasoning: Optional[bool] = None,
        uncertainty: Optional[Uncertainty] = None,
        alias: Optional[str] = None,
        tests: Optional[List] = None,
    ):
        """Evaluate text for decline/refusal patterns using an LLM judge."""
Import
from evidently.descriptors import NegativityLLMEval, DeclineLLMEval
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| column_name | str | Yes | Text column to evaluate |
| provider | str | No | LLM provider (default: "openai") |
| model | str | No | LLM model name (default: "gpt-4o-mini") |
| additional_columns | Optional[Dict[str, str]] | No | Additional context columns |
| include_category | Optional[bool] | No | Include category label in output |
| include_score | Optional[bool] | No | Include numerical score in output |
| include_reasoning | Optional[bool] | No | Include reasoning text in output |
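| uncertainty | Optional[Uncertainty] | No | Category to assign when the judge is uncertain |
| alias | Optional[str] | No | Output column name alias |
| tests | Optional[List] | No | Tests to attach to the descriptor output |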
Outputs
| Descriptor | Output Type | Description |
|---|---|---|
| NegativityLLMEval | Categorical | Negativity category label (e.g., "NEGATIVE" or "POSITIVE") |
| DeclineLLMEval | Categorical | Decline category label (e.g., "DECLINE" or "OK") |
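Since both descriptors emit a categorical label per row, a common follow-up is to turn those labels into rates for monitoring. The sketch below assumes the labels have already been pulled out of the output column into a plain list; `label_shares` is a hypothetical helper, and the label strings are placeholders for whatever categories the judge actually returns.

```python
from collections import Counter
from typing import Dict, Iterable


def label_shares(labels: Iterable[str]) -> Dict[str, float]:
    """Return the fraction of rows per category label."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        # No rows evaluated, so there are no shares to report.
        return {}
    return {label: count / total for label, count in counts.items()}


# Illustrative labels, not real judge output.
shares = label_shares(["DECLINE", "OK", "OK", "OK"])
# shares == {"DECLINE": 0.25, "OK": 0.75}
```

A share such as the decline rate can then be tracked across runs or compared against a threshold.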
Usage Examples
LLM Evaluation Monitoring
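import pandas as pd

from evidently import Dataset, DataDefinition
from evidently.descriptors import NegativityLLMEval, DeclineLLMEval, Sentiment

# Example chatbot responses to evaluate (illustrative data)
df = pd.DataFrame({"response": ["Sorry, I can't help with that.", "Sure, here is the summary."]})

# Evaluate LLM chatbot responses; explicit aliases fix the output column names
dataset = Dataset.from_pandas(
    df,
    data_definition=DataDefinition(text_columns=["response"]),
    descriptors=[
        Sentiment("response", alias="Sentiment"),
        NegativityLLMEval("response", provider="openai", model="gpt-4o-mini", alias="Negativity"),
        DeclineLLMEval("response", provider="openai", model="gpt-4o-mini", alias="Decline"),
    ],
)

# Access computed evaluation columns
eval_df = dataset.as_dataframe()
print(eval_df[["response", "Sentiment", "Negativity", "Decline"]].head())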