Implementation:Evidentlyai Evidently LLM Judge Descriptors

Knowledge Sources	Evidently Evidently LLM Evaluation
Domains	LLM_Evaluation, NLP, AI_Safety
Last Updated	2026-02-14 12:00 GMT

Overview

Concrete LLM-as-judge descriptor classes from the Evidently library that evaluate text quality by calling external LLM APIs.

Description

Evidently provides built-in LLM judge descriptors that evaluate text using external LLM APIs:

NegativityLLMEval: Detects negativity, hostility, or toxic sentiment
DeclineLLMEval: Detects when an LLM refuses or declines to answer

These descriptors send each text row to an LLM API (OpenAI by default) with a structured prompt and parse the response. They can optionally return category labels, numerical scores, and reasoning.
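The per-row flow described above (build a structured prompt, call the LLM, parse the reply) can be sketched in plain Python. This is an illustration only, not Evidently's implementation: `PROMPT_TEMPLATE`, `judge_rows`, and `fake_llm` are hypothetical names, and the stub replaces the real API call so the sketch runs offline.

```python
import json
from typing import Callable, Dict, List

# Hypothetical prompt template; Evidently's real prompts are not reproduced here.
PROMPT_TEMPLATE = (
    "Classify the text as NEGATIVE or POSITIVE.\n"
    "Return JSON with keys 'category' and 'reasoning'.\n"
    "Text: {text}"
)


def judge_rows(texts: List[str], call_llm: Callable[[str], str]) -> List[Dict[str, str]]:
    """Send one prompt per row and parse each JSON reply into a result dict."""
    results = []
    for text in texts:
        raw = call_llm(PROMPT_TEMPLATE.format(text=text))
        try:
            parsed = json.loads(raw)
        except json.JSONDecodeError:
            parsed = {}
        if not isinstance(parsed, dict):
            parsed = {}
        # Fall back to UNKNOWN when the reply cannot be parsed.
        results.append({
            "category": str(parsed.get("category", "UNKNOWN")),
            "reasoning": str(parsed.get("reasoning", "")),
        })
    return results


def fake_llm(prompt: str) -> str:
    """Stand-in for a real API call (assumption: keyword match instead of a model)."""
    text = prompt.split("Text: ", 1)[1]
    label = "NEGATIVE" if "terrible" in text.lower() else "POSITIVE"
    return json.dumps({"category": label, "reasoning": "keyword stub"})


rows = judge_rows(["This is terrible.", "Thanks, that helped!"], fake_llm)
for row in rows:
    print(row["category"], "-", row["reasoning"])
```

Because every row triggers its own LLM call, cost and latency grow linearly with the number of rows evaluated.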

Usage

Import LLM judge descriptors and pass them to Dataset.from_pandas(descriptors=[...]). Requires an LLM provider API key (e.g., OPENAI_API_KEY).
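Since a missing key only surfaces once the first API call fails, it can help to check for it up front. The helper below is a minimal sketch using only the standard library; `require_api_key` is a hypothetical name and is not part of Evidently.

```python
import os


def require_api_key(var_name: str = "OPENAI_API_KEY") -> str:
    """Return the API key from the environment, or fail fast with a clear message."""
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"Set {var_name} before running LLM judge descriptors.")
    return key
```

Calling `require_api_key()` before building the dataset raises a `RuntimeError` immediately if the variable is unset or empty.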

Code Reference

Source Location

Repository: evidently
File: src/evidently/descriptors/generated_descriptors.py
Lines: L1030-1080 (DeclineLLMEval), L1095-1133 (NegativityLLMEval)
File: src/evidently/legacy/descriptors/llm_judges.py
Lines: L92-114 (NegativityLLMEval legacy), L141-160 (DeclineLLMEval legacy)

Signature

class NegativityLLMEval(Descriptor):
    def __init__(
        self,
        column_name: str,
        provider: str = "openai",
        model: str = "gpt-4o-mini",
        additional_columns: Optional[Dict[str, str]] = None,
        include_category: Optional[bool] = None,
        include_score: Optional[bool] = None,
        include_reasoning: Optional[bool] = None,
        uncertainty: Optional[Uncertainty] = None,
        alias: Optional[str] = None,
        tests: Optional[List] = None,
    ):
        """Evaluate text for negativity using an LLM judge."""

class DeclineLLMEval(Descriptor):
    def __init__(
        self,
        column_name: str,
        provider: str = "openai",
        model: str = "gpt-4o-mini",
        additional_columns: Optional[Dict[str, str]] = None,
        include_category: Optional[bool] = None,
        include_score: Optional[bool] = None,
        include_reasoning: Optional[bool] = None,
        uncertainty: Optional[Uncertainty] = None,
        alias: Optional[str] = None,
        tests: Optional[List] = None,
    ):
        """Evaluate text for decline/refusal patterns using an LLM judge."""

Import

from evidently.descriptors import NegativityLLMEval, DeclineLLMEval

I/O Contract

Inputs

Name	Type	Required	Description
column_name	str	Yes	Text column to evaluate
provider	str	No	LLM provider (default: "openai")
model	str	No	LLM model name (default: "gpt-4o-mini")
additional_columns	Optional[Dict[str, str]]	No	Additional context columns
include_category	Optional[bool]	No	Include category label in output
include_score	Optional[bool]	No	Include numerical score in output
include_reasoning	Optional[bool]	No	Include reasoning text in output
uncertainty	Optional[Uncertainty]	No	How to label cases the judge is unsure about
alias	Optional[str]	No	Output column name alias
tests	Optional[List]	No	Tests to attach to the descriptor output

Outputs

Descriptor	Output Type	Description
NegativityLLMEval	Categorical	Negativity category (e.g., "NEGATIVE", "POSITIVE")
DeclineLLMEval	Categorical	Decline category (e.g., "DECLINE", "OK")
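Because both descriptors produce categorical columns, a common next step is to summarize how often each label occurs. The sketch below uses only the standard library; `category_shares` is a hypothetical helper, and the sample labels are illustrative placeholders for whatever values the descriptor column actually contains.

```python
from collections import Counter
from typing import Dict, Iterable


def category_shares(labels: Iterable[str]) -> Dict[str, float]:
    """Return the fraction of rows assigned to each category label."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: count / total for label, count in counts.items()}


# Illustrative labels, e.g. taken from a descriptor output column.
shares = category_shares(["DECLINE", "OK", "OK", "OK"])
print(shares)
```

Tracking such shares over time gives a simple monitoring signal, for example a rising refusal rate.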

Usage Examples

LLM Evaluation Monitoring

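import pandas as pd

from evidently import Dataset, DataDefinition
from evidently.descriptors import NegativityLLMEval, DeclineLLMEval, Sentiment

# Example chatbot responses; replace with your own data.
# The LLM judges require OPENAI_API_KEY to be set in the environment.
df = pd.DataFrame(
    {"response": ["Sorry, I can't help with that.", "Sure, here is the answer."]}
)

# Evaluate LLM chatbot responses
dataset = Dataset.from_pandas(
    df,
    data_definition=DataDefinition(),
    descriptors=[
        # Explicit aliases fix the output column names selected below
        Sentiment("response", alias="Sentiment"),
        NegativityLLMEval("response", provider="openai", model="gpt-4o-mini", alias="Negativity"),
        DeclineLLMEval("response", provider="openai", model="gpt-4o-mini", alias="Decline"),
    ],
)

# Access computed evaluation columns
eval_df = dataset.as_dataframe()
print(eval_df[["response", "Sentiment", "Negativity", "Decline"]].head())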

Related Pages

Implements Principle

Principle:Evidentlyai_Evidently_LLM_Judge_Evaluation

Requires Environment

Environment:Evidentlyai_Evidently_LLM_Evaluation_Environment
