Implementation:Evidentlyai Evidently LLM Judge Descriptors

From Leeroopedia
Knowledge Sources
Domains LLM_Evaluation, NLP, AI_Safety
Last Updated 2026-02-14 12:00 GMT

Overview

Concrete LLM-as-judge descriptor classes, provided by the Evidently library, that evaluate text quality by calling external LLM APIs.

Description

Evidently provides built-in LLM judge descriptors that evaluate text using external LLM APIs:

  • NegativityLLMEval: Detects negativity, hostility, or toxic sentiment
  • DeclineLLMEval: Detects when an LLM refuses or declines to answer

These descriptors send each text row to an LLM API (OpenAI by default) with a structured prompt and parse the response. Depending on the include_category, include_score, and include_reasoning flags, the output can contain a category label, a numerical score, and the judge's reasoning.
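The send-and-parse loop described above can be illustrated with a minimal offline sketch. It is not Evidently's implementation: the prompt text, the Verdict class, parse_verdict, judge_rows, and fake_llm are all hypothetical names introduced here, and the JSON reply format is an assumption made for illustration.

```python
import json
from dataclasses import dataclass
from typing import Callable, List, Optional

# Hypothetical prompt; Evidently's built-in templates are worded differently.
PROMPT = (
    "Classify the text as NEGATIVE or POSITIVE.\n"
    'Reply with JSON: {{"category": ..., "score": ..., "reasoning": ...}}\n'
    "Text: {text}"
)


@dataclass
class Verdict:
    """Parsed judge output: a label plus optional score and reasoning."""
    category: str
    score: Optional[float] = None
    reasoning: Optional[str] = None


def parse_verdict(raw: str) -> Verdict:
    """Parse the judge's JSON reply; fall back to UNKNOWN on malformed output."""
    try:
        data = json.loads(raw)
        return Verdict(
            category=str(data["category"]).upper(),
            score=data.get("score"),
            reasoning=data.get("reasoning"),
        )
    except (json.JSONDecodeError, KeyError, TypeError):
        return Verdict(category="UNKNOWN")


def judge_rows(rows: List[str], call_llm: Callable[[str], str]) -> List[Verdict]:
    """Send one prompt per text row and collect the parsed verdicts."""
    return [parse_verdict(call_llm(PROMPT.format(text=row))) for row in rows]


def fake_llm(prompt: str) -> str:
    """Stand-in for a real API call so the sketch runs without a network."""
    text = prompt.rsplit("Text: ", 1)[1]
    if "terrible" in text:
        return '{"category": "negative", "score": 0.9, "reasoning": "hostile tone"}'
    return "not json"  # simulates a reply the parser cannot read


verdicts = judge_rows(["This is terrible.", "Thanks!"], fake_llm)
```

Passing the LLM call in as a function keeps the parsing logic testable offline; in a real setup the stub would be replaced by a call to the provider's client.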

Usage

Import the LLM judge descriptors and pass them to Dataset.from_pandas(..., descriptors=[...]). The configured LLM provider needs an API key, e.g., the OPENAI_API_KEY environment variable for OpenAI.
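Because every row triggers a paid API call, it can help to fail early when the key is missing. The helper below is a small sketch; require_api_key is a hypothetical name and not part of Evidently.

```python
import os


def require_api_key(var_name: str = "OPENAI_API_KEY") -> str:
    """Return the API key from the environment, or raise a clear error."""
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(
            f"{var_name} is not set; export it before running LLM judge descriptors."
        )
    return key
```

Calling require_api_key() once before building the dataset surfaces a configuration problem immediately instead of partway through an evaluation run.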

Code Reference

Source Location

  • Repository: evidently
  • File: src/evidently/descriptors/generated_descriptors.py
  • Lines: L1030-1080 (DeclineLLMEval), L1095-1133 (NegativityLLMEval)
  • File: src/evidently/legacy/descriptors/llm_judges.py
  • Lines: L92-114 (NegativityLLMEval legacy), L141-160 (DeclineLLMEval legacy)

Signature

class NegativityLLMEval(Descriptor):
    def __init__(
        self,
        column_name: str,
        provider: str = "openai",
        model: str = "gpt-4o-mini",
        additional_columns: Optional[Dict[str, str]] = None,
        include_category: Optional[bool] = None,
        include_score: Optional[bool] = None,
        include_reasoning: Optional[bool] = None,
        uncertainty: Optional[Uncertainty] = None,
        alias: Optional[str] = None,
        tests: Optional[List] = None,
    ):
        """Evaluate text for negativity using an LLM judge."""

class DeclineLLMEval(Descriptor):
    def __init__(
        self,
        column_name: str,
        provider: str = "openai",
        model: str = "gpt-4o-mini",
        additional_columns: Optional[Dict[str, str]] = None,
        include_category: Optional[bool] = None,
        include_score: Optional[bool] = None,
        include_reasoning: Optional[bool] = None,
        uncertainty: Optional[Uncertainty] = None,
        alias: Optional[str] = None,
        tests: Optional[List] = None,
    ):
        """Evaluate text for decline/refusal patterns using an LLM judge."""

Import

from evidently.descriptors import NegativityLLMEval, DeclineLLMEval

I/O Contract

Inputs

Name                Type                      Required  Description
column_name         str                       Yes       Text column to evaluate
provider            str                       No        LLM provider (default: "openai")
model               str                       No        LLM model name (default: "gpt-4o-mini")
additional_columns  Optional[Dict[str, str]]  No        Additional context columns
include_category    Optional[bool]            No        Include category label in output
include_score       Optional[bool]            No        Include numerical score in output
include_reasoning   Optional[bool]            No        Include reasoning text in output
uncertainty         Optional[Uncertainty]     No        How uncertain judgments are handled
alias               Optional[str]             No        Output column name alias
tests               Optional[List]            No        Tests attached to the descriptor

Outputs

Descriptor         Output Type  Description
NegativityLLMEval  Categorical  Negativity category (e.g., "NEGATIVE", "POSITIVE")
DeclineLLMEval     Categorical  Decline category (e.g., "DECLINE", "OK")

Usage Examples

LLM Evaluation Monitoring

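import pandas as pd

from evidently import Dataset, DataDefinition
from evidently.descriptors import NegativityLLMEval, DeclineLLMEval, Sentiment

# Example chatbot responses (illustrative data; replace with your own DataFrame)
df = pd.DataFrame(
    {
        "response": [
            "Sure, here is how to reset your password.",
            "I'm sorry, but I can't help with that request.",
        ]
    }
)

# Evaluate LLM chatbot responses (the LLM judges require OPENAI_API_KEY to be set)
dataset = Dataset.from_pandas(
    df,
    data_definition=DataDefinition(text_columns=["response"]),
    descriptors=[
        # Explicit aliases fix the output column names used below
        Sentiment("response", alias="Sentiment"),
        NegativityLLMEval("response", provider="openai", model="gpt-4o-mini", alias="Negativity"),
        DeclineLLMEval("response", provider="openai", model="gpt-4o-mini", alias="Decline"),
    ],
)

# Access computed evaluation columns
eval_df = dataset.as_dataframe()
print(eval_df[["response", "Sentiment", "Negativity", "Decline"]].head())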
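For monitoring, the per-row labels are usually aggregated into rates. The sketch below uses only the standard library; label_shares is a hypothetical helper, and the label values in decline_labels are made-up examples rather than real judge output.

```python
from collections import Counter
from typing import Dict, Iterable


def label_shares(labels: Iterable[str]) -> Dict[str, float]:
    """Return the fraction of rows carrying each label."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: n / total for label, n in counts.items()}


# Hypothetical labels, standing in for a judge output column
decline_labels = ["OK", "DECLINE", "OK", "OK"]
shares = label_shares(decline_labels)
print(shares)
```

Assuming the output column is named "Decline" as in the example above, the same helper could be applied as label_shares(eval_df["Decline"]).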

Related Pages

Implements Principle

Requires Environment
