Implementation:Evidentlyai Evidently LLM Templates

Knowledge Sources	Evidentlyai_Evidently
Domains	LLM Evaluation, NLP, Classification
Last Updated	2026-02-14 12:00 GMT

Overview

Defines LLM prompt template classes for binary and multiclass classification evaluation tasks, including the base template interface, uncertainty handling strategies, and structured output generation.

Description

The templates module provides the prompt template infrastructure for Evidently's LLM-based evaluation system. It defines how prompts are constructed, how inputs are mapped from DataFrame columns, and how structured outputs (category, score, reasoning) are extracted from LLM responses.

Core Classes:

BaseLLMPromptTemplate -- Abstract base class extending BlockPromptTemplate. Defines the interface that all LLM evaluation templates must implement:
- iterate_messages(data, input_columns): Generates LLMRequest objects for each row of a DataFrame. Maps DataFrame columns to template placeholders via input_columns dict.
- list_output_columns(): Returns column names the template produces.
- get_type(subcolumn): Returns the ColumnType (Categorical, Numerical, Text) for each output column.
- get_main_output_column(): Returns the primary output column name.
- Marked as is_base_type = True for polymorphic deserialization.
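
The row-wise mapping done by iterate_messages can be pictured with a small standard-library sketch. It is not Evidently's implementation: rows are plain dicts standing in for DataFrame rows, iterate_prompts and the {input} placeholder are hypothetical names, and the sketch assumes input_columns is keyed by column name with the placeholder name as the value.

```python
from typing import Dict, Iterator, List


def iterate_prompts(
    rows: List[Dict[str, str]],
    input_columns: Dict[str, str],
    template: str,
) -> Iterator[str]:
    """Yield one rendered prompt per row (hypothetical helper, not Evidently API)."""
    for row in rows:
        # Rename the selected columns to the placeholder names used by the template.
        values = {placeholder: row[column] for column, placeholder in input_columns.items()}
        yield template.format(**values)


rows = [
    {"email_text": "Dear team, please find the report attached."},
    {"email_text": "yo, wheres my stuff"},
]
prompts = list(iterate_prompts(rows, {"email_text": "input"}, "Classify this text: {input}"))
print(prompts[0])
```

In the real class each yielded item is an LLMRequest carrying the messages and a response parser rather than a bare string.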

Uncertainty -- Enum defining strategies for handling uncertain classifications:
- UNKNOWN: Use a separate "UNKNOWN" category.
- TARGET: Treat uncertain cases as the target category.
- NON_TARGET: Treat uncertain cases as the non-target category.
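
As a rough illustration of the three strategies, the sketch here resolves which label an uncertain case ends up with. The enum values mirror the signature in the Code Reference section; uncertain_label is a hypothetical helper, and how Evidently actually applies the strategy (for example, through the prompt wording) is not shown.

```python
from enum import Enum


class Uncertainty(str, Enum):
    # Values mirror the signature listed in the Code Reference section.
    UNKNOWN = "unknown"
    TARGET = "target"
    NON_TARGET = "non_target"


def uncertain_label(strategy: Uncertainty, target: str, non_target: str) -> str:
    """Return the label assigned to uncertain cases (hypothetical helper)."""
    if strategy is Uncertainty.TARGET:
        return target
    if strategy is Uncertainty.NON_TARGET:
        return non_target
    # UNKNOWN keeps uncertain cases in their own separate category.
    return "UNKNOWN"


for strategy in Uncertainty:
    print(strategy.value, "->", uncertain_label(strategy, "Helpful", "Not Helpful"))
```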

BinaryClassificationPromptTemplate -- Template for binary classification tasks. Configuration fields:
- criteria: Classification criteria or instructions text.
- target_category / non_target_category: The two category names.
- uncertainty: How to handle uncertain classifications (default: UNKNOWN).
- include_category (default: True): Whether to output the category classification.
- include_reasoning (default: False): Whether to output reasoning text.
- include_score (default: False): Whether to output confidence scores.
- score_range (default: (0.0, 1.0)): Min/max for score values.
- output_column, output_reasoning_column, output_score_column: Customizable output column names.
- instructions_template: Configurable instruction format with {__categories__} and {__scoring__} placeholders.
- anchor_start / anchor_end: Text markers delineating input text boundaries.
- placeholders: Additional placeholder values for template substitution.
- pre_messages: Additional LLMMessage objects prepended to the prompt.
- Builds prompts using PromptBlock composables: simple text, anchored input, and JSON output specification.
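
To make the block composition concrete, here is a minimal sketch that assembles a prompt from the three kinds of blocks named above: simple text, anchored input, and a JSON output specification. Everything in it is an assumption for illustration: build_prompt, the anchor strings, and the wording of each block are hypothetical and do not reproduce Evidently's PromptBlock classes or default values.

```python
import json
from typing import Dict, List


def build_prompt(
    criteria: str,
    instructions_template: str,
    categories: str,
    scoring: str,
    text: str,
    anchor_start: str,
    anchor_end: str,
    output_fields: Dict[str, str],
) -> str:
    """Assemble a prompt from three kinds of blocks (illustrative only)."""
    blocks: List[str] = [
        # Simple text blocks: the criteria and the filled-in instructions.
        criteria,
        instructions_template.format(__categories__=categories, __scoring__=scoring),
        # Anchored input block: markers delimit where the evaluated text starts and ends.
        f"{anchor_start}\n{text}\n{anchor_end}",
        # JSON output specification block.
        "Return a JSON object with these fields:\n" + json.dumps(output_fields, indent=2),
    ]
    return "\n\n".join(blocks)


prompt = build_prompt(
    criteria="Is this text professional?",
    instructions_template="Use the following categories:\n{__categories__}\n{__scoring__}",
    categories="Professional: the text is professional\nUnprofessional: it is not",
    scoring="",
    text="Dear team, please find the report attached.",
    anchor_start="<<<TEXT",  # hypothetical marker, not Evidently's default
    anchor_end="TEXT>>>",  # hypothetical marker, not Evidently's default
    output_fields={"category": "Professional or Unprofessional"},
)
print(prompt)
```

Keeping the evaluated text between explicit anchors makes it harder for the model to confuse the input with the instructions around it.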

MulticlassClassificationPromptTemplate -- Template for multiclass classification tasks. Configuration fields:
- category_criteria: Dict mapping category names to their criteria descriptions.
- uncertainty: String specifying the uncertainty category (default: "UNKNOWN").
- include_score: When enabled, generates per-category score columns (e.g., "score_cat1", "score_cat2").
- output_score_column_prefix: Prefix for per-category score columns (default: "score").
- All other fields mirror BinaryClassificationPromptTemplate (criteria, include_category, include_reasoning, anchor_start/end, pre_messages, etc.).
- get_score_column(category): Generates per-category score column names.
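
The per-category score naming can be sketched as follows. The functions are standalone stand-ins written for this page, not the library's methods; the prefix_category naming follows the "score_cat1" example above, and the column order is assumed from the comment in the Multiclass Classification Template example.

```python
from typing import Dict, List


def get_score_column(prefix: str, category: str) -> str:
    """Build a per-category score column name, e.g. 'score_Positive'."""
    return f"{prefix}_{category}"


def list_output_columns(
    category_criteria: Dict[str, str],
    include_category: bool = True,
    include_score: bool = False,
    include_reasoning: bool = False,
    output_score_column_prefix: str = "score",
) -> List[str]:
    """List the columns a multiclass template would produce (illustrative)."""
    columns: List[str] = []
    if include_category:
        columns.append("category")
    if include_score:
        # One score column per category, in the order the categories were declared.
        columns.extend(get_score_column(output_score_column_prefix, name) for name in category_criteria)
    if include_reasoning:
        columns.append("reasoning")
    return columns


criteria = {
    "Positive": "The review expresses satisfaction or praise.",
    "Negative": "The review expresses dissatisfaction or criticism.",
    "Neutral": "The review is factual without strong sentiment.",
}
columns = list_output_columns(criteria, include_score=True, include_reasoning=True)
print(columns)
```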

Output Column Types:

Column	Type
category (output_column)	ColumnType.Categorical
reasoning (output_reasoning_column)	ColumnType.Text
score (output_score_column)	ColumnType.Numerical
score_{category} (multiclass)	ColumnType.Numerical
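
The table can be read as a lookup from output column name to type. The sketch here encodes it with a stand-in enum; this ColumnType is a local substitute for Evidently's class, the default column names are used, and treating a subcolumn of None as the main category column is an assumption.

```python
from enum import Enum
from typing import Optional


class ColumnType(Enum):
    # Local stand-in for Evidently's ColumnType, limited to the members named in the table.
    Categorical = "cat"
    Numerical = "num"
    Text = "text"


def get_type(subcolumn: Optional[str], score_prefix: str = "score") -> ColumnType:
    """Map an output column name to its type (illustrative lookup)."""
    # Assumption: None refers to the main (category) output column.
    if subcolumn is None or subcolumn == "category":
        return ColumnType.Categorical
    if subcolumn == "reasoning":
        return ColumnType.Text
    # Covers both the single binary score and the per-category multiclass scores.
    if subcolumn == score_prefix or subcolumn.startswith(score_prefix + "_"):
        return ColumnType.Numerical
    raise ValueError(f"Unknown output column: {subcolumn}")


print(get_type("score_Positive"))
```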

Usage

Use this module when:

- Configuring LLM-based evaluation descriptors (LLMEval, LLMJudge, BiasLLMEval, etc.).
- Defining custom binary or multiclass classification tasks for LLM judges.
- Building structured LLM evaluation pipelines with category, score, and reasoning outputs.

Code Reference

Source Location

Repository: Evidentlyai_Evidently
File: src/evidently/llm/templates.py

Signature

class BaseLLMPromptTemplate(BlockPromptTemplate):
    def iterate_messages(self, data: pd.DataFrame, input_columns: Dict[str, str]) -> Iterator[LLMRequest[dict]]
    def list_output_columns(self) -> List[str]
    def get_type(self, subcolumn: Optional[str]) -> ColumnType
    def get_main_output_column(self) -> str

class Uncertainty(str, Enum):
    UNKNOWN = "unknown"
    TARGET = "target"
    NON_TARGET = "non_target"

class BinaryClassificationPromptTemplate(BaseLLMPromptTemplate, EnumValueMixin):
    criteria: str
    target_category: str
    non_target_category: str
    uncertainty: Uncertainty = Uncertainty.UNKNOWN
    include_category: bool = True
    include_reasoning: bool = False
    include_score: bool = False
    score_range: Tuple[float, float] = (0.0, 1.0)
    pre_messages: List[LLMMessage] = []
    def get_blocks(self) -> Sequence[PromptBlock]

class MulticlassClassificationPromptTemplate(BaseLLMPromptTemplate, EnumValueMixin):
    criteria: str
    category_criteria: Dict[str, str]
    uncertainty: Union[Literal["UNKNOWN"], str] = "UNKNOWN"
    include_category: bool = True
    include_reasoning: bool = False
    include_score: bool = False
    score_range: Tuple[float, float] = (0.0, 1.0)
    pre_messages: List[LLMMessage] = []
    def get_blocks(self) -> Sequence[PromptBlock]
    def get_score_column(self, category: str) -> str

Import

from evidently.llm.templates import (
    BaseLLMPromptTemplate,
    BinaryClassificationPromptTemplate,
    MulticlassClassificationPromptTemplate,
    Uncertainty,
)

I/O Contract

Inputs

Name	Type	Required	Description
data	pd.DataFrame	Yes	DataFrame containing input text data to evaluate
input_columns	Dict[str, str]	Yes	Mapping from DataFrame column names to template placeholder names
criteria	str	No	Classification criteria or instructions text
target_category	str	Yes (Binary)	Name of the target/positive category
non_target_category	str	Yes (Binary)	Name of the non-target/negative category
category_criteria	Dict[str, str]	Yes (Multiclass)	Mapping of category names to criteria descriptions
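uncertainty	Uncertainty or str	No	Strategy for handling uncertain classifications: an Uncertainty member for binary templates (default: Uncertainty.UNKNOWN), a category name string for multiclass templates (default: "UNKNOWN")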
include_category	bool	No	Whether to output category (default: True)
include_reasoning	bool	No	Whether to output reasoning (default: False)
include_score	bool	No	Whether to output scores (default: False)

Outputs

Name	Type	Description
LLMRequest	Iterator[LLMRequest[dict]]	Iterator of LLM requests, one per DataFrame row, containing messages and response parser
output columns	List[str]	List of column names produced (subset of: category, reasoning, score, score_{category})
column types	ColumnType	Type for each output column (Categorical, Text, or Numerical)
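
Each request carries a response parser that turns the model's reply into output column values. A minimal sketch of that step, assuming the reply is a single JSON object keyed by output column name; parse_response is a hypothetical function, not the parser Evidently ships.

```python
import json
from typing import Any, Dict, List


def parse_response(raw: str, output_columns: List[str]) -> Dict[str, Any]:
    """Extract the expected output columns from a JSON reply (hypothetical parser)."""
    payload = json.loads(raw)
    missing = [name for name in output_columns if name not in payload]
    if missing:
        # Fail loudly instead of silently producing partial rows.
        raise ValueError(f"Response is missing fields: {missing}")
    return {name: payload[name] for name in output_columns}


raw_reply = '{"category": "Helpful", "score": 4, "reasoning": "Answers the question directly."}'
parsed = parse_response(raw_reply, ["category", "score", "reasoning"])
print(parsed["category"], parsed["score"])
```

A production parser would also need to handle replies that wrap the JSON in extra text, which this sketch does not attempt.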

Usage Examples

Binary Classification Template

from evidently.llm.templates import BinaryClassificationPromptTemplate, Uncertainty

template = BinaryClassificationPromptTemplate(
    criteria="Evaluate whether the response is helpful and answers the user's question.",
    target_category="Helpful",
    non_target_category="Not Helpful",
    uncertainty=Uncertainty.UNKNOWN,
    include_category=True,
    include_reasoning=True,
    include_score=True,
    score_range=(1.0, 5.0),
)

# Output columns: ["category", "score", "reasoning"]
print(template.list_output_columns())

Multiclass Classification Template

from evidently.llm.templates import MulticlassClassificationPromptTemplate

template = MulticlassClassificationPromptTemplate(
    criteria="Classify the sentiment of the customer review.",
    category_criteria={
        "Positive": "The review expresses satisfaction or praise.",
        "Negative": "The review expresses dissatisfaction or criticism.",
        "Neutral": "The review is factual without strong sentiment.",
    },
    include_category=True,
    include_score=True,
    include_reasoning=True,
)

# Output columns: ["category", "score_Positive", "score_Negative", "score_Neutral", "reasoning"]
print(template.list_output_columns())

Using with LLMEval Descriptor

from evidently.descriptors.generated_descriptors import LLMEval
from evidently.llm.templates import BinaryClassificationPromptTemplate

template = BinaryClassificationPromptTemplate(
    criteria="Is this text professional?",
    target_category="Professional",
    non_target_category="Unprofessional",
    include_category=True,
)

descriptor = LLMEval(
    column_name="email_text",
    provider="openai",
    model="gpt-4o-mini",
    template=template,
    alias="Professionalism Check",
)

Related Pages

Environment:Evidentlyai_Evidently_Python_Core_Environment
Implementation:Evidentlyai_Evidently_Generated_Descriptors -- Factory functions (LLMJudge, LLMEval, BiasLLMEval, etc.) that use these templates
Implementation:Evidentlyai_Evidently_Pydantic_Utils -- Provides EnumValueMixin used by the template classes for enum serialization
Implementation:Evidentlyai_Evidently_Metric_Types -- The metric type system that LLM evaluation results feed into
