Implementation:Evidentlyai Evidently LLM Templates
| Knowledge Sources | |
|---|---|
| Domains | LLM Evaluation, NLP, Classification |
| Last Updated | 2026-02-14 12:00 GMT |
Overview
Defines LLM prompt template classes for binary and multiclass classification evaluation tasks, including the base template interface, uncertainty handling strategies, and structured output generation.
Description
The templates module provides the prompt template infrastructure for Evidently's LLM-based evaluation system. It defines how prompts are constructed, how inputs are mapped from DataFrame columns, and how structured outputs (category, score, reasoning) are extracted from LLM responses.
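As a rough picture of how such a prompt could be assembled, the sketch below joins an instruction text (with the `{__categories__}` and `{__scoring__}` placeholders), an anchored input, and a JSON output request. It is not Evidently's implementation: `build_prompt`, the instruction wording, and the anchor strings are all hypothetical and chosen only for illustration.

```python
from typing import Optional, Sequence, Tuple

# Hypothetical instruction text; only the two placeholder names come from this page.
INSTRUCTIONS_TEMPLATE = (
    "Classify the text between the anchors.\n"
    "{__categories__}\n"
    "{__scoring__}"
)


def build_prompt(
    criteria: str,
    target: str,
    non_target: str,
    text: str,
    output_fields: Sequence[str],
    score_range: Optional[Tuple[float, float]] = None,
    anchor_start: str = "[[TEXT START]]",  # illustrative marker, not a library default
    anchor_end: str = "[[TEXT END]]",  # illustrative marker, not a library default
) -> str:
    """Join criteria, instructions, anchored input, and an output request."""
    categories = f"Answer with {target} or {non_target}."
    scoring = ""
    if score_range is not None:
        low, high = score_range
        scoring = f"Also give a score between {low} and {high}."
    instructions = INSTRUCTIONS_TEMPLATE.format(
        __categories__=categories, __scoring__=scoring
    )
    blocks = [
        criteria,
        instructions.strip(),
        f"{anchor_start}\n{text}\n{anchor_end}",
        "Return a JSON object with the keys: " + ", ".join(output_fields),
    ]
    return "\n\n".join(blocks)


prompt = build_prompt(
    criteria="Is the response helpful?",
    target="Helpful",
    non_target="Not Helpful",
    text="Thanks, that fixed it!",
    output_fields=["category", "score"],
    score_range=(0.0, 1.0),
)
print(prompt)
```

Keeping the input between explicit anchors makes it easier for the model to tell the evaluated text apart from the instructions.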
Core Classes:
- BaseLLMPromptTemplate -- Abstract base class extending BlockPromptTemplate. Defines the interface that all LLM evaluation templates must implement:
  - iterate_messages(data, input_columns): Generates LLMRequest objects for each row of a DataFrame. Maps DataFrame columns to template placeholders via the input_columns dict.
  - list_output_columns(): Returns the column names the template produces.
  - get_type(subcolumn): Returns the ColumnType (Categorical, Numerical, Text) for each output column.
  - get_main_output_column(): Returns the primary output column name.
  - Marked as is_base_type = True for polymorphic deserialization.
- Uncertainty -- Enum defining strategies for handling uncertain classifications:
  - UNKNOWN: Use a separate "UNKNOWN" category.
  - TARGET: Treat uncertain cases as the target category.
  - NON_TARGET: Treat uncertain cases as the non-target category.
- BinaryClassificationPromptTemplate -- Template for binary classification tasks. Configuration fields:
  - criteria: Classification criteria or instructions text.
  - target_category / non_target_category: The two category names.
  - uncertainty: How to handle uncertain classifications (default: UNKNOWN).
  - include_category (default: True): Whether to output the category classification.
  - include_reasoning (default: False): Whether to output reasoning text.
  - include_score (default: False): Whether to output confidence scores.
  - score_range (default: (0.0, 1.0)): Min/max for score values.
  - output_column, output_reasoning_column, output_score_column: Customizable output column names.
  - instructions_template: Configurable instruction format with {__categories__} and {__scoring__} placeholders.
  - anchor_start / anchor_end: Text markers delineating input text boundaries.
  - placeholders: Additional placeholder values for template substitution.
  - pre_messages: Additional LLMMessage objects prepended to the prompt.
  - Builds prompts using PromptBlock composables: simple text, anchored input, and JSON output specification.
- MulticlassClassificationPromptTemplate -- Template for multiclass classification tasks. Configuration fields:
  - category_criteria: Dict mapping category names to their criteria descriptions.
  - uncertainty: String specifying the uncertainty category (default: "UNKNOWN").
  - include_score: When enabled, generates per-category score columns (e.g., "score_cat1", "score_cat2").
  - output_score_column_prefix: Prefix for per-category score columns (default: "score").
  - All other fields mirror BinaryClassificationPromptTemplate (criteria, include_category, include_reasoning, anchor_start/end, pre_messages, etc.).
  - get_score_column(category): Generates per-category score column names.
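The per-category score naming for the multiclass template can be sketched without the library. The helpers below are stand-ins, not Evidently's code: the `prefix_category` pattern follows the "score_cat1" example above, the default column names "category" and "reasoning" are assumed, and the column order is an assumption taken from the multiclass usage example on this page.

```python
from typing import Dict, List


def get_score_column(category: str, prefix: str = "score") -> str:
    """Build a per-category score column name, e.g. 'score_Positive'."""
    return f"{prefix}_{category}"


def list_output_columns(
    category_criteria: Dict[str, str],
    include_category: bool = True,
    include_score: bool = False,
    include_reasoning: bool = False,
) -> List[str]:
    """Collect output column names (hypothetical stand-in, assumed ordering)."""
    columns: List[str] = []
    if include_category:
        columns.append("category")
    if include_score:
        # One score column per configured category, in dict order.
        columns.extend(get_score_column(name) for name in category_criteria)
    if include_reasoning:
        columns.append("reasoning")
    return columns


columns = list_output_columns(
    {"Positive": "Praise.", "Negative": "Criticism.", "Neutral": "Factual."},
    include_score=True,
    include_reasoning=True,
)
print(columns)
```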
Output Column Types:
| Column | Type |
|---|---|
| category (output_column) | ColumnType.Categorical |
| reasoning (output_reasoning_column) | ColumnType.Text |
| score (output_score_column) | ColumnType.Numerical |
| score_{category} (multiclass) | ColumnType.Numerical |
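The table can be read as a simple lookup from output column name to type. The sketch below uses a local stand-in enum rather than Evidently's ColumnType (its member values here are arbitrary), and `column_type` is a hypothetical helper, not the library's get_type.

```python
from enum import Enum


class ColumnType(Enum):
    # Stand-in for Evidently's ColumnType; values are illustrative only.
    Categorical = "categorical"
    Text = "text"
    Numerical = "numerical"


def column_type(
    column: str,
    output_column: str = "category",
    reasoning_column: str = "reasoning",
    score_column: str = "score",
) -> ColumnType:
    """Map an output column name to its type, following the table above."""
    if column == output_column:
        return ColumnType.Categorical
    if column == reasoning_column:
        return ColumnType.Text
    # Covers both the single score column and per-category score_{category} columns.
    if column == score_column or column.startswith(f"{score_column}_"):
        return ColumnType.Numerical
    raise ValueError(f"Unknown output column: {column}")


types = {name: column_type(name) for name in ["category", "reasoning", "score", "score_Positive"]}
```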
Usage
Use this module when:
- Configuring LLM-based evaluation descriptors (LLMEval, LLMJudge, BiasLLMEval, etc.).
- Defining custom binary or multiclass classification tasks for LLM judges.
- Building structured LLM evaluation pipelines with category, score, and reasoning outputs.
Code Reference
Source Location
- Repository: Evidentlyai_Evidently
- File: src/evidently/llm/templates.py
Signature
class BaseLLMPromptTemplate(BlockPromptTemplate):
    def iterate_messages(self, data: pd.DataFrame, input_columns: Dict[str, str]) -> Iterator[LLMRequest[dict]]
    def list_output_columns(self) -> List[str]
    def get_type(self, subcolumn: Optional[str]) -> ColumnType
    def get_main_output_column(self) -> str

class Uncertainty(str, Enum):
    UNKNOWN = "unknown"
    TARGET = "target"
    NON_TARGET = "non_target"

class BinaryClassificationPromptTemplate(BaseLLMPromptTemplate, EnumValueMixin):
    criteria: str
    target_category: str
    non_target_category: str
    uncertainty: Uncertainty = Uncertainty.UNKNOWN
    include_category: bool = True
    include_reasoning: bool = False
    include_score: bool = False
    score_range: Tuple[float, float] = (0.0, 1.0)
    pre_messages: List[LLMMessage] = []
    def get_blocks(self) -> Sequence[PromptBlock]

class MulticlassClassificationPromptTemplate(BaseLLMPromptTemplate, EnumValueMixin):
    criteria: str
    category_criteria: Dict[str, str]
    uncertainty: Union[Literal["UNKNOWN"], str] = "UNKNOWN"
    include_category: bool = True
    include_reasoning: bool = False
    include_score: bool = False
    score_range: Tuple[float, float] = (0.0, 1.0)
    pre_messages: List[LLMMessage] = []
    def get_blocks(self) -> Sequence[PromptBlock]
    def get_score_column(self, category: str) -> str
Import
from evidently.llm.templates import (
    BaseLLMPromptTemplate,
    BinaryClassificationPromptTemplate,
    MulticlassClassificationPromptTemplate,
    Uncertainty,
)
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| data | pd.DataFrame | Yes | DataFrame containing input text data to evaluate |
| input_columns | Dict[str, str] | Yes | Mapping from DataFrame column names to template placeholder names |
| criteria | str | No | Classification criteria or instructions text |
| target_category | str | Yes (Binary) | Name of the target/positive category |
| non_target_category | str | Yes (Binary) | Name of the non-target/negative category |
| category_criteria | Dict[str, str] | Yes (Multiclass) | Mapping of category names to criteria descriptions |
| uncertainty | Uncertainty or str | No | Strategy for handling uncertain classifications |
| include_category | bool | No | Whether to output category (default: True) |
| include_reasoning | bool | No | Whether to output reasoning (default: False) |
| include_score | bool | No | Whether to output scores (default: False) |
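The role of data and input_columns can be pictured with a dependency-free sketch. It is not Evidently's iterate_messages: rows are plain dicts instead of a DataFrame, `render_rows` is a hypothetical helper, and the mapping direction (column name as key, placeholder name as value) is an assumption that follows the Description's wording of mapping DataFrame columns to template placeholders.

```python
from typing import Dict, Iterator, List


def render_rows(
    rows: List[Dict[str, str]],
    input_columns: Dict[str, str],
    template: str,
) -> Iterator[str]:
    """Yield one rendered prompt per row (hypothetical helper, not Evidently API)."""
    for row in rows:
        # Assumption: input_columns maps column name -> placeholder name.
        values = {
            placeholder: row[column]
            for column, placeholder in input_columns.items()
        }
        yield template.format(**values)


rows = [
    {"question": "What is 2 + 2?", "answer": "4"},
    {"question": "Capital of France?", "answer": "Paris"},
]
prompts = list(
    render_rows(
        rows,
        input_columns={"question": "input", "answer": "response"},
        template="Question: {input}\nAnswer: {response}",
    )
)
```

Each row yields exactly one prompt, which mirrors the one-request-per-row contract described in the Outputs table.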
Outputs
| Name | Type | Description |
|---|---|---|
| LLMRequest | Iterator[LLMRequest[dict]] | Iterator of LLM requests, one per DataFrame row, containing messages and response parser |
| output columns | List[str] | List of column names produced (subset of: category, reasoning, score, score_{category}) |
| column types | ColumnType | Type for each output column (Categorical, Text, or Numerical) |
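A response parser of the kind mentioned above could look like the following sketch, which assumes the LLM replies with a JSON object keyed by output column name. `parse_response` is hypothetical and is not the parser shipped with Evidently; it only illustrates extracting the requested columns and rejecting incomplete replies.

```python
import json
from typing import Any, Dict, List


def parse_response(raw: str, output_columns: List[str]) -> Dict[str, Any]:
    """Extract the requested output columns from a JSON reply (illustrative)."""
    payload = json.loads(raw)
    missing = [name for name in output_columns if name not in payload]
    if missing:
        raise ValueError(f"LLM response is missing keys: {missing}")
    # Keep only the requested columns and drop anything extra.
    return {name: payload[name] for name in output_columns}


row = parse_response(
    '{"category": "Helpful", "score": 0.9, "reasoning": "Answers directly.", "extra": 1}',
    ["category", "score", "reasoning"],
)
```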
Usage Examples
Binary Classification Template
from evidently.llm.templates import BinaryClassificationPromptTemplate, Uncertainty

template = BinaryClassificationPromptTemplate(
    criteria="Evaluate whether the response is helpful and answers the user's question.",
    target_category="Helpful",
    non_target_category="Not Helpful",
    uncertainty=Uncertainty.UNKNOWN,
    include_category=True,
    include_reasoning=True,
    include_score=True,
    score_range=(1, 5),
)

# Output columns: ["category", "score", "reasoning"]
print(template.list_output_columns())
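The uncertainty argument in the example above selects one of three strategies. The sketch below mirrors the enum values from the Signature section and shows which label an uncertain case would end up with under each strategy. `uncertain_label` is a hypothetical helper for illustration; how the library actually applies the strategy (for example, through the prompt wording) is not shown here.

```python
from enum import Enum


class Uncertainty(str, Enum):
    # Local copy of the values listed in the Signature section.
    UNKNOWN = "unknown"
    TARGET = "target"
    NON_TARGET = "non_target"


def uncertain_label(strategy: Uncertainty, target: str, non_target: str) -> str:
    """Return the label assigned to an uncertain case (hypothetical helper)."""
    if strategy is Uncertainty.TARGET:
        return target
    if strategy is Uncertainty.NON_TARGET:
        return non_target
    # UNKNOWN keeps uncertain cases in a separate category.
    return "UNKNOWN"


labels = {
    strategy.name: uncertain_label(strategy, "Helpful", "Not Helpful")
    for strategy in Uncertainty
}
```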
Multiclass Classification Template
from evidently.llm.templates import MulticlassClassificationPromptTemplate

template = MulticlassClassificationPromptTemplate(
    criteria="Classify the sentiment of the customer review.",
    category_criteria={
        "Positive": "The review expresses satisfaction or praise.",
        "Negative": "The review expresses dissatisfaction or criticism.",
        "Neutral": "The review is factual without strong sentiment.",
    },
    include_category=True,
    include_score=True,
    include_reasoning=True,
)

# Output columns: ["category", "score_Positive", "score_Negative", "score_Neutral", "reasoning"]
print(template.list_output_columns())
Using with LLMEval Descriptor
from evidently.descriptors.generated_descriptors import LLMEval
from evidently.llm.templates import BinaryClassificationPromptTemplate

template = BinaryClassificationPromptTemplate(
    criteria="Is this text professional?",
    target_category="Professional",
    non_target_category="Unprofessional",
    include_category=True,
)

descriptor = LLMEval(
    column_name="email_text",
    provider="openai",
    model="gpt-4o-mini",
    template=template,
    alias="Professionalism Check",
)
Related Pages
- Environment:Evidentlyai_Evidently_Python_Core_Environment
- Implementation:Evidentlyai_Evidently_Generated_Descriptors -- Factory functions (LLMJudge, LLMEval, BiasLLMEval, etc.) that use these templates
- Implementation:Evidentlyai_Evidently_Pydantic_Utils -- Provides EnumValueMixin used by the template classes for enum serialization
- Implementation:Evidentlyai_Evidently_Metric_Types -- The metric type system that LLM evaluation results feed into