Implementation:Evidentlyai Evidently LLM Templates

From Leeroopedia
Knowledge Sources
Domains LLM Evaluation, NLP, Classification
Last Updated 2026-02-14 12:00 GMT

Overview

Defines LLM prompt template classes for binary and multiclass classification evaluation tasks, including the base template interface, uncertainty handling strategies, and structured output generation.

Description

The templates module provides the prompt template infrastructure for Evidently's LLM-based evaluation system. It defines how prompts are constructed, how inputs are mapped from DataFrame columns, and how structured outputs (category, score, reasoning) are extracted from LLM responses.
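As an illustration of the extraction step described above, the following is a minimal sketch of pulling the requested fields out of a JSON-formatted LLM reply. The helper name extract_outputs and the sample reply are hypothetical; this is not Evidently's parser, and it assumes the model returned a bare JSON object with no surrounding text.

```python
import json
from typing import Any, Dict, List


def extract_outputs(raw_response: str, output_columns: List[str]) -> Dict[str, Any]:
    """Hypothetical helper: keep only the requested fields of a JSON reply."""
    parsed = json.loads(raw_response)
    # Fields the model did not return are reported as None rather than raising.
    return {column: parsed.get(column) for column in output_columns}


# Illustrative reply; real model output may differ in shape and wording.
reply = '{"category": "Helpful", "score": 0.9, "reasoning": "Answers directly."}'
outputs = extract_outputs(reply, ["category", "score"])
```

A production parser would typically also need to handle malformed JSON or replies that wrap the object in extra prose.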

Core Classes:

  • BaseLLMPromptTemplate -- Abstract base class extending BlockPromptTemplate. Defines the interface that all LLM evaluation templates must implement:
    • iterate_messages(data, input_columns): Generates LLMRequest objects for each row of a DataFrame. Maps DataFrame columns to template placeholders via input_columns dict.
    • list_output_columns(): Returns column names the template produces.
    • get_type(subcolumn): Returns the ColumnType (Categorical, Numerical, Text) for each output column.
    • get_main_output_column(): Returns the primary output column name.
    • Marked as is_base_type = True for polymorphic deserialization.
  • Uncertainty -- Enum defining strategies for handling uncertain classifications:
    • UNKNOWN: Use a separate "UNKNOWN" category.
    • TARGET: Treat uncertain cases as the target category.
    • NON_TARGET: Treat uncertain cases as the non-target category.
  • BinaryClassificationPromptTemplate -- Template for binary classification tasks. Configuration fields:
    • criteria: Classification criteria or instructions text.
    • target_category / non_target_category: The two category names.
    • uncertainty: How to handle uncertain classifications (default: UNKNOWN).
    • include_category (default: True): Whether to output the category classification.
    • include_reasoning (default: False): Whether to output reasoning text.
    • include_score (default: False): Whether to output confidence scores.
    • score_range (default: (0.0, 1.0)): Min/max for score values.
    • output_column, output_reasoning_column, output_score_column: Customizable output column names.
    • instructions_template: Configurable instruction format with {__categories__} and {__scoring__} placeholders.
    • anchor_start / anchor_end: Text markers delineating input text boundaries.
    • placeholders: Additional placeholder values for template substitution.
    • pre_messages: Additional LLMMessage objects prepended to the prompt.
    • Builds prompts using PromptBlock composables: simple text, anchored input, and JSON output specification.
  • MulticlassClassificationPromptTemplate -- Template for multiclass classification tasks. Configuration fields:
    • category_criteria: Dict mapping category names to their criteria descriptions.
    • uncertainty: String specifying the uncertainty category (default: "UNKNOWN").
    • include_score: When enabled, generates per-category score columns (e.g., "score_cat1", "score_cat2").
    • output_score_column_prefix: Prefix for per-category score columns (default: "score").
    • All other fields mirror BinaryClassificationPromptTemplate (criteria, include_category, include_reasoning, anchor_start/end, pre_messages, etc.).
    • get_score_column(category): Generates per-category score column names.
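To make the three Uncertainty strategies concrete, here is a small sketch of which label an uncertain case would end up with under each one. The enum values mirror the Signature section of this page; the helper fallback_category is hypothetical and only illustrates the semantics, since in practice the strategy is communicated to the LLM through the prompt.

```python
from enum import Enum


class Uncertainty(str, Enum):
    # Values copied from the Signature section of this page.
    UNKNOWN = "unknown"
    TARGET = "target"
    NON_TARGET = "non_target"


def fallback_category(uncertainty: Uncertainty, target: str, non_target: str) -> str:
    """Hypothetical helper: the label an uncertain case should receive."""
    if uncertainty is Uncertainty.TARGET:
        return target
    if uncertainty is Uncertainty.NON_TARGET:
        return non_target
    # Uncertainty.UNKNOWN: use a separate "UNKNOWN" category.
    return "UNKNOWN"


label = fallback_category(Uncertainty.NON_TARGET, "Helpful", "Not Helpful")
```

Choosing TARGET or NON_TARGET keeps the output strictly binary, while UNKNOWN keeps uncertain rows visible as a third label.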

Output Column Types:

Column                                Type
category (output_column)              ColumnType.Categorical
reasoning (output_reasoning_column)   ColumnType.Text
score (output_score_column)           ColumnType.Numerical
score_{category} (multiclass)         ColumnType.Numerical
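The mapping above can be sketched for the multiclass case, where each category gets its own score column named with the prefix followed by the category. ColumnKind is a stand-in enum, not Evidently's ColumnType, and output_column_kinds is a hypothetical helper; the default column names "category" and "reasoning" are assumptions taken from the table.

```python
from enum import Enum
from typing import Dict, List


class ColumnKind(Enum):
    """Stand-in for Evidently's ColumnType; not the real class."""
    CATEGORICAL = "categorical"
    TEXT = "text"
    NUMERICAL = "numerical"


def output_column_kinds(categories: List[str], score_prefix: str = "score") -> Dict[str, ColumnKind]:
    """Hypothetical helper: map each multiclass output column to its kind."""
    kinds = {
        "category": ColumnKind.CATEGORICAL,
        "reasoning": ColumnKind.TEXT,
    }
    for name in categories:
        # One numerical score column per category, e.g. "score_Positive".
        kinds[f"{score_prefix}_{name}"] = ColumnKind.NUMERICAL
    return kinds


kinds = output_column_kinds(["Positive", "Negative"])
```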

Usage

Use this module when:

  • Configuring LLM-based evaluation descriptors (LLMEval, LLMJudge, BiasLLMEval, etc.).
  • Defining custom binary or multiclass classification tasks for LLM judges.
  • Building structured LLM evaluation pipelines with category, score, and reasoning outputs.

Code Reference

Source Location

Signature

class BaseLLMPromptTemplate(BlockPromptTemplate):
    def iterate_messages(self, data: pd.DataFrame, input_columns: Dict[str, str]) -> Iterator[LLMRequest[dict]]
    def list_output_columns(self) -> List[str]
    def get_type(self, subcolumn: Optional[str]) -> ColumnType
    def get_main_output_column(self) -> str

class Uncertainty(str, Enum):
    UNKNOWN = "unknown"
    TARGET = "target"
    NON_TARGET = "non_target"

class BinaryClassificationPromptTemplate(BaseLLMPromptTemplate, EnumValueMixin):
    criteria: str
    target_category: str
    non_target_category: str
    uncertainty: Uncertainty = Uncertainty.UNKNOWN
    include_category: bool = True
    include_reasoning: bool = False
    include_score: bool = False
    score_range: Tuple[float, float] = (0.0, 1.0)
    pre_messages: List[LLMMessage] = []
    def get_blocks(self) -> Sequence[PromptBlock]

class MulticlassClassificationPromptTemplate(BaseLLMPromptTemplate, EnumValueMixin):
    criteria: str
    category_criteria: Dict[str, str]
    uncertainty: Union[Literal["UNKNOWN"], str] = "UNKNOWN"
    include_category: bool = True
    include_reasoning: bool = False
    include_score: bool = False
    score_range: Tuple[float, float] = (0.0, 1.0)
    pre_messages: List[LLMMessage] = []
    def get_blocks(self) -> Sequence[PromptBlock]
    def get_score_column(self, category: str) -> str

Import

from evidently.llm.templates import (
    BaseLLMPromptTemplate,
    BinaryClassificationPromptTemplate,
    MulticlassClassificationPromptTemplate,
    Uncertainty,
)

I/O Contract

Inputs

Name                 Type                Required          Description
data                 pd.DataFrame        Yes               DataFrame containing the input text to evaluate
input_columns        Dict[str, str]      Yes               Mapping from DataFrame column names to template placeholder names
criteria             str                 No                Classification criteria or instructions text
target_category      str                 Yes (Binary)      Name of the target/positive category
non_target_category  str                 Yes (Binary)      Name of the non-target/negative category
category_criteria    Dict[str, str]      Yes (Multiclass)  Mapping of category names to criteria descriptions
uncertainty          Uncertainty or str  No                Strategy for handling uncertain classifications (Uncertainty for binary, str for multiclass)
include_category     bool                No                Whether to output the category (default: True)
include_reasoning    bool                No                Whether to output reasoning (default: False)
include_score        bool                No                Whether to output scores (default: False)
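The role of data and input_columns can be illustrated without pandas. The sketch below follows the description of iterate_messages earlier on this page, where DataFrame columns are mapped to template placeholders, and assumes the dict is keyed by column name. Rows are plain dicts standing in for DataFrame rows; iterate_prompts, the "input" placeholder, and the template text are hypothetical.

```python
from typing import Dict, Iterator, List


def iterate_prompts(
    rows: List[Dict[str, str]],
    input_columns: Dict[str, str],
    template: str,
) -> Iterator[str]:
    """Hypothetical helper: yield one rendered prompt per row."""
    for row in rows:
        # Rename each source column to the placeholder it feeds (assumed direction).
        values = {placeholder: row[column] for column, placeholder in input_columns.items()}
        yield template.format(**values)


rows = [{"email_text": "Hi team"}, {"email_text": "yo"}]
prompts = list(iterate_prompts(rows, {"email_text": "input"}, "Classify: {input}"))
```

The real method yields LLMRequest objects that also carry a response parser; only the column-to-placeholder substitution is shown here.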

Outputs

Name            Type                        Description
LLMRequest      Iterator[LLMRequest[dict]]  Iterator of LLM requests, one per DataFrame row, containing messages and response parser
output columns  List[str]                   List of column names produced (subset of: category, reasoning, score, score_{category})
column types    ColumnType                  Type for each output column (Categorical, Text, or Numerical)
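How the include_* flags select the output columns of a binary template can be sketched as follows. The function expected_output_columns is hypothetical; the default column names and the ordering (category, score, reasoning) are assumptions taken from the Output Column Types table and the usage example comment on this page, not from the library source.

```python
from typing import List


def expected_output_columns(
    include_category: bool = True,
    include_score: bool = False,
    include_reasoning: bool = False,
    output_column: str = "category",
    output_score_column: str = "score",
    output_reasoning_column: str = "reasoning",
) -> List[str]:
    """Hypothetical helper: columns a binary template is expected to produce."""
    columns: List[str] = []
    if include_category:
        columns.append(output_column)
    if include_score:
        columns.append(output_score_column)
    if include_reasoning:
        columns.append(output_reasoning_column)
    return columns


all_columns = expected_output_columns(include_score=True, include_reasoning=True)
```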

Usage Examples

Binary Classification Template

from evidently.llm.templates import BinaryClassificationPromptTemplate, Uncertainty

template = BinaryClassificationPromptTemplate(
    criteria="Evaluate whether the response is helpful and answers the user's question.",
    target_category="Helpful",
    non_target_category="Not Helpful",
    uncertainty=Uncertainty.UNKNOWN,
    include_category=True,
    include_reasoning=True,
    include_score=True,
    score_range=(1, 5),
)

# Output columns: ["category", "score", "reasoning"]
print(template.list_output_columns())

Multiclass Classification Template

from evidently.llm.templates import MulticlassClassificationPromptTemplate

template = MulticlassClassificationPromptTemplate(
    criteria="Classify the sentiment of the customer review.",
    category_criteria={
        "Positive": "The review expresses satisfaction or praise.",
        "Negative": "The review expresses dissatisfaction or criticism.",
        "Neutral": "The review is factual without strong sentiment.",
    },
    include_category=True,
    include_score=True,
    include_reasoning=True,
)

# Output columns: ["category", "score_Positive", "score_Negative", "score_Neutral", "reasoning"]
print(template.list_output_columns())

Using with LLMEval Descriptor

from evidently.descriptors.generated_descriptors import LLMEval
from evidently.llm.templates import BinaryClassificationPromptTemplate

template = BinaryClassificationPromptTemplate(
    criteria="Is this text professional?",
    target_category="Professional",
    non_target_category="Unprofessional",
    include_category=True,
)

descriptor = LLMEval(
    column_name="email_text",
    provider="openai",
    model="gpt-4o-mini",
    template=template,
    alias="Professionalism Check",
)
