Implementation: Arize AI Phoenix Legacy Default Templates
LLM_Evaluation Prompt_Engineering
Overview
The Legacy Default Templates module defines a comprehensive library of predefined prompt templates and corresponding rails maps for LLM-based evaluation tasks. Each evaluation task is represented as a ClassificationTemplate constant that bundles a base prompt, an explanation prompt (for chain-of-thought reasoning), a set of classification rails, and associated scores. The module also provides an EvalCriteria enum that aggregates all templates for convenient programmatic access.
These templates serve as the backbone of the Phoenix Evals classification system, providing ready-to-use evaluation criteria for common LLM quality assessment scenarios, including hallucination detection, retrieval relevance, question-answering correctness, toxicity screening, summarization quality, code readability, SQL correctness, human-vs-AI comparison, user frustration detection, and tool calling evaluation.
Each template pair consists of a base template (for direct label-only classification) and a with-explanation template (for chain-of-thought evaluation where the LLM provides reasoning before a final label).
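The bundle described above can be sketched as a simplified, self-contained dataclass. This is an illustrative stand-in mirroring the fields described in this document, not the actual `ClassificationTemplate` implementation from phoenix.evals:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class ClassificationTemplateSketch:
    """Illustrative stand-in for phoenix's ClassificationTemplate."""
    rails: List[str]           # allowed output labels, e.g. ["relevant", "unrelated"]
    template: str              # base prompt for direct, label-only classification
    explanation_template: str  # chain-of-thought variant: reason first, then label
    scores: List[float] = field(default_factory=lambda: [1.0, 0.0])

sketch = ClassificationTemplateSketch(
    rails=["relevant", "unrelated"],
    template="Question: {input}\nReference: {reference}\nAnswer 'relevant' or 'unrelated'.",
    explanation_template="Explain your reasoning step by step, then answer 'relevant' or 'unrelated'.",
)
print(sketch.rails[0])  # -> relevant
```

The base/with-explanation pairing means the same rails and scores apply to both variants; only the prompt text differs.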
Code Reference
| Attribute | Details |
|---|---|
| Source File | packages/phoenix-evals/src/phoenix/evals/legacy/default_templates.py |
| Repository | Arize-ai/phoenix |
| Lines | 1098 |
| Module | phoenix.evals.legacy.default_templates |
| Key Symbols | RAG_RELEVANCY_PROMPT_TEMPLATE, HALLUCINATION_PROMPT_TEMPLATE, QA_PROMPT_TEMPLATE, TOXICITY_PROMPT_TEMPLATE, SUMMARIZATION_PROMPT_TEMPLATE, SQL_GEN_EVAL_PROMPT_TEMPLATE, CODE_READABILITY_PROMPT_TEMPLATE, HUMAN_VS_AI_PROMPT_TEMPLATE, REFERENCE_LINK_CORRECTNESS_PROMPT_TEMPLATE, TOOL_CALLING_PROMPT_TEMPLATE, TOOL_SELECTION_PROMPT_TEMPLATE, TOOL_PARAMETER_EXTRACTION_PROMPT_TEMPLATE, USER_FRUSTRATION_PROMPT_TEMPLATE, CODE_FUNCTIONALITY_PROMPT_TEMPLATE, EvalCriteria |
| Dependencies | phoenix.evals.legacy.templates.ClassificationTemplate, phoenix.evals.legacy.span_templates |
I/O Contract
Template Constants
Each template constant is a ClassificationTemplate instance with the following structure:
| Template | Rails | Template Variables | Purpose |
|---|---|---|---|
| RAG_RELEVANCY_PROMPT_TEMPLATE | ["relevant", "unrelated"] | {input}, {reference} | Evaluates whether a reference text is relevant to answering a question. |
| HALLUCINATION_PROMPT_TEMPLATE | ["hallucinated", "factual"] | {input}, {reference}, {output} | Detects factual inaccuracies in an answer relative to reference text. |
| QA_PROMPT_TEMPLATE | ["correct", "incorrect"] | {input}, {reference}, {output} | Checks if an answer correctly addresses a question given reference material. |
| TOXICITY_PROMPT_TEMPLATE | ["toxic", "non-toxic"] | {input} | Screens text for hateful, demeaning, or threatening content. |
| SUMMARIZATION_PROMPT_TEMPLATE | ["good", "bad"] | {output}, {input} | Assesses summary quality (comprehensive, concise, coherent, independent). |
| CODE_READABILITY_PROMPT_TEMPLATE | ["readable", "unreadable"] | {input}, {output} | Evaluates code readability against a task assignment. |
| CODE_FUNCTIONALITY_PROMPT_TEMPLATE | ["bug_free", "is_bug"] | {coding_instruction}, {code} | Determines if code correctly solves an instruction without bugs. |
| REFERENCE_LINK_CORRECTNESS_PROMPT_TEMPLATE | ["correct", "incorrect"] | {input}, {reference} | Checks if documentation correctly answers customer questions. |
| HUMAN_VS_AI_PROMPT_TEMPLATE | ["correct", "incorrect"] | {question}, {correct_answer}, {ai_generated_answer} | Compares AI-generated answers against human expert ground truth. |
| SQL_GEN_EVAL_PROMPT_TEMPLATE | ["correct", "incorrect"] | {question}, {query_gen}, {response} | Evaluates SQL query correctness for a given instruction. |
| USER_FRUSTRATION_PROMPT_TEMPLATE | ["frustrated", "ok"] | {conversation} | Detects user frustration in conversation transcripts. |
| TOOL_CALLING_PROMPT_TEMPLATE | ["correct", "incorrect"] | {question}, {tool_call}, {tool_definitions} | Evaluates tool selection and parameter extraction together. |
| TOOL_SELECTION_PROMPT_TEMPLATE | ["correct", "incorrect"] | {question}, {tool_call}, {tool_definitions} | Evaluates only whether the correct tool was selected. |
| TOOL_PARAMETER_EXTRACTION_PROMPT_TEMPLATE | ["correct", "incorrect"] | {question}, {tool_call}, {tool_definitions} | Evaluates only whether parameters were correctly extracted. |
All templates use scores [1, 0] where the first rail receives a score of 1 and the second receives 0.
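That rails-to-scores convention (position i in the rails list maps to position i in the scores list) can be illustrated with a small helper. The function name here is hypothetical, not part of the phoenix API:

```python
def score_for_label(label, rails, scores=(1, 0)):
    """Map a predicted rail label to its numeric score by position.

    Under the convention described above, rails[0] scores 1 and
    rails[1] scores 0. Illustrative helper only.
    """
    return scores[rails.index(label)]

print(score_for_label("hallucinated", ["hallucinated", "factual"]))  # -> 1
print(score_for_label("factual", ["hallucinated", "factual"]))       # -> 0
```

Note that a score of 1 marks the first rail, not necessarily the "good" outcome: for HALLUCINATION_PROMPT_TEMPLATE, "hallucinated" is the rail that scores 1.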
EvalCriteria Enum
| Member | Template Reference |
|---|---|
| RELEVANCE | RAG_RELEVANCY_PROMPT_TEMPLATE |
| HALLUCINATION | HALLUCINATION_PROMPT_TEMPLATE |
| TOXICITY | TOXICITY_PROMPT_TEMPLATE |
| QA | QA_PROMPT_TEMPLATE |
| SUMMARIZATION | SUMMARIZATION_PROMPT_TEMPLATE |
| CODE_READABILITY | CODE_READABILITY_PROMPT_TEMPLATE |
| REFERENCE_LINK_CORRECTNESS | REFERENCE_LINK_CORRECTNESS_PROMPT_TEMPLATE |
| HUMAN_VS_AI | HUMAN_VS_AI_PROMPT_TEMPLATE |
| SQL_GEN_EVAL | SQL_GEN_EVAL_PROMPT_TEMPLATE |
| CODE_FUNCTIONALITY | CODE_FUNCTIONALITY_PROMPT_TEMPLATE |
| USER_FRUSTRATION | USER_FRUSTRATION_PROMPT_TEMPLATE |
| HALLUCINATION_SPAN_LEVEL | HALLUCINATION_SPAN_PROMPT_TEMPLATE |
| QA_SPAN_LEVEL | QA_SPAN_PROMPT_TEMPLATE |
| TOOL_CALLING | TOOL_CALLING_PROMPT_TEMPLATE |
| TOOL_CALLING_SPAN_LEVEL | TOOL_CALLING_SPAN_PROMPT_TEMPLATE |
| TOOL_SELECTION | TOOL_SELECTION_PROMPT_TEMPLATE |
| TOOL_PARAMETER_EXTRACTION | TOOL_PARAMETER_EXTRACTION_PROMPT_TEMPLATE |
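The aggregation pattern behind EvalCriteria (an Enum whose member values are the template objects, enabling iteration over all criteria) can be sketched in a self-contained way. The names below are hypothetical stand-ins, not the real templates:

```python
from collections import namedtuple
from enum import Enum

# Hypothetical stand-in for a template object with rails and variables.
Template = namedtuple("Template", ["rails", "variables"])

class EvalCriteriaSketch(Enum):
    """Mirrors the EvalCriteria pattern: each member's value is a template."""
    RELEVANCE = Template(("relevant", "unrelated"), ("input", "reference"))
    TOXICITY = Template(("toxic", "non-toxic"), ("input",))

# Iterating the enum enumerates every registered evaluation criterion,
# which is the "convenient programmatic access" the overview mentions.
for criterion in EvalCriteriaSketch:
    print(criterion.name, criterion.value.rails)
```

This design keeps a single registry of criteria so callers can both reference a specific template (`EvalCriteriaSketch.RELEVANCE.value`) and enumerate all of them without importing each constant individually.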
Usage Examples
```python
from phoenix.evals.legacy.classify import llm_classify
from phoenix.evals.legacy.default_templates import (
    HALLUCINATION_PROMPT_TEMPLATE,
    RAG_RELEVANCY_PROMPT_TEMPLATE,
    EvalCriteria,
)

# Use a template directly with llm_classify.
# Assumes `df` is a DataFrame whose columns match the template's
# variables ({input}, {reference}, {output}) and `model` is a
# configured evaluation model.
result = llm_classify(
    data=df,
    model=model,
    template=HALLUCINATION_PROMPT_TEMPLATE,
    rails=["hallucinated", "factual"],
)

# Access templates via the EvalCriteria enum
template = EvalCriteria.RELEVANCE.value
print(template.rails)      # ['relevant', 'unrelated']
print(template.variables)  # ['input', 'reference']
```
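The template variables listed in the I/O Contract table are Python-format-style placeholders substituted into the prompt text at evaluation time. A minimal illustration of that substitution, using a made-up prompt string rather than the actual template text:

```python
# Hypothetical prompt body; the real template text lives in
# default_templates.py and is considerably longer.
prompt = (
    "Question: {input}\n"
    "Reference: {reference}\n"
    "Is the reference relevant to the question? Answer 'relevant' or 'unrelated'."
)

# Each row of the evaluation DataFrame supplies one value per variable.
rendered = prompt.format(
    input="What is Phoenix?",
    reference="Phoenix is an open-source LLM observability library.",
)
print(rendered.splitlines()[0])  # -> Question: What is Phoenix?
```

This is why llm_classify expects the input DataFrame's column names to match the template's variables: each column fills one placeholder per row.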
Related Pages
- Arize_ai_Phoenix_Legacy_Classify - Primary consumer of these templates via llm_classify()
- Arize_ai_Phoenix_Legacy_Evaluators - Evaluator classes that wrap these templates
- Arize_ai_Phoenix_Legacy_Templates - ClassificationTemplate base class
- Arize_ai_Phoenix_Legacy_Span_Templates - Span-level evaluation templates imported by this module