Implementation: Arize AI Phoenix Legacy Default Templates
LLM_Evaluation Prompt_Engineering
Overview
The Legacy Default Templates module defines a comprehensive library of predefined prompt templates and corresponding rails maps for LLM-based evaluation tasks. Each evaluation task is represented as a ClassificationTemplate constant that bundles a base prompt, an explanation prompt (for chain-of-thought reasoning), a set of classification rails, and associated scores. The module also provides an EvalCriteria enum that aggregates all templates for convenient programmatic access.
These templates serve as the backbone of the Phoenix Evals classification system, providing ready-to-use evaluation criteria for common LLM quality assessment scenarios, including hallucination detection, retrieval relevance, question-answering correctness, toxicity screening, summarization quality, code readability, SQL correctness, human-vs-AI comparison, user frustration detection, and tool calling evaluation.
Each template pair consists of a base template (for direct label-only classification) and a with-explanation template (for chain-of-thought evaluation where the LLM provides reasoning before a final label).
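The bundle described above can be sketched as a simplified, self-contained dataclass. This is an illustrative stand-in mirroring the fields described in this document, not the actual `ClassificationTemplate` implementation from phoenix.evals:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class ClassificationTemplateSketch:
    """Illustrative stand-in for phoenix's ClassificationTemplate."""
    rails: List[str]           # allowed output labels, e.g. ["relevant", "unrelated"]
    template: str              # base prompt for direct, label-only classification
    explanation_template: str  # chain-of-thought variant: reason first, then label
    scores: List[float] = field(default_factory=lambda: [1.0, 0.0])

sketch = ClassificationTemplateSketch(
    rails=["relevant", "unrelated"],
    template="Question: {input}\nReference: {reference}\nAnswer 'relevant' or 'unrelated'.",
    explanation_template="Explain your reasoning step by step, then answer 'relevant' or 'unrelated'.",
)
print(sketch.rails[0])  # -> relevant
```

The base/with-explanation pairing means the same rails and scores apply to both variants; only the prompt text differs.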
Code Reference
| Attribute | Details |
|---|---|
| Source File | packages/phoenix-evals/src/phoenix/evals/legacy/default_templates.py |
| Repository | Arize-ai/phoenix |
| Lines | 1098 |
| Module | phoenix.evals.legacy.default_templates |
| Key Symbols | RAG_RELEVANCY_PROMPT_TEMPLATE, HALLUCINATION_PROMPT_TEMPLATE, QA_PROMPT_TEMPLATE, TOXICITY_PROMPT_TEMPLATE, SUMMARIZATION_PROMPT_TEMPLATE, SQL_GEN_EVAL_PROMPT_TEMPLATE, CODE_READABILITY_PROMPT_TEMPLATE, HUMAN_VS_AI_PROMPT_TEMPLATE, REFERENCE_LINK_CORRECTNESS_PROMPT_TEMPLATE, TOOL_CALLING_PROMPT_TEMPLATE, TOOL_SELECTION_PROMPT_TEMPLATE, TOOL_PARAMETER_EXTRACTION_PROMPT_TEMPLATE, USER_FRUSTRATION_PROMPT_TEMPLATE, CODE_FUNCTIONALITY_PROMPT_TEMPLATE, EvalCriteria |
| Dependencies | phoenix.evals.legacy.templates.ClassificationTemplate, phoenix.evals.legacy.span_templates |
I/O Contract
Template Constants
Each template constant is a ClassificationTemplate instance with the following structure:
| Template | Rails | Template Variables | Purpose |
|---|---|---|---|
| RAG_RELEVANCY_PROMPT_TEMPLATE | ["relevant", "unrelated"] | {input}, {reference} | Evaluates whether a reference text is relevant to answering a question. |
| HALLUCINATION_PROMPT_TEMPLATE | ["hallucinated", "factual"] | {input}, {reference}, {output} | Detects factual inaccuracies in an answer relative to reference text. |
| QA_PROMPT_TEMPLATE | ["correct", "incorrect"] | {input}, {reference}, {output} | Checks if an answer correctly addresses a question given reference material. |
| TOXICITY_PROMPT_TEMPLATE | ["toxic", "non-toxic"] | {input} | Screens text for hateful, demeaning, or threatening content. |
| SUMMARIZATION_PROMPT_TEMPLATE | ["good", "bad"] | {output}, {input} | Assesses summary quality (comprehensive, concise, coherent, independent). |
| CODE_READABILITY_PROMPT_TEMPLATE | ["readable", "unreadable"] | {input}, {output} | Evaluates code readability against a task assignment. |
| CODE_FUNCTIONALITY_PROMPT_TEMPLATE | ["bug_free", "is_bug"] | {coding_instruction}, {code} | Determines if code correctly solves an instruction without bugs. |
| REFERENCE_LINK_CORRECTNESS_PROMPT_TEMPLATE | ["correct", "incorrect"] | {input}, {reference} | Checks if documentation correctly answers customer questions. |
| HUMAN_VS_AI_PROMPT_TEMPLATE | ["correct", "incorrect"] | {question}, {correct_answer}, {ai_generated_answer} | Compares AI-generated answers against human expert ground truth. |
| SQL_GEN_EVAL_PROMPT_TEMPLATE | ["correct", "incorrect"] | {question}, {query_gen}, {response} | Evaluates SQL query correctness for a given instruction. |
| USER_FRUSTRATION_PROMPT_TEMPLATE | ["frustrated", "ok"] | {conversation} | Detects user frustration in conversation transcripts. |
| TOOL_CALLING_PROMPT_TEMPLATE | ["correct", "incorrect"] | {question}, {tool_call}, {tool_definitions} | Evaluates tool selection and parameter extraction together. |
| TOOL_SELECTION_PROMPT_TEMPLATE | ["correct", "incorrect"] | {question}, {tool_call}, {tool_definitions} | Evaluates only whether the correct tool was selected. |
| TOOL_PARAMETER_EXTRACTION_PROMPT_TEMPLATE | ["correct", "incorrect"] | {question}, {tool_call}, {tool_definitions} | Evaluates only whether parameters were correctly extracted. |
All templates use scores [1, 0] where the first rail receives a score of 1 and the second receives 0.
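That rails-to-scores convention (position i in the rails list maps to position i in the scores list) can be illustrated with a small helper. The function name here is hypothetical, not part of the phoenix API:

```python
def score_for_label(label, rails, scores=(1, 0)):
    """Map a predicted rail label to its numeric score by position.

    Under the convention described above, rails[0] scores 1 and
    rails[1] scores 0. Illustrative helper only.
    """
    return scores[rails.index(label)]

print(score_for_label("hallucinated", ["hallucinated", "factual"]))  # -> 1
print(score_for_label("factual", ["hallucinated", "factual"]))       # -> 0
```

Note that a score of 1 marks the first rail, not necessarily the "good" outcome: for HALLUCINATION_PROMPT_TEMPLATE, "hallucinated" is the rail that scores 1.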
EvalCriteria Enum
| Member | Template Reference |
|---|---|
| RELEVANCE | RAG_RELEVANCY_PROMPT_TEMPLATE |
| HALLUCINATION | HALLUCINATION_PROMPT_TEMPLATE |
| TOXICITY | TOXICITY_PROMPT_TEMPLATE |
| QA | QA_PROMPT_TEMPLATE |
| SUMMARIZATION | SUMMARIZATION_PROMPT_TEMPLATE |
| CODE_READABILITY | CODE_READABILITY_PROMPT_TEMPLATE |
| REFERENCE_LINK_CORRECTNESS | REFERENCE_LINK_CORRECTNESS_PROMPT_TEMPLATE |
| HUMAN_VS_AI | HUMAN_VS_AI_PROMPT_TEMPLATE |
| SQL_GEN_EVAL | SQL_GEN_EVAL_PROMPT_TEMPLATE |
| CODE_FUNCTIONALITY | CODE_FUNCTIONALITY_PROMPT_TEMPLATE |
| USER_FRUSTRATION | USER_FRUSTRATION_PROMPT_TEMPLATE |
| HALLUCINATION_SPAN_LEVEL | HALLUCINATION_SPAN_PROMPT_TEMPLATE |
| QA_SPAN_LEVEL | QA_SPAN_PROMPT_TEMPLATE |
| TOOL_CALLING | TOOL_CALLING_PROMPT_TEMPLATE |
| TOOL_CALLING_SPAN_LEVEL | TOOL_CALLING_SPAN_PROMPT_TEMPLATE |
| TOOL_SELECTION | TOOL_SELECTION_PROMPT_TEMPLATE |
| TOOL_PARAMETER_EXTRACTION | TOOL_PARAMETER_EXTRACTION_PROMPT_TEMPLATE |
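The aggregation pattern behind EvalCriteria (an Enum whose member values are the template objects, enabling iteration over all criteria) can be sketched in a self-contained way. The names below are hypothetical stand-ins, not the real templates:

```python
from collections import namedtuple
from enum import Enum

# Hypothetical stand-in for a template object with rails and variables.
Template = namedtuple("Template", ["rails", "variables"])

class EvalCriteriaSketch(Enum):
    """Mirrors the EvalCriteria pattern: each member's value is a template."""
    RELEVANCE = Template(("relevant", "unrelated"), ("input", "reference"))
    TOXICITY = Template(("toxic", "non-toxic"), ("input",))

# Iterating the enum enumerates every registered evaluation criterion,
# which is the "convenient programmatic access" the overview mentions.
for criterion in EvalCriteriaSketch:
    print(criterion.name, criterion.value.rails)
```

This design keeps a single registry of criteria so callers can both reference a specific template (`EvalCriteriaSketch.RELEVANCE.value`) and enumerate all of them without importing each constant individually.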
Usage Examples
```python
from phoenix.evals.legacy.classify import llm_classify
from phoenix.evals.legacy.default_templates import (
    HALLUCINATION_PROMPT_TEMPLATE,
    RAG_RELEVANCY_PROMPT_TEMPLATE,
    EvalCriteria,
)

# Use a template directly with llm_classify.
# Assumes `df` is a DataFrame whose columns match the template's
# variables ({input}, {reference}, {output}) and `model` is a
# configured evaluation model.
result = llm_classify(
    data=df,
    model=model,
    template=HALLUCINATION_PROMPT_TEMPLATE,
    rails=["hallucinated", "factual"],
)

# Access templates via the EvalCriteria enum
template = EvalCriteria.RELEVANCE.value
print(template.rails)      # ['relevant', 'unrelated']
print(template.variables)  # ['input', 'reference']
```
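The template variables listed in the I/O Contract table are Python-format-style placeholders substituted into the prompt text at evaluation time. A minimal illustration of that substitution, using a made-up prompt string rather than the actual template text:

```python
# Hypothetical prompt body; the real template text lives in
# default_templates.py and is considerably longer.
prompt = (
    "Question: {input}\n"
    "Reference: {reference}\n"
    "Is the reference relevant to the question? Answer 'relevant' or 'unrelated'."
)

# Each row of the evaluation DataFrame supplies one value per variable.
rendered = prompt.format(
    input="What is Phoenix?",
    reference="Phoenix is an open-source LLM observability library.",
)
print(rendered.splitlines()[0])  # -> Question: What is Phoenix?
```

This is why llm_classify expects the input DataFrame's column names to match the template's variables: each column fills one placeholder per row.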
Related Pages
- Arize_ai_Phoenix_Legacy_Classify - Primary consumer of these templates via llm_classify()
- Arize_ai_Phoenix_Legacy_Evaluators - Evaluator classes that wrap these templates
- Arize_ai_Phoenix_Legacy_Templates - ClassificationTemplate base class
- Arize_ai_Phoenix_Legacy_Span_Templates - Span-level evaluation templates imported by this module