

Implementation:Arize ai Phoenix Legacy Default Templates

From Leeroopedia

LLM_Evaluation Prompt_Engineering

Overview

The Legacy Default Templates module defines a comprehensive library of predefined prompt templates and corresponding rails maps for LLM-based evaluation tasks. Each evaluation task is represented as a ClassificationTemplate constant that bundles a base prompt, an explanation prompt (for chain-of-thought reasoning), a set of classification rails, and associated scores. The module also provides an EvalCriteria enum that aggregates all templates for convenient programmatic access.

These templates serve as the backbone of the Phoenix Evals classification system, providing ready-to-use evaluation criteria for common LLM quality assessment scenarios including hallucination detection, retrieval relevance, question-answering correctness, toxicity screening, summarization quality, code readability, SQL correctness, human-vs-AI comparison, user frustration detection, and tool calling evaluation.

Each template pair consists of a base template (for direct label-only classification) and a with-explanation template (for chain-of-thought evaluation where the LLM provides reasoning before a final label).
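To make the pairing concrete, the bundle each constant carries can be sketched with a stand-in dataclass (an illustrative model only, not the actual ClassificationTemplate class; the prompt strings and field layout here are assumptions):

```python
from dataclasses import dataclass, field
from typing import List

@dataclass(frozen=True)
class TemplateBundle:
    # Illustrative stand-in mirroring what a ClassificationTemplate bundles.
    rails: List[str]           # allowed output labels
    template: str              # base prompt: direct label-only classification
    explanation_template: str  # chain-of-thought prompt: reasoning, then a label
    scores: List[int] = field(default_factory=lambda: [1, 0])

hallucination = TemplateBundle(
    rails=["hallucinated", "factual"],
    template="Reference: {reference}\nAnswer: {output}\nLabel:",
    explanation_template="Reference: {reference}\nAnswer: {output}\nExplain step by step, then label:",
)
print(hallucination.scores)  # [1, 0]
```

The base prompt keeps evaluation cheap and easy to parse, while the explanation variant trades tokens for auditable reasoning.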

Code Reference

Attribute Details
Source File: packages/phoenix-evals/src/phoenix/evals/legacy/default_templates.py
Repository: Arize-ai/phoenix
Lines: 1098
Module: phoenix.evals.legacy.default_templates
Key Symbols: RAG_RELEVANCY_PROMPT_TEMPLATE, HALLUCINATION_PROMPT_TEMPLATE, QA_PROMPT_TEMPLATE, TOXICITY_PROMPT_TEMPLATE, SUMMARIZATION_PROMPT_TEMPLATE, SQL_GEN_EVAL_PROMPT_TEMPLATE, CODE_READABILITY_PROMPT_TEMPLATE, HUMAN_VS_AI_PROMPT_TEMPLATE, REFERENCE_LINK_CORRECTNESS_PROMPT_TEMPLATE, TOOL_CALLING_PROMPT_TEMPLATE, TOOL_SELECTION_PROMPT_TEMPLATE, TOOL_PARAMETER_EXTRACTION_PROMPT_TEMPLATE, USER_FRUSTRATION_PROMPT_TEMPLATE, CODE_FUNCTIONALITY_PROMPT_TEMPLATE, EvalCriteria
Dependencies: phoenix.evals.legacy.templates.ClassificationTemplate, phoenix.evals.legacy.span_templates

I/O Contract

Template Constants

Each template constant is a ClassificationTemplate instance with the following structure:

Template | Rails | Template Variables | Purpose
RAG_RELEVANCY_PROMPT_TEMPLATE | ["relevant", "unrelated"] | {input}, {reference} | Evaluates whether a reference text is relevant to answering a question.
HALLUCINATION_PROMPT_TEMPLATE | ["hallucinated", "factual"] | {input}, {reference}, {output} | Detects factual inaccuracies in an answer relative to reference text.
QA_PROMPT_TEMPLATE | ["correct", "incorrect"] | {input}, {reference}, {output} | Checks if an answer correctly addresses a question given reference material.
TOXICITY_PROMPT_TEMPLATE | ["toxic", "non-toxic"] | {input} | Screens text for hateful, demeaning, or threatening content.
SUMMARIZATION_PROMPT_TEMPLATE | ["good", "bad"] | {output}, {input} | Assesses summary quality (comprehensive, concise, coherent, independent).
CODE_READABILITY_PROMPT_TEMPLATE | ["readable", "unreadable"] | {input}, {output} | Evaluates code readability against a task assignment.
CODE_FUNCTIONALITY_PROMPT_TEMPLATE | ["bug_free", "is_bug"] | {coding_instruction}, {code} | Determines if code correctly solves an instruction without bugs.
REFERENCE_LINK_CORRECTNESS_PROMPT_TEMPLATE | ["correct", "incorrect"] | {input}, {reference} | Checks if documentation correctly answers customer questions.
HUMAN_VS_AI_PROMPT_TEMPLATE | ["correct", "incorrect"] | {question}, {correct_answer}, {ai_generated_answer} | Compares AI-generated answers against human expert ground truth.
SQL_GEN_EVAL_PROMPT_TEMPLATE | ["correct", "incorrect"] | {question}, {query_gen}, {response} | Evaluates SQL query correctness for a given instruction.
USER_FRUSTRATION_PROMPT_TEMPLATE | ["frustrated", "ok"] | {conversation} | Detects user frustration in conversation transcripts.
TOOL_CALLING_PROMPT_TEMPLATE | ["correct", "incorrect"] | {question}, {tool_call}, {tool_definitions} | Evaluates tool selection and parameter extraction together.
TOOL_SELECTION_PROMPT_TEMPLATE | ["correct", "incorrect"] | {question}, {tool_call}, {tool_definitions} | Evaluates only whether the correct tool was selected.
TOOL_PARAMETER_EXTRACTION_PROMPT_TEMPLATE | ["correct", "incorrect"] | {question}, {tool_call}, {tool_definitions} | Evaluates only whether parameters were correctly extracted.

All templates use scores [1, 0] where the first rail receives a score of 1 and the second receives 0.
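This positional rail-to-score convention can be expressed as a one-line mapping (a sketch of the convention, not code from the module):

```python
def rail_scores(rails, scores=(1, 0)):
    # Pair each rail (label) with its score: first rail -> 1, second rail -> 0.
    return dict(zip(rails, scores))

print(rail_scores(["hallucinated", "factual"]))  # {'hallucinated': 1, 'factual': 0}
print(rail_scores(["correct", "incorrect"]))     # {'correct': 1, 'incorrect': 0}
```

Note that a score of 1 marks the first-listed rail, which is not always the "good" outcome: for HALLUCINATION_PROMPT_TEMPLATE the rail "hallucinated" is listed first.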

EvalCriteria Enum

Member | Template Reference
RELEVANCE | RAG_RELEVANCY_PROMPT_TEMPLATE
HALLUCINATION | HALLUCINATION_PROMPT_TEMPLATE
TOXICITY | TOXICITY_PROMPT_TEMPLATE
QA | QA_PROMPT_TEMPLATE
SUMMARIZATION | SUMMARIZATION_PROMPT_TEMPLATE
CODE_READABILITY | CODE_READABILITY_PROMPT_TEMPLATE
REFERENCE_LINK_CORRECTNESS | REFERENCE_LINK_CORRECTNESS_PROMPT_TEMPLATE
HUMAN_VS_AI | HUMAN_VS_AI_PROMPT_TEMPLATE
SQL_GEN_EVAL | SQL_GEN_EVAL_PROMPT_TEMPLATE
CODE_FUNCTIONALITY | CODE_FUNCTIONALITY_PROMPT_TEMPLATE
USER_FRUSTRATION | USER_FRUSTRATION_PROMPT_TEMPLATE
HALLUCINATION_SPAN_LEVEL | HALLUCINATION_SPAN_PROMPT_TEMPLATE
QA_SPAN_LEVEL | QA_SPAN_PROMPT_TEMPLATE
TOOL_CALLING | TOOL_CALLING_PROMPT_TEMPLATE
TOOL_CALLING_SPAN_LEVEL | TOOL_CALLING_SPAN_PROMPT_TEMPLATE
TOOL_SELECTION | TOOL_SELECTION_PROMPT_TEMPLATE
TOOL_PARAMETER_EXTRACTION | TOOL_PARAMETER_EXTRACTION_PROMPT_TEMPLATE
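The aggregation pattern behind EvalCriteria can be sketched with placeholder template records standing in for the real ClassificationTemplate constants (the field names and values here are illustrative assumptions):

```python
from collections import namedtuple
from enum import Enum

# Placeholder records standing in for the real ClassificationTemplate constants.
Template = namedtuple("Template", ["rails", "variables"])
RAG_RELEVANCY_TPL = Template(("relevant", "unrelated"), ("input", "reference"))
HALLUCINATION_TPL = Template(("hallucinated", "factual"), ("input", "reference", "output"))

class Criteria(Enum):
    # Each member's value is a template object, so an evaluation criterion
    # can be selected by name and unwrapped with .value.
    RELEVANCE = RAG_RELEVANCY_TPL
    HALLUCINATION = HALLUCINATION_TPL

print(Criteria.RELEVANCE.value.rails)  # ('relevant', 'unrelated')
```

Using templates as enum values keeps the criterion name (e.g. RELEVANCE) decoupled from the constant it resolves to, which is why the table above maps members to template references.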

Usage Examples

from phoenix.evals.legacy.default_templates import (
    HALLUCINATION_PROMPT_TEMPLATE,
    RAG_RELEVANCY_PROMPT_TEMPLATE,
    EvalCriteria,
)

# Use a template directly with llm_classify
from phoenix.evals.legacy.classify import llm_classify

result = llm_classify(
    data=df,
    model=model,
    template=HALLUCINATION_PROMPT_TEMPLATE,
    rails=["hallucinated", "factual"],
)
# Access templates via the EvalCriteria enum
template = EvalCriteria.RELEVANCE.value
print(template.rails)  # ['relevant', 'unrelated']
print(template.variables)  # ['input', 'reference']
