Implementation:Arize ai Phoenix Legacy Public API
Overview
The Legacy Public API is the __init__.py module of the phoenix.evals.legacy subpackage. It serves as the entry point for the v1 evaluation system in the arize-phoenix-evals package, re-exporting all legacy evaluation functions, model classes, evaluator wrappers, template constants, and utility functions that formed the original phoenix-evals interface.
Description
This module consolidates the legacy (v1) API of phoenix-evals into a single import namespace. The v1 system was based on direct LLM classification of DataFrames using template strings and rail maps, in contrast to the modern (v2) API, which uses the Evaluator / ClassificationEvaluator class hierarchy.
The legacy module re-exports from the following internal submodules:
- phoenix.evals.legacy.classify -- Core classification functions (llm_classify, run_evals).
- phoenix.evals.legacy.generate -- LLM generation function (llm_generate).
- phoenix.evals.legacy.evaluators -- Pre-built evaluator classes wrapping common evaluation patterns.
- phoenix.evals.legacy.models -- Model wrapper classes for various LLM providers.
- phoenix.evals.legacy.default_templates -- Prompt template constants for built-in evaluation types.
- phoenix.evals.legacy.span_templates -- Span-level prompt templates.
- phoenix.evals.legacy.templates -- Template utility classes.
- phoenix.evals.legacy.retrievals -- Retrieval metric computation (compute_precisions_at_k).
- phoenix.evals.legacy.utils -- Utility constants and dataset download functions.
The module also exposes the package version via __version__.
Usage
from phoenix.evals.legacy import *
Or import specific symbols:
from phoenix.evals.legacy import llm_classify, OpenAIModel, HALLUCINATION_PROMPT_TEMPLATE
Code Reference
| Property | Value |
|---|---|
| Source File | packages/phoenix-evals/src/phoenix/evals/legacy/__init__.py |
| Module | phoenix.evals.legacy |
| Lines | ~155 |
| Package | arize-phoenix-evals |
| Domain | API Surface, Legacy |
Exported Symbols
Core Functions
| Symbol | Source Module | Description |
|---|---|---|
| llm_classify | legacy.classify | Classifies rows of a DataFrame using an LLM with a prompt template and rail map. Central function of the v1 evaluation system. |
| run_evals | legacy.classify | Runs multiple evaluation templates against a DataFrame in batch. |
| llm_generate | legacy.generate | Generates freeform LLM responses for rows in a DataFrame. |
| compute_precisions_at_k | legacy.retrievals | Computes precision@K metrics for retrieval evaluation. |
| download_benchmark_dataset | legacy.utils | Downloads standard benchmark datasets for evaluation. |
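To make the retrieval metric in the table concrete, the following is a minimal pure-Python sketch of precision@K. It is not the implementation of compute_precisions_at_k, whose signature and handling of missing relevance labels are not described on this page; the helper names precision_at_k and precisions_at_all_k and the boolean relevance list are assumptions made for illustration.

```python
from typing import List, Sequence


def precision_at_k(relevance: Sequence[bool], k: int) -> float:
    """Fraction of the top-k retrieved documents that are relevant.

    Hypothetical helper for illustration; not part of phoenix.evals.legacy.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = relevance[:k]
    # Divide by k (not len(top_k)) so short result lists are not rewarded.
    return sum(1 for is_relevant in top_k if is_relevant) / k


def precisions_at_all_k(relevance: Sequence[bool]) -> List[float]:
    """precision@1 .. precision@n for a ranked list of n documents."""
    return [precision_at_k(relevance, k) for k in range(1, len(relevance) + 1)]


# Ranked retrieval results: documents 1 and 3 are relevant, document 2 is not.
ranked_relevance = [True, False, True]
precisions = precisions_at_all_k(ranked_relevance)
```

Dividing by k rather than by the number of documents actually returned is one common convention; the library function may make a different choice.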
Model Classes
All model classes are imported from legacy.models:
| Symbol | Description |
|---|---|
| OpenAIModel | Wrapper for OpenAI API models. |
| AnthropicModel | Wrapper for Anthropic API models. |
| GeminiModel | Wrapper for Google Gemini models. |
| GoogleGenAIModel | Wrapper for Google GenAI models. |
| VertexAIModel | Wrapper for Google Vertex AI models. |
| BedrockModel | Wrapper for AWS Bedrock models. |
| LiteLLMModel | Wrapper for the LiteLLM unified interface. |
| MistralAIModel | Wrapper for Mistral AI models. |
Evaluator Classes
Pre-built evaluator wrappers from legacy.evaluators:
| Symbol | Description |
|---|---|
| LLMEvaluator | Base legacy LLM evaluator class. |
| HallucinationEvaluator | Legacy hallucination detection evaluator. |
| QAEvaluator | Legacy question-answering quality evaluator. |
| RelevanceEvaluator | Legacy relevance evaluator. |
| SQLEvaluator | Legacy evaluator for SQL generation. |
| SummarizationEvaluator | Legacy summarization quality evaluator. |
| ToxicityEvaluator | Legacy toxicity detection evaluator. |
Template Classes
From legacy.templates:
| Symbol | Description |
|---|---|
| PromptTemplate | Template class for constructing evaluation prompts with variable substitution. |
| ClassificationTemplate | Specialized template class for classification evaluation prompts. |
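The variable substitution that these template classes perform can be pictured with plain str.format-style placeholders. The sketch below uses only the standard library and is not the PromptTemplate implementation; the helper names template_variables and render, and the EXAMPLE_TEMPLATE text, are hypothetical.

```python
from string import Formatter
from typing import Dict, List


def template_variables(template: str) -> List[str]:
    """Return the placeholder names in a str.format-style template, in order."""
    names: List[str] = []
    for _, field_name, _, _ in Formatter().parse(template):
        # field_name is None for trailing literal text with no placeholder.
        if field_name and field_name not in names:
            names.append(field_name)
    return names


def render(template: str, row: Dict[str, str]) -> str:
    """Fill the template from one row, failing loudly on missing columns."""
    missing = [name for name in template_variables(template) if name not in row]
    if missing:
        raise KeyError(f"row is missing template variables: {missing}")
    return template.format(**row)


# Hypothetical template text, not one of the built-in constants.
EXAMPLE_TEMPLATE = "Question: {input}\nAnswer: {output}\nIs the answer correct?"

variables = template_variables(EXAMPLE_TEMPLATE)
prompt = render(EXAMPLE_TEMPLATE, {"input": "2 + 2?", "output": "4"})
```

Checking for missing variables before formatting mirrors the practical requirement that every placeholder in a template has a matching DataFrame column.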
Prompt Template Constants
All constants are imported from legacy.default_templates. Each evaluation type provides up to four variants:
| Variant Suffix | Description |
|---|---|
| _BASE_TEMPLATE | The core prompt text without rails or explanation directives. |
| _RAILS_MAP | A dictionary whose values are the valid output labels (rails) for the classification. |
| _TEMPLATE | The standard prompt template ready for use with llm_classify. |
| _TEMPLATE_WITH_EXPLANATION | A variant that asks the LLM to explain its reasoning. |
Evaluation types and their constant prefixes:
| Evaluation Type | Prefix |
|---|---|
| Hallucination Detection | HALLUCINATION_PROMPT_ |
| Question Answering (QA) | QA_PROMPT_ |
| RAG Relevancy | RAG_RELEVANCY_PROMPT_ |
| Code Functionality | CODE_FUNCTIONALITY_PROMPT_ |
| Code Readability | CODE_READABILITY_PROMPT_ |
| Human vs. AI | HUMAN_VS_AI_PROMPT_ |
| Reference Link Correctness | REFERENCE_LINK_CORRECTNESS_PROMPT_ |
| SQL Generation | SQL_GEN_EVAL_PROMPT_ |
| Tool Calling | TOOL_CALLING_ |
| Toxicity | TOXICITY_PROMPT_ |
| User Frustration | USER_FRUSTRATION_PROMPT_ |
Span-Level Templates
From legacy.span_templates:
| Symbol | Description |
|---|---|
| HALLUCINATION_SPAN_PROMPT_TEMPLATE | Span-level hallucination evaluation prompt template. |
| QA_SPAN_PROMPT_TEMPLATE | Span-level QA evaluation prompt template. |
| TOOL_CALLING_SPAN_PROMPT_TEMPLATE | Span-level tool calling evaluation prompt template. |
Other Constants
| Symbol | Source | Description |
|---|---|---|
| NOT_PARSABLE | legacy.utils | Sentinel string value returned when LLM output cannot be parsed into a valid classification label. |
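The role of the sentinel can be illustrated with a small sketch of "snapping" raw LLM text onto a list of rails. This is a simplified stand-in and not the library's parsing logic, which is likely more tolerant of surrounding text; the function snap_to_rail, the constant NOT_PARSABLE_SKETCH, and its string value are assumptions for illustration.

```python
from typing import Sequence

# Stand-in sentinel for this sketch; the real constant lives in legacy.utils
# and its exact value is assumed here.
NOT_PARSABLE_SKETCH = "NOT_PARSABLE"


def snap_to_rail(raw_output: str, rails: Sequence[str]) -> str:
    """Map raw LLM text to one of the rails, or to the sentinel."""
    # Normalize whitespace, surrounding quotes/periods, and case.
    cleaned = raw_output.strip().strip(".\"'").lower()
    for rail in rails:
        if cleaned == rail.lower():
            return rail
    # Anything that is not exactly one rail is reported as unparsable.
    return NOT_PARSABLE_SKETCH


rails = ["factual", "hallucinated"]
parsed = snap_to_rail(" Factual. ", rails)
unparsed = snap_to_rail("I think it is probably fine", rails)
```

Downstream code can then filter rows whose label equals the sentinel instead of silently treating them as one of the valid classes.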
I/O Contract
This module is a re-export surface and does not define its own I/O contract. Each re-exported symbol has its own I/O contract in its respective source module.
The key I/O pattern for the legacy system is:
# Input: pandas DataFrame + model + template + rails
results_df = llm_classify(
    dataframe=df,              # pd.DataFrame with columns referenced in the template
    model=model,               # One of the Model classes (OpenAIModel, etc.)
    template=template,         # A prompt template string or PromptTemplate
    rails=rails_list,          # List of valid classification labels
    provide_explanation=True,  # Optional: include LLM explanations
)
# Output: pandas DataFrame with 'label' and optionally 'explanation' columns
Usage Examples
Legacy Classification with llm_classify
import pandas as pd
from phoenix.evals.legacy import (
    OpenAIModel,
    llm_classify,
    HALLUCINATION_PROMPT_TEMPLATE,
    HALLUCINATION_PROMPT_RAILS_MAP,
)

model = OpenAIModel(model="gpt-4o-mini")

# Column names must match the template's variables; the hallucination
# template is assumed here to use input, output, and reference.
df = pd.DataFrame({
    "input": ["What is the capital of France?"],
    "output": ["Paris is the capital of France."],
    "reference": ["Paris is the capital and largest city of France."],
})

# The rails are the label strings stored as the values of the rails map.
results = llm_classify(
    dataframe=df,
    model=model,
    template=HALLUCINATION_PROMPT_TEMPLATE,
    rails=list(HALLUCINATION_PROMPT_RAILS_MAP.values()),
)
Using Legacy Evaluator Wrappers
from phoenix.evals.legacy import HallucinationEvaluator, OpenAIModel
model = OpenAIModel(model="gpt-4o-mini")
evaluator = HallucinationEvaluator(model)
Running Batch Evaluations
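from phoenix.evals.legacy import HallucinationEvaluator, OpenAIModel, QAEvaluator, run_evals
model = OpenAIModel(model="gpt-4o-mini")
evaluators = [HallucinationEvaluator(model), QAEvaluator(model)]
# run_evals applies each evaluator to df, a DataFrame with the columns each evaluator's
# template expects, and returns one results DataFrame per evaluator, in the same order
hallucination_df, qa_df = run_evals(dataframe=df, evaluators=evaluators)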
Related Pages
- Arize_ai_Phoenix_Evals_Public_API -- The top-level public API that re-exports this legacy module alongside the modern (v2) API.
- Arize_ai_Phoenix_HallucinationEvaluator -- Modern (v2) hallucination evaluator (deprecated in favor of FaithfulnessEvaluator).
- Arize_ai_Phoenix_CorrectnessEvaluator -- Modern (v2) correctness evaluator.
- Arize_ai_Phoenix_FaithfulnessEvaluator -- Modern (v2) faithfulness evaluator.