Implementation:Arize ai Phoenix Legacy Public API

From Leeroopedia

Overview

The Legacy Public API is the __init__.py module of the phoenix.evals.legacy subpackage. It serves as the entry point for the v1 evaluation system in the arize-phoenix-evals package, re-exporting all legacy evaluation functions, model classes, evaluator wrappers, template constants, and utility functions that formed the original phoenix-evals interface.

Description

This module consolidates the legacy (v1) API of phoenix-evals into a single import namespace. The v1 system classifies the rows of a DataFrame directly with an LLM, using template strings and rail maps. The modern (v2) API, by contrast, is built on the Evaluator / ClassificationEvaluator class hierarchy.

The legacy module re-exports from the following internal submodules:

  • phoenix.evals.legacy.classify -- Core classification functions (llm_classify, run_evals).
  • phoenix.evals.legacy.generate -- LLM generation function (llm_generate).
  • phoenix.evals.legacy.evaluators -- Pre-built evaluator classes wrapping common evaluation patterns.
  • phoenix.evals.legacy.models -- Model wrapper classes for various LLM providers.
  • phoenix.evals.legacy.default_templates -- Prompt template constants for built-in evaluation types.
  • phoenix.evals.legacy.span_templates -- Span-level prompt templates.
  • phoenix.evals.legacy.templates -- Template utility classes.
  • phoenix.evals.legacy.retrievals -- Retrieval metric computation (compute_precisions_at_k).
  • phoenix.evals.legacy.utils -- Utility constants and dataset download functions.

The module also exposes the package version via __version__.
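The re-export pattern described above can be sketched without installing the package. The example below builds a tiny in-memory package; the names demo_pkg, classify, and the placeholder llm_classify body are hypothetical stand-ins, not the real Phoenix modules or implementation.

```python
import sys
import types

# Hypothetical stand-in for an internal submodule such as legacy.classify.
classify = types.ModuleType("demo_pkg.classify")


def llm_classify(rows):
    """Placeholder implementation; the real function calls an LLM."""
    return ["label"] * len(rows)


classify.llm_classify = llm_classify

# Stand-in for the package __init__.py: it binds submodule symbols to
# package-level names and lists them in __all__ so a star import sees them.
package = types.ModuleType("demo_pkg")
package.llm_classify = classify.llm_classify
package.__version__ = "0.0.0"  # placeholder, not the real package version
package.__all__ = ["llm_classify"]

# Register both modules so the normal import machinery can find them.
sys.modules["demo_pkg"] = package
sys.modules["demo_pkg.classify"] = classify

from demo_pkg import llm_classify as reexported

# The package-level name is the very same object as the submodule's function.
same_object = reexported is classify.llm_classify
```

Because a re-export only rebinds a name, importing a symbol from the package or from its source submodule yields the same object.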

Usage

from phoenix.evals.legacy import *

Or import specific symbols:

from phoenix.evals.legacy import llm_classify, OpenAIModel, HALLUCINATION_PROMPT_TEMPLATE

Code Reference

Property | Value
Source File | packages/phoenix-evals/src/phoenix/evals/legacy/__init__.py
Module | phoenix.evals.legacy
Lines | ~155
Package | arize-phoenix-evals
Domain | API Surface, Legacy

Exported Symbols

Core Functions

Symbol | Source Module | Description
llm_classify | legacy.classify | Classifies rows of a DataFrame using an LLM with a prompt template and rail map. Central function of the v1 evaluation system.
run_evals | legacy.classify | Runs multiple evaluators against a DataFrame in batch.
llm_generate | legacy.generate | Generates freeform LLM responses for rows in a DataFrame.
compute_precisions_at_k | legacy.retrievals | Computes precision@K metrics for retrieval evaluation.
download_benchmark_dataset | legacy.utils | Downloads standard benchmark datasets for evaluation.
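As a worked illustration of the precision@K metric mentioned for compute_precisions_at_k, the sketch below computes precision at every cutoff of a ranked list of relevance judgments. The helper name precisions_at_k and its signature are assumptions made for this example; they are not the signature of the library function.

```python
from typing import List


def precisions_at_k(relevance: List[bool]) -> List[float]:
    """Return precision@k for k = 1..len(relevance).

    relevance[i] says whether the document at rank i + 1 is relevant.
    """
    precisions = []
    relevant_so_far = 0
    for k, is_relevant in enumerate(relevance, start=1):
        relevant_so_far += int(is_relevant)
        # precision@k = relevant documents in the top k, divided by k
        precisions.append(relevant_so_far / k)
    return precisions


# Ranked results: relevant, irrelevant, relevant, relevant
scores = precisions_at_k([True, False, True, True])
# scores[0] is 1/1, scores[1] is 1/2, scores[2] is 2/3, scores[3] is 3/4
```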

Model Classes

All model classes are imported from legacy.models:

Symbol | Description
OpenAIModel | Wrapper for OpenAI API models.
AnthropicModel | Wrapper for Anthropic API models.
GeminiModel | Wrapper for Google Gemini models.
GoogleGenAIModel | Wrapper for Google GenAI models.
VertexAIModel | Wrapper for Google Vertex AI models.
BedrockModel | Wrapper for AWS Bedrock models.
LiteLLMModel | Wrapper for the LiteLLM unified interface.
MistralAIModel | Wrapper for Mistral AI models.

Evaluator Classes

Pre-built evaluator wrappers from legacy.evaluators:

Symbol | Description
LLMEvaluator | Base legacy LLM evaluator class.
HallucinationEvaluator | Legacy hallucination detection evaluator.
QAEvaluator | Legacy question-answering quality evaluator.
RelevanceEvaluator | Legacy relevance evaluator.
SQLEvaluator | Legacy evaluator for SQL generation quality.
SummarizationEvaluator | Legacy summarization quality evaluator.
ToxicityEvaluator | Legacy toxicity detection evaluator.

Template Classes

From legacy.templates:

Symbol | Description
PromptTemplate | Template class for constructing evaluation prompts with variable substitution.
ClassificationTemplate | Specialized template class for classification evaluation prompts.
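To make "variable substitution" concrete, here is a minimal sketch of a template that discovers its own placeholders and fills them from a row of data. SimpleTemplate is a hypothetical class written for this page; it assumes curly-brace placeholders and does not reproduce the actual PromptTemplate implementation.

```python
from string import Formatter
from typing import Dict, List


class SimpleTemplate:
    """Hypothetical stand-in for a prompt template with {name} placeholders."""

    def __init__(self, text: str) -> None:
        self.text = text
        # Formatter().parse yields (literal, field_name, spec, conversion).
        self.variables: List[str] = [
            field for _, field, _, _ in Formatter().parse(text) if field
        ]

    def format(self, row: Dict[str, str]) -> str:
        """Fill the placeholders from row, ignoring any extra keys."""
        missing = [name for name in self.variables if name not in row]
        if missing:
            raise KeyError(f"missing template variables: {missing}")
        return self.text.format(**{name: row[name] for name in self.variables})


template = SimpleTemplate("Question: {input}\nAnswer: {output}")
prompt = template.format({"input": "2 + 2?", "output": "4", "unused": "x"})
```

Knowing the variable names up front lets a caller check that the input data has every required column before any LLM call is made.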

Prompt Template Constants

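All constants are imported from legacy.default_templates. Each evaluation type provides up to four variants. A constant's name joins the type's prefix and a variant suffix with a single underscore, as in HALLUCINATION_PROMPT_RAILS_MAP: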

Variant Suffix | Description
_BASE_TEMPLATE | The core prompt text without rails or explanation directives.
_RAILS_MAP | A dictionary whose values are the valid output labels (rails) for classification.
_TEMPLATE | The standard prompt template ready for use with llm_classify.
_TEMPLATE_WITH_EXPLANATION | A variant that asks the LLM to include its reasoning.

Evaluation types and their constant prefixes:

Evaluation Type | Prefix
Hallucination Detection | HALLUCINATION_PROMPT_
Question Answering (QA) | QA_PROMPT_
RAG Relevancy | RAG_RELEVANCY_PROMPT_
Code Functionality | CODE_FUNCTIONALITY_PROMPT_
Code Readability | CODE_READABILITY_PROMPT_
Human vs. AI | HUMAN_VS_AI_PROMPT_
Reference Link Correctness | REFERENCE_LINK_CORRECTNESS_PROMPT_
SQL Generation | SQL_GEN_EVAL_PROMPT_
Tool Calling | TOOL_CALLING_
Toxicity | TOXICITY_PROMPT_
User Frustration | USER_FRUSTRATION_PROMPT_
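The prefix and suffix tables can be combined mechanically. The helper below, constant_name, is a hypothetical illustration of that naming scheme; since each type provides "up to" four variants, a generated name is only a candidate and is not guaranteed to exist in legacy.default_templates.

```python
def constant_name(prefix: str, suffix: str) -> str:
    """Join a prefix like 'QA_PROMPT_' and a suffix like '_TEMPLATE'.

    The tables list prefixes with a trailing underscore and suffixes with a
    leading one, so the two are merged around a single underscore.
    """
    return prefix.rstrip("_") + "_" + suffix.lstrip("_")


suffixes = (
    "_BASE_TEMPLATE",
    "_RAILS_MAP",
    "_TEMPLATE",
    "_TEMPLATE_WITH_EXPLANATION",
)
# Candidate constant names for the hallucination evaluation type.
names = [constant_name("HALLUCINATION_PROMPT_", suffix) for suffix in suffixes]
```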

Span-Level Templates

From legacy.span_templates:

Symbol | Description
HALLUCINATION_SPAN_PROMPT_TEMPLATE | Span-level hallucination evaluation prompt template.
QA_SPAN_PROMPT_TEMPLATE | Span-level QA evaluation prompt template.
TOOL_CALLING_SPAN_PROMPT_TEMPLATE | Span-level tool calling evaluation prompt template.

Other Constants

Symbol | Source | Description
NOT_PARSABLE | legacy.utils | Sentinel string value returned when LLM output cannot be parsed into a valid classification label.
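The role of the sentinel can be illustrated with a simplified sketch of snapping raw LLM output to a rail. Both the local NOT_PARSABLE value and the snap_to_rail helper are assumptions for this example; the library's actual parsing logic may be more lenient or differ in detail.

```python
from typing import List

# Local stand-in for the sentinel; the real constant lives in legacy.utils.
NOT_PARSABLE = "NOT_PARSABLE"


def snap_to_rail(raw_output: str, rails: List[str]) -> str:
    """Return the rail that the output matches exactly, else the sentinel."""
    # Normalize whitespace, surrounding quotes or periods, and letter case.
    cleaned = raw_output.strip().strip(".\"'").lower()
    for rail in rails:
        if cleaned == rail.lower():
            return rail
    return NOT_PARSABLE


rails = ["factual", "hallucinated"]
clean_label = snap_to_rail(" Factual. ", rails)
unparsed_label = snap_to_rail("I am not sure.", rails)
```

Returning a sentinel instead of raising keeps one malformed response from aborting a whole batch, and the affected rows can be filtered out afterwards.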

I/O Contract

This module is a re-export surface and does not define its own I/O contract. Each re-exported symbol has its own I/O contract in its respective source module.

The key I/O pattern for the legacy system is:

# Input: pandas DataFrame + model + template + rails
results_df = llm_classify(
    dataframe=df,           # pd.DataFrame with columns referenced in the template
    model=model,            # One of the Model classes (OpenAIModel, etc.)
    template=template,      # A prompt template string or PromptTemplate
    rails=rails_list,       # List of valid classification labels
    provide_explanation=True,  # Optional: include LLM explanations
)
# Output: pandas DataFrame with 'label' and optionally 'explanation' columns
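The contract above (rows in, one label per row out, with an optional explanation) can be sketched with a stub in place of the LLM. Everything here is hypothetical: classify_rows, stub_model, the list-of-dicts input used instead of a DataFrame, and the convention that the label is the last line of the response.

```python
from typing import Callable, Dict, List


def classify_rows(
    rows: List[Dict[str, str]],
    model: Callable[[str], str],
    template: str,
    rails: List[str],
    provide_explanation: bool = False,
) -> List[Dict[str, str]]:
    """Return one result dict per input row, in the same order."""
    results = []
    for row in rows:
        response = model(template.format(**row)).strip()
        # Assumed convention: reasoning first, label on the final line.
        *reasoning, last_line = response.splitlines() or [""]
        label = last_line.strip().lower()
        result = {"label": label if label in rails else "NOT_PARSABLE"}
        if provide_explanation:
            result["explanation"] = " ".join(reasoning).strip()
        results.append(result)
    return results


def stub_model(prompt: str) -> str:
    """Deterministic stand-in for an LLM call."""
    if "Paris" in prompt:
        return "The answer matches the reference.\nfactual"
    return "The answer contradicts the reference.\nhallucinated"


rows = [
    {"input": "Capital of France?", "reference": "Paris is the capital.", "output": "Paris"},
    {"input": "Capital of Japan?", "reference": "Tokyo is the capital.", "output": "Kyoto"},
]
template = "Question: {input}\nReference: {reference}\nAnswer: {output}"
results = classify_rows(
    rows, stub_model, template, ["factual", "hallucinated"], provide_explanation=True
)
```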

Usage Examples

Legacy Classification with llm_classify

import pandas as pd
from phoenix.evals.legacy import (
    OpenAIModel,
    llm_classify,
    HALLUCINATION_PROMPT_TEMPLATE,
    HALLUCINATION_PROMPT_RAILS_MAP,
)

model = OpenAIModel(model="gpt-4o-mini")
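# Column names must match the template variables; the hallucination
# template is assumed here to reference input, output, and reference.
df = pd.DataFrame({
    "input": ["What is the capital of France?"],
    "output": ["Paris is the capital of France."],
    "reference": ["Paris is the capital and largest city of France."],
})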

results = llm_classify(
    dataframe=df,
    model=model,
    template=HALLUCINATION_PROMPT_TEMPLATE,
    rails=list(HALLUCINATION_PROMPT_RAILS_MAP.values()),
)

Using Legacy Evaluator Wrappers

from phoenix.evals.legacy import HallucinationEvaluator, OpenAIModel

model = OpenAIModel(model="gpt-4o-mini")
evaluator = HallucinationEvaluator(model)

Running Batch Evaluations

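from phoenix.evals.legacy import HallucinationEvaluator, OpenAIModel, QAEvaluator, run_evals

model = OpenAIModel(model="gpt-4o-mini")
evaluators = [HallucinationEvaluator(model), QAEvaluator(model)]
# df is a DataFrame whose columns match the evaluators' template variables.
# run_evals applies every evaluator to df and returns one result per evaluator.
hallucination_df, qa_df = run_evals(dataframe=df, evaluators=evaluators)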
