Implementation:Arize ai Phoenix Legacy Classify

From Leeroopedia

LLM_Evaluation Data_Processing

Overview

The Legacy Classify module provides the core LLM-based classification framework for the Phoenix Evals subsystem. It implements llm_classify(), the primary function for applying large language model classifications to tabular data, and run_evals(), a batch evaluator orchestrator that applies multiple LLMEvaluator instances across a DataFrame in a single pass. The module also defines the ClassificationStatus enum for tracking the execution state of individual classification attempts.

llm_classify() supports both synchronous and asynchronous execution, OpenAI function calling for structured output, optional chain-of-thought explanations, and configurable concurrency with retry logic. It processes each row of an input DataFrame (or list) through a prompt template, sends the rendered prompt to an LLM, and parses the response into a classification label snapped to predefined "rails" (valid output classes).
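
The idea of snapping a free-text response to a rail can be illustrated with a small standalone sketch. This is not the Phoenix implementation: the function name `snap_to_rail_sketch`, the `NOT_PARSABLE` sentinel, and the whole-word matching rule are assumptions made for illustration only.

```python
import re
from typing import List

# Hypothetical sentinel for responses that cannot be mapped to exactly one rail.
NOT_PARSABLE = "NOT_PARSABLE"


def snap_to_rail_sketch(response: str, rails: List[str]) -> str:
    """Return the single rail named in `response`, or NOT_PARSABLE.

    Assumption: a rail counts as present only when it appears as a whole
    word, so "relevant" is not detected inside "irrelevant".
    """
    text = response.lower()
    matches = [
        rail
        for rail in rails
        if re.search(rf"\b{re.escape(rail.lower())}\b", text)
    ]
    # Zero matches or several matches are both treated as unparsable.
    return matches[0] if len(matches) == 1 else NOT_PARSABLE


rails = ["hallucinated", "factual"]
label = snap_to_rail_sketch("The answer is Hallucinated.", rails)
unclear = snap_to_rail_sketch("Could be factual or hallucinated.", rails)
```

Treating ambiguous responses as unparsable, rather than picking the first match, keeps a verbose answer that names several labels from being silently misclassified.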

run_evals() orchestrates multiple LLMEvaluator instances against a shared DataFrame, running all evaluator-record pairs concurrently and returning one result DataFrame per evaluator.
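
The fan-out over evaluator-record pairs can be pictured with a minimal asyncio sketch. It assumes evaluators are plain callables and uses a semaphore as the concurrency limit; `run_pairs` and the toy evaluators are hypothetical stand-ins, not part of the Phoenix API.

```python
import asyncio
from typing import Callable, Dict, List, Optional

Record = Dict[str, str]
# Hypothetical stand-in for an LLMEvaluator: maps one record to a label.
Evaluator = Callable[[Record], str]


async def run_pairs(
    records: List[Record],
    evaluators: List[Evaluator],
    concurrency: int = 4,
) -> List[List[Optional[str]]]:
    """Evaluate every (evaluator, record) pair; return one list per evaluator."""
    semaphore = asyncio.Semaphore(concurrency)
    results: List[List[Optional[str]]] = [[None] * len(records) for _ in evaluators]

    async def evaluate(eval_index: int, row_index: int) -> None:
        async with semaphore:  # at most `concurrency` pairs in flight
            await asyncio.sleep(0)  # placeholder for the LLM request
            results[eval_index][row_index] = evaluators[eval_index](records[row_index])

    await asyncio.gather(
        *(
            evaluate(e, r)
            for e in range(len(evaluators))
            for r in range(len(records))
        )
    )
    return results


records = [{"output": "Python is a snake."}, {"output": ""}]
evaluators = [
    lambda row: "empty" if not row["output"] else "non-empty",
    lambda row: str(len(row["output"])),
]
results = asyncio.run(run_pairs(records, evaluators))
```

Results are written by index, so each per-evaluator list keeps the input row order even though pairs may finish in any order.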

Code Reference

Attribute | Details
Source File | packages/phoenix-evals/src/phoenix/evals/legacy/classify.py
Repository | Arize-ai/phoenix
Lines | 521
Module | phoenix.evals.legacy.classify
Key Symbols | llm_classify(), run_evals(), ClassificationStatus
Dependencies | pandas, phoenix.evals.legacy.evaluators, phoenix.evals.legacy.executors, phoenix.evals.legacy.models, phoenix.evals.legacy.templates, phoenix.evals.legacy.utils

I/O Contract

llm_classify()

Parameter | Type | Description
data | Union[pd.DataFrame, List[Any]] | Input data containing template variables. DataFrame columns or list elements are mapped to template placeholders.
model | BaseModel | An LLM model instance (e.g., OpenAIModel) used to generate classifications.
template | Union[ClassificationTemplate, PromptTemplate, str] | The prompt template defining the classification task.
rails | List[str] | Valid output labels the model response is snapped to.
data_processor | Optional[Callable] | Optional callable to preprocess each input row before template mapping.
system_instruction | Optional[str] | Optional system message prepended to the LLM prompt.
provide_explanation | bool | If True, adds an explanation column to output.
use_function_calling_if_available | bool | If True, uses OpenAI function calling to constrain outputs.
include_prompt | bool | If True, includes the rendered prompt in the output.
include_response | bool | If True, includes the raw LLM response in the output.
max_retries | int | Maximum retry attempts per classification (default: 10).
exit_on_error | bool | If True, halts on exhausted retries; otherwise continues.
run_sync | bool | If True, forces synchronous execution.
concurrency | Optional[int] | Number of concurrent async requests.
Returns | pd.DataFrame | DataFrame with columns: label, optionally explanation, prompt, response, plus exceptions, execution_status, execution_seconds, prompt_tokens, completion_tokens, total_tokens.
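
To make the mapping from row fields to template placeholders concrete, here is a standalone sketch using only the standard library. It assumes `{name}`-style placeholders and treats a row as a plain dict; `template_variables` and `render_row` are hypothetical helpers, not functions from the module. A row that lacks a required variable is skipped here, which is roughly the situation the MISSING_INPUT status describes.

```python
from string import Formatter
from typing import Any, Dict, List, Optional


def template_variables(template: str) -> List[str]:
    """List the placeholder names in a `{name}`-style template."""
    return [name for _, name, _, _ in Formatter().parse(template) if name]


def render_row(template: str, row: Dict[str, Any]) -> Optional[str]:
    """Render the template for one row, or return None if a variable is missing.

    Assumption: extra keys in the row are ignored, and a missing key means
    the row cannot be classified at all.
    """
    missing = [name for name in template_variables(template) if name not in row]
    if missing:
        return None
    return template.format(**row)


template = "Question: {input}\nAnswer: {output}"
complete_row = {"input": "What is Python?", "output": "A language.", "id": 7}
incomplete_row = {"input": "What is Python?"}

prompt = render_row(template, complete_row)
skipped = render_row(template, incomplete_row)
```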

run_evals()

Parameter | Type | Description
dataframe | DataFrame | Input records to evaluate.
evaluators | List[LLMEvaluator] | List of evaluator instances to apply.
provide_explanation | bool | Whether to include explanations in output.
use_function_calling_if_available | bool | Whether to use OpenAI function calling.
concurrency | Optional[int] | Concurrent evaluation limit.
Returns | List[DataFrame] | One DataFrame per evaluator with label, score, and optionally explanation columns.

ClassificationStatus Enum

Value | Description
DID_NOT_RUN | Evaluation was not attempted.
COMPLETED | Evaluation completed successfully on first attempt.
COMPLETED_WITH_RETRIES | Evaluation completed after retrying.
FAILED | Evaluation failed after exhausting all retries.
MISSING_INPUT | Required template variables were missing from the input row.
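
How a retry loop might arrive at these statuses can be sketched as follows. `StatusSketch` is a stand-in enum whose member values are assumptions (the real ClassificationStatus values are not given here), and `classify_with_retries` is a hypothetical helper; the sketch also assumes that `max_retries` counts retries after one initial attempt.

```python
from enum import Enum
from typing import Callable, Optional, Tuple


class StatusSketch(Enum):
    # Stand-in for ClassificationStatus; the string values are assumptions.
    DID_NOT_RUN = "DID_NOT_RUN"
    COMPLETED = "COMPLETED"
    COMPLETED_WITH_RETRIES = "COMPLETED_WITH_RETRIES"
    FAILED = "FAILED"
    MISSING_INPUT = "MISSING_INPUT"


def classify_with_retries(
    call: Callable[[], str], max_retries: int = 10
) -> Tuple[Optional[str], StatusSketch]:
    """Run `call` once, then retry up to `max_retries` times on any exception."""
    status = StatusSketch.DID_NOT_RUN
    for attempt in range(max_retries + 1):
        try:
            label = call()
        except Exception:
            status = StatusSketch.FAILED  # stays FAILED if every attempt raises
            continue
        if attempt == 0:
            return label, StatusSketch.COMPLETED
        return label, StatusSketch.COMPLETED_WITH_RETRIES
    return None, status


def make_flaky(failures: int, label: str) -> Callable[[], str]:
    """Build a callable that raises `failures` times before returning `label`."""
    remaining = {"count": failures}

    def call() -> str:
        if remaining["count"] > 0:
            remaining["count"] -= 1
            raise RuntimeError("transient error")
        return label

    return call


first_try = classify_with_retries(make_flaky(0, "factual"), max_retries=3)
after_retries = classify_with_retries(make_flaky(2, "factual"), max_retries=3)
exhausted = classify_with_retries(make_flaky(5, "factual"), max_retries=3)
```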

Usage Examples

from phoenix.evals.legacy.classify import llm_classify, run_evals
from phoenix.evals.legacy.models import OpenAIModel
from phoenix.evals.legacy.default_templates import HALLUCINATION_PROMPT_TEMPLATE
import pandas as pd

# Basic classification with llm_classify
model = OpenAIModel(model="gpt-4")
df = pd.DataFrame({
    "input": ["What is Python?"],
    "reference": ["Python is a programming language."],
    "output": ["Python is a type of snake."],
})

result = llm_classify(
    data=df,
    model=model,
    template=HALLUCINATION_PROMPT_TEMPLATE,
    rails=["hallucinated", "factual"],
    provide_explanation=True,
)
# result contains columns: label, explanation, exceptions, execution_status, ...

from phoenix.evals.legacy.evaluators import HallucinationEvaluator, QAEvaluator

# Batch evaluation with run_evals
hallucination_eval = HallucinationEvaluator(model=model)
qa_eval = QAEvaluator(model=model)

results = run_evals(
    dataframe=df,
    evaluators=[hallucination_eval, qa_eval],
    provide_explanation=True,
)
# results[0] = hallucination DataFrame, results[1] = QA DataFrame
