Implementation: Arize AI Phoenix ClassificationEvaluator Creation
| Knowledge Sources | |
|---|---|
| Domains | LLM Evaluation, Evaluator Architecture, Classification |
| Last Updated | 2026-02-14 00:00 GMT |
Overview
Concrete tools, provided by the arize-phoenix-evals package, for defining and instantiating LLM-based and code-based evaluators.
Description
This implementation covers the primary APIs for creating evaluation criteria in Phoenix:
- `ClassificationEvaluator` -- an LLM-based evaluator that constrains the model to select from a declared set of classification choices, optionally mapping each label to a numeric score and requesting an explanation.
- `create_evaluator()` -- a decorator that turns any Python function into a fully featured `Evaluator` instance with automatic input schema generation and return-value-to-Score conversion.
- `create_classifier()` -- a factory function that constructs a `ClassificationEvaluator` in a single call.
- `bind_evaluator()` -- a helper that binds an evaluator to a fixed input mapping so that data columns with different names can be routed to the evaluator's expected fields.
Together, these APIs let teams build LLM-judged classification evaluators and deterministic code-based evaluators, and adapt either kind to arbitrary data schemas without modifying the evaluator's core logic.
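For example, a factory-built classifier can be rebound to a differently named data column without touching its prompt. A minimal sketch; the provider, model, and column names here are illustrative:
```python
from phoenix.evals import LLM, bind_evaluator, create_classifier

clarity = create_classifier(
    name="clarity",
    prompt_template="Is the following text clearly written? {text}",
    llm=LLM(provider="openai", model="gpt-4o"),
    choices=["clear", "unclear"],
)

# Route the "document" column to the evaluator's expected "text" field
bound_clarity = bind_evaluator(clarity, {"text": "document"})
```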
Usage
Use these APIs when you need to:
- Build a classification evaluator that asks an LLM to choose among predefined labels (e.g., sentiment, relevance, toxicity).
- Create a code-based evaluator from a plain function for deterministic checks (e.g., word count, regex validation, precision/recall).
- Map data columns to evaluator input fields when your DataFrame schema does not match the evaluator's expected variable names.
Code Reference
Source Location
- Repository: Phoenix
- File: `packages/phoenix-evals/src/phoenix/evals/evaluators.py`
  - `Evaluator` base: lines 278-480
  - `LLMEvaluator`: lines 484-566
  - `ClassificationEvaluator`: lines 570-794
  - `create_evaluator`: lines 797-1097
  - `create_classifier`: lines 1101-1186
  - `bind_evaluator`: lines 1191-1282
Signature: ClassificationEvaluator
```python
class ClassificationEvaluator(LLMEvaluator):
    def __init__(
        self,
        *,
        name: str,
        llm: LLM,
        prompt_template: Union[PromptLike, PromptTemplate, Template],
        choices: Union[
            List[str],
            Dict[str, Union[float, int]],
            Dict[str, Tuple[Union[float, int], str]],
        ],
        include_explanation: bool = True,
        input_schema: Optional[type[BaseModel]] = None,
        direction: DirectionType = "maximize",
        **kwargs: Any,
    ) -> None
```
Signature: create_evaluator
```python
def create_evaluator(
    name: str,
    source: Optional[KindType] = None,
    direction: DirectionType = "maximize",
    kind: Optional[KindType] = None,
) -> Callable[[Callable[..., Any]], Evaluator]
```
Signature: create_classifier
```python
def create_classifier(
    name: str,
    prompt_template: str,
    llm: LLM,
    choices: Union[
        List[str],
        Dict[str, Union[float, int]],
        Dict[str, Tuple[Union[float, int], str]],
    ],
    direction: DirectionType = "maximize",
) -> ClassificationEvaluator
```
Signature: bind_evaluator
```python
def bind_evaluator(
    evaluator: Evaluator,
    input_mapping: InputMappingType,
) -> Evaluator
```
Import
```python
from phoenix.evals import (
    ClassificationEvaluator,
    create_evaluator,
    create_classifier,
    bind_evaluator,
)
```
I/O Contract
ClassificationEvaluator Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| `name` | `str` | Yes | Identifier for the evaluator; also used as the `Score.name`. |
| `llm` | `LLM` | Yes | An initialized LLM instance with tool-calling or structured-output support. |
| `prompt_template` | `Union[PromptLike, PromptTemplate, Template]` | Yes | Prompt with placeholder variables (e.g., `{text}`, `{question}`) that are filled from the input record. |
| `choices` | `Union[List[str], Dict[str, Union[float, int]], Dict[str, Tuple[Union[float, int], str]]]` | Yes | Classification labels. May be a list of strings, a dict mapping labels to numeric scores, or a dict mapping labels to (score, description) tuples. |
| `include_explanation` | `bool` | No (default `True`) | Whether to ask the LLM to provide reasoning with its classification. |
| `input_schema` | `Optional[type[BaseModel]]` | No | Pydantic model for explicit input validation. If omitted, a model is generated dynamically from the prompt template variables. |
| `direction` | `DirectionType` | No (default `"maximize"`) | Score optimization direction: `"maximize"` or `"minimize"`. |
| `**kwargs` | `Any` | No | Invocation parameters forwarded to the LLM client during generation. |
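Where stricter validation is wanted, a Pydantic model can be passed as `input_schema` instead of relying on the dynamically generated one. A minimal sketch, assuming the model's field names match the template placeholders:
```python
from pydantic import BaseModel

from phoenix.evals import LLM, ClassificationEvaluator


class RelevanceInput(BaseModel):
    question: str
    answer: str


evaluator = ClassificationEvaluator(
    name="relevance",
    llm=LLM(provider="openai", model="gpt-4o"),
    prompt_template="Question: {question}\nAnswer: {answer}\nIs the answer relevant?",
    choices=["relevant", "irrelevant"],
    input_schema=RelevanceInput,  # validates each input record before the prompt is rendered
)
```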
create_evaluator Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| `name` | `str` | Yes | Identifier for the evaluator and the produced `Score`s. |
| `kind` | `Optional[KindType]` | No (default `"code"`) | Kind of evaluator: `"human"`, `"llm"`, or `"code"`. |
| `direction` | `DirectionType` | No (default `"maximize"`) | Score optimization direction. |
bind_evaluator Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| `evaluator` | `Evaluator` | Yes | The evaluator instance to bind with a mapping. |
| `input_mapping` | `InputMappingType` | Yes | A dictionary mapping evaluator field names to data field names (strings) or callable transformations. |
Outputs
| API | Return Type | Description |
|---|---|---|
| `ClassificationEvaluator.__init__` | `ClassificationEvaluator` | An evaluator instance with `evaluate()` and `async_evaluate()` methods returning `List[Score]`. |
| `create_evaluator()(fn)` | `Evaluator` | The decorated function wrapped as an `Evaluator` with `evaluate()`, `async_evaluate()`, and direct `__call__`. |
| `create_classifier()` | `ClassificationEvaluator` | Same as constructing `ClassificationEvaluator` directly. |
| `bind_evaluator()` | `Evaluator` | A shallow copy of the evaluator with the input mapping bound. |
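As a quick illustration of the output contract, the sketch below consumes the returned `Score` objects from an evaluator like those constructed in the examples that follow; the async call assumes no event loop is already running:
```python
import asyncio

# Synchronous: evaluate() returns List[Score]
scores = evaluator.evaluate({"question": "...", "answer": "..."})
for s in scores:
    print(s.name, s.label, s.score, s.explanation)

# Asynchronous variant for concurrent evaluation pipelines
scores = asyncio.run(evaluator.async_evaluate({"question": "...", "answer": "..."}))
```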
Usage Examples
LLM Classification with Label-to-Score Mapping
```python
from phoenix.evals import ClassificationEvaluator, LLM

llm = LLM(provider="openai", model="gpt-4o")

evaluator = ClassificationEvaluator(
    name="relevance",
    llm=llm,
    prompt_template=(
        "Given the following question and answer, rate the relevance.\n"
        "Question: {question}\n"
        "Answer: {answer}"
    ),
    choices={
        "highly_relevant": 1.0,
        "somewhat_relevant": 0.5,
        "not_relevant": 0.0,
    },
    include_explanation=True,
)

result = evaluator.evaluate({
    "question": "What is the capital of France?",
    "answer": "Paris is the capital city of France.",
})

print(result[0].label)        # "highly_relevant"
print(result[0].score)        # 1.0
print(result[0].explanation)  # LLM reasoning
```
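For criteria where a lower score is better (e.g., toxicity), set `direction="minimize"` so downstream tooling interprets the score correctly. A sketch reusing the `llm` above; the labels and score values are illustrative:
```python
toxicity = ClassificationEvaluator(
    name="toxicity",
    llm=llm,
    prompt_template="Rate the toxicity of the following text:\n{text}",
    choices={"non_toxic": 0.0, "mildly_toxic": 0.5, "toxic": 1.0},
    direction="minimize",  # lower scores are better for this criterion
)

result = toxicity.evaluate({"text": "Have a wonderful day!"})
```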
Code-Based Evaluator with create_evaluator
```python
from phoenix.evals import create_evaluator


@create_evaluator(name="word_count")
def word_count(text: str) -> int:
    return len(text.split())


# As an Evaluator
result = word_count.evaluate({"text": "Hello world"})
print(result[0].score)  # 2

# Direct function call still works
print(word_count(text="Hello world"))  # 2
```
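Functions with multiple parameters work the same way: each parameter becomes a required input field. A sketch of a deterministic exact-match check, assuming booleans are handled by the return-value-to-Score conversion described above:
```python
@create_evaluator(name="exact_match")
def exact_match(output: str, expected: str) -> bool:
    # Case-insensitive string comparison; the boolean becomes a Score
    return output.strip().lower() == expected.strip().lower()


result = exact_match.evaluate({"output": "Paris", "expected": "paris"})
```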
Quick Classifier via Factory
```python
from phoenix.evals import create_classifier, LLM

llm = LLM(provider="openai", model="gpt-4o")

sentiment = create_classifier(
    name="sentiment",
    prompt_template="Classify the sentiment of: {text}",
    llm=llm,
    choices=["positive", "negative", "neutral"],
)

result = sentiment.evaluate({"text": "I love this product!"})
print(result[0].label)  # "positive"
```
Binding Input Mappings
```python
from phoenix.evals import create_evaluator, bind_evaluator


@create_evaluator(name="response_length")
def response_length(response: str) -> int:
    return len(response)


# DataFrame has column "answer" but the evaluator expects "response"
bound = bind_evaluator(
    evaluator=response_length,
    input_mapping={"response": "answer"},
)

result = bound.evaluate({"answer": "Paris is the capital of France."})
print(result[0].score)  # 31
```
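Mapping values may also be callables (see the I/O contract above), which helps when a field must be extracted from a nested record rather than merely renamed. A sketch assuming each callable receives the raw input record:
```python
bound_nested = bind_evaluator(
    evaluator=response_length,
    # Pull the response text out of a nested payload
    input_mapping={"response": lambda row: row["output"]["text"]},
)

result = bound_nested.evaluate({"output": {"text": "Paris is the capital of France."}})
print(result[0].score)  # 31
```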
Classification with Descriptions (Advanced)
```python
from phoenix.evals import ClassificationEvaluator, LLM

llm = LLM(provider="openai", model="gpt-4o")

evaluator = ClassificationEvaluator(
    name="factual_accuracy",
    llm=llm,
    prompt_template="Evaluate the factual accuracy of: {claim}",
    choices={
        "accurate": (1.0, "Factually correct information"),
        "partially_accurate": (0.5, "Some correct, some incorrect information"),
        "inaccurate": (0.0, "Factually incorrect information"),
    },
)

result = evaluator.evaluate({"claim": "The Earth orbits the Sun."})
print(result[0].label)  # "accurate"
print(result[0].score)  # 1.0
```
Related Pages
Implements Principle
Requires Environment