Implementation:Arize ai Phoenix Experiment Task Interface

Knowledge Sources	Phoenix
Domains	AI Observability, Experiment Design, Evaluation Infrastructure
Last Updated	2026-02-14 00:00 GMT

Overview

Concrete pattern for defining task functions in Phoenix experiments, using dynamic parameter binding to automatically inject dataset example fields into user-defined callables.

Description

The Phoenix experiment task interface defines how user-written functions are connected to dataset examples during experiment execution. A task is any callable (synchronous or asynchronous) that accepts parameters matching dataset example field names and returns a JSON-serializable value. The framework uses Python's inspect module to examine the function signature at runtime and automatically bind parameters to the corresponding example fields.

The task interface supports two usage modes:

Single-parameter mode: When a function has exactly one parameter and its name is one of the known binding names, it receives that field, so a lone example parameter gets the full example. Any other name is bound to the input field. This provides a convenient shorthand for the most common task pattern.
Multi-parameter mode: When a function has multiple parameters, each parameter name is matched against the set of known binding names (input, expected, reference, metadata, example). A parameter whose name does not match is accepted only if it has a default value or is a **kwargs catch-all. Any other unrecognized name causes the task to be rejected.
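The two binding modes can be sketched with the standard library's inspect module. This is a minimal, hypothetical re-implementation for illustration, not Phoenix's source. The names validate_task_signature and bind_task_arguments are invented here, the dataset example is modeled as a plain dict with input, output, and metadata keys, and rejecting zero-parameter tasks is an assumption.

```python
import inspect
from typing import Any, Callable, Mapping

# Binding names from this page; "reference" aliases "expected".
KNOWN_NAMES = {"input", "expected", "reference", "metadata", "example"}


def validate_task_signature(task: Callable[..., Any]) -> inspect.Signature:
    """Reject signatures the binding rules cannot satisfy (hypothetical helper)."""
    sig = inspect.signature(task)
    params = sig.parameters
    if not params:
        raise ValueError("Task must accept at least one parameter.")  # assumption
    if len(params) > 1:
        for name, param in params.items():
            if name in KNOWN_NAMES:
                continue
            # Unknown names are tolerated only with a default or as **kwargs.
            if (
                param.kind is inspect.Parameter.VAR_KEYWORD
                or param.default is not inspect.Parameter.empty
            ):
                continue
            raise ValueError(f"Unsupported task parameter: {name!r}")
    return sig


def bind_task_arguments(
    task: Callable[..., Any], example: Mapping[str, Any]
) -> inspect.BoundArguments:
    """Map example fields onto the task's parameters (hypothetical helper)."""
    sig = validate_task_signature(task)
    fields = {
        "input": example["input"],
        "expected": example["output"],
        "reference": example["output"],  # alias for "expected"
        "metadata": example["metadata"],
        "example": example,
    }
    params = sig.parameters
    if len(params) == 1:
        # Single-parameter mode: use the named field if known, else "input".
        (name,) = params
        return sig.bind(fields.get(name, fields["input"]))
    # Multi-parameter mode: fill only the parameters whose names are known.
    return sig.bind_partial(**{n: fields[n] for n in KNOWN_NAMES & set(params)})


example = {
    "input": {"question": "What is 2 + 2?"},
    "output": {"answer": "4"},
    "metadata": {"category": "math"},
}


def qa_task(input, reference):
    return {"question": input["question"], "gold": reference["answer"]}


bound = bind_task_arguments(qa_task, example)
print(qa_task(*bound.args, **bound.kwargs))
# {'question': 'What is 2 + 2?', 'gold': '4'}
```

Calling the task as task(*bound.args, **bound.kwargs) is the standard way to apply an inspect.BoundArguments object to its function.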

The framework deep-copies all bound values before passing them to the task function, ensuring isolation between task executions. The task output must be JSON-serializable (dict, list, str, int, float, bool, or None) to be stored in the Phoenix database.
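The isolation guarantee matters for tasks that mutate their arguments. The following minimal sketch uses copy.deepcopy to show why a fresh copy per call keeps runs independent. run_isolated is a hypothetical helper, not a Phoenix function.

```python
import copy
from typing import Any, Callable, Mapping


def run_isolated(task: Callable[[Any], Any], example_input: Mapping[str, Any]) -> Any:
    """Hypothetical helper: give the task a private deep copy of its input."""
    return task(copy.deepcopy(example_input))


shared_input = {"question": "What is Phoenix?", "history": []}


def mutating_task(input):
    input["history"].append("seen")  # mutates the copy, not shared_input
    return len(input["history"])


print(run_isolated(mutating_task, shared_input))  # 1
print(run_isolated(mutating_task, shared_input))  # 1 again: each run starts fresh
print(shared_input["history"])                    # []
```

Without the copy, the second call would return 2 and the shared example would be left modified.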

Usage

Use this pattern when defining the "system under test" for a Phoenix experiment. The task function should encapsulate the complete processing pipeline from input to output, whether that involves calling an LLM API, running a retrieval pipeline, or executing deterministic logic.
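Because a task is an ordinary callable, you can check it on a hand-written input before passing it to run_experiment. The router below is a hypothetical deterministic pipeline, and route_ticket and ROUTES are invented for illustration. Its single parameter is named input, so under the binding rules it would receive the example's input field.

```python
from typing import Any, Mapping

# Hypothetical keyword-to-team routing table for a support-ticket router.
ROUTES = {"refund": "billing", "password": "account", "crash": "engineering"}


def route_ticket(input: Mapping[str, Any]) -> dict:
    """Deterministic 'system under test': route a ticket by keyword."""
    text = str(input.get("ticket", "")).lower()
    for keyword, team in ROUTES.items():
        if keyword in text:
            return {"team": team, "matched": keyword}
    return {"team": "general", "matched": None}  # no keyword found


# Smoke-test the task locally, with no Phoenix server involved.
print(route_ticket({"ticket": "App crash on login"}))
# {'team': 'engineering', 'matched': 'crash'}
```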

Code Reference

Source Location

Repository: Phoenix
File: packages/phoenix-client/src/phoenix/client/experiments/__init__.py (lines 17-204)
Types: packages/phoenix-client/src/phoenix/client/resources/experiments/types.py (lines 154-160)

Type Definition

ExperimentTask = Union[
    Callable[[v1.DatasetExample], TaskOutput],
    Callable[[v1.DatasetExample], Awaitable[TaskOutput]],
    Callable[..., JSONSerializable],
    Callable[..., Awaitable[JSONSerializable]],
]

Parameter Binding Names

# Available binding names for task parameters
valid_named_params = {"input", "expected", "reference", "metadata", "example"}

# Binding rules:
# - Single-arg function: bound to the field matching its name if that name is a binding name, otherwise to "input"
# - Multi-arg function: each parameter name matched to corresponding field
# - "reference" is an alias for "expected"
# - "example" provides the full DatasetExample object (wrapped as ExampleProxy)

Import

from phoenix.client.experiments import run_experiment
from phoenix.client import Client

# Tasks are user-defined functions; no special import is needed.
# process() is a placeholder for your own pipeline logic.
def my_task(input):
    return process(input)

I/O Contract

Inputs

Name	Type	Required	Description
input	Mapping[str, Any]	Yes (auto-bound)	The input field of the dataset example. A dictionary of key-value pairs representing the test input.
expected	Mapping[str, Any]	No (auto-bound)	The expected or reference output from the dataset example. Contains the ground truth for comparison.
reference	Mapping[str, Any]	No (auto-bound)	Alias for expected. Provides the same value under a more intuitive name.
metadata	Mapping[str, Any]	No (auto-bound)	Metadata associated with the dataset example. Contains auxiliary context information.
example	ExampleProxy	No (auto-bound)	The complete dataset Example object with all fields accessible via both attribute and dictionary access.
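The combined attribute and dictionary access described for example can be sketched as a small read-only Mapping. ExampleView is a hypothetical class written for illustration, and Phoenix's ExampleProxy may be implemented differently.

```python
from collections.abc import Iterator, Mapping
from typing import Any


class ExampleView(Mapping):
    """Hypothetical read-only wrapper supporting ex.input and ex["input"]."""

    def __init__(self, data: Mapping[str, Any]) -> None:
        self._data = dict(data)

    # Mapping protocol: enables ex["input"], len(ex), iteration, ex.get(...)
    def __getitem__(self, key: str) -> Any:
        return self._data[key]

    def __iter__(self) -> Iterator[str]:
        return iter(self._data)

    def __len__(self) -> int:
        return len(self._data)

    def __getattr__(self, name: str) -> Any:
        # Called only when normal lookup fails. Private names are never
        # forwarded, which avoids infinite recursion before _data exists.
        if name.startswith("_"):
            raise AttributeError(name)
        try:
            return self._data[name]
        except KeyError:
            raise AttributeError(name) from None


ex = ExampleView({
    "id": "ex-1",
    "input": {"question": "What is 2 + 2?"},
    "output": {"answer": "4"},
    "metadata": {},
})
print(ex.id, ex["id"])            # ex-1 ex-1
print(ex.output == ex["output"])  # True
```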

Outputs

Name	Type	Description
output	JSONSerializable	The task return value. Must be JSON-serializable: dict, list, str, int, float, bool, or None. Stored as the experiment run output.
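Pipelines often produce richer objects than the types listed above. In this minimal sketch, a hypothetical Answer dataclass stands in for the pipeline's result. The task converts it with dataclasses.asdict before returning, and json.dumps serves as a quick serializability check.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class Answer:
    """Hypothetical structured result produced by a pipeline."""
    text: str
    confidence: float
    sources: list[str]


def answer_task(input):
    result = Answer(text=f"Echo: {input['question']}", confidence=0.5, sources=["doc-1"])
    # A dataclass instance is not JSON-serializable; asdict() turns it
    # into a plain dict of JSON-friendly values.
    return asdict(result)


output = answer_task({"question": "hi"})
json.dumps(output)  # would raise TypeError if output were not serializable
print(output)  # {'text': 'Echo: hi', 'confidence': 0.5, 'sources': ['doc-1']}
```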

Usage Examples

Simple Single-Parameter Task

from phoenix.client import Client
from phoenix.client.experiments import run_experiment

client = Client()
dataset = client.datasets.get_dataset(dataset="qa-benchmark")

# Single parameter is automatically bound to "input"
def my_task(input):
    question = input["question"]
    return f"The answer to '{question}' is 42."

experiment = run_experiment(
    dataset=dataset,
    task=my_task,
    experiment_name="simple-task",
)

Task with Multiple Parameters

# Multi-parameter task: each name matched to corresponding field
def context_aware_task(input, metadata, expected):
    question = input["question"]
    context = metadata.get("context", "")
    reference = expected.get("answer", "")
    return {
        "answer": f"Based on context: {context}, answering: {question}",
        "reference_length": len(reference),
    }

experiment = run_experiment(
    dataset=dataset,
    task=context_aware_task,
    experiment_name="context-task",
)

Task Using Full Example Object

# Access the full example object for maximum flexibility.
# generate_answer() is a placeholder for your own model or pipeline call.
def full_example_task(example):
    print(f"Processing example {example.id}")
    question = example.input["question"]
    expected_answer = example.output.get("answer", "")
    category = example.metadata.get("category", "general")
    answer = generate_answer(question, category)
    return {
        "answer": answer,
        "example_id": example.id,
        "matches_expected": answer == expected_answer,
    }

experiment = run_experiment(
    dataset=dataset,
    task=full_example_task,
    experiment_name="full-example-task",
)

Async Task for LLM Calls

import openai
from phoenix.client.experiments import async_run_experiment

async_client = openai.AsyncOpenAI()

# Async tasks enable concurrent execution across examples
async def llm_task(input):
    response = await async_client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "user", "content": input["question"]},
        ],
    )
    return response.choices[0].message.content

# Use async_run_experiment for async tasks.
# Top-level await works in Jupyter/IPython; in a plain script, put this
# call inside an async function and run it with asyncio.run().
experiment = await async_run_experiment(
    dataset=dataset,
    task=llm_task,
    experiment_name="llm-task",
    concurrency=5,
)
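The concurrency argument limits how many task calls run at the same time. Phoenix's internal scheduler is not reproduced here. The following hypothetical stand-in uses asyncio.Semaphore to illustrate the same idea with a fake task, so it runs without network access or an OpenAI key. fake_llm_task and run_all are invented names.

```python
import asyncio

in_flight = 0  # calls currently running
peak = 0       # highest value in_flight reached


async def fake_llm_task(input):
    """Stand-in for a network-bound LLM call (hypothetical)."""
    global in_flight, peak
    in_flight += 1
    peak = max(peak, in_flight)
    await asyncio.sleep(0.01)  # simulated request latency
    in_flight -= 1
    return f"answer to {input['question']}"


async def run_all(task, inputs, concurrency=5):
    """Run task over inputs with at most `concurrency` calls in flight."""
    semaphore = asyncio.Semaphore(concurrency)

    async def run_one(item):
        async with semaphore:
            return await task(item)

    # gather returns results in the same order as the inputs.
    return await asyncio.gather(*(run_one(item) for item in inputs))


inputs = [{"question": f"q{i}"} for i in range(12)]
outputs = asyncio.run(run_all(fake_llm_task, inputs))
print(outputs[:2])  # ['answer to q0', 'answer to q1']
print(peak)         # never exceeds 5
```

Raising concurrency increases parallelism, but it also raises the request rate sent to the model provider.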

Task with Reference Alias

# "reference" is an alias for "expected"; generate_answer() is a placeholder for your own logic
def comparison_task(input, reference):
    question = input["question"]
    ref_answer = reference.get("answer", "")
    generated = generate_answer(question)
    return {
        "generated": generated,
        "matches_reference": generated == ref_answer,
    }

experiment = run_experiment(
    dataset=dataset,
    task=comparison_task,
    experiment_name="comparison-task",
)

Related Pages

Implements Principle

Principle:Arize_ai_Phoenix_Experiment_Task_Definition
