Implementation:Arize ai Phoenix Legacy Utils

Overview

The Legacy Utils module provides utility functions for the Phoenix Evals subsystem, covering dataset handling, output parsing, media format detection, and progress tracking. These utilities are consumed throughout the classification and generation pipelines.

The module includes download_benchmark_dataset() for fetching Arize evaluation benchmark datasets from cloud storage; snap_to_rail() for normalizing raw LLM output strings to a single valid classification label; parse_openai_function_call() and openai_function_call_kwargs() for parsing OpenAI function-call output and building the matching request arguments; get_audio_format_from_base64() and get_image_format_from_base64() for detecting media formats from base64-encoded data via file signature inspection; and get_tqdm_progress_bar_formatter() for consistent progress bar formatting.

The module also defines the NOT_PARSABLE sentinel string, used when LLM output cannot be mapped to exactly one valid rail, and the SUPPORTED_AUDIO_FORMATS and SUPPORTED_IMAGE_FORMATS constants.

Code Reference

Attribute	Details
Source File	`packages/phoenix-evals/src/phoenix/evals/legacy/utils.py`
Repository	Arize-ai/phoenix
Lines	330
Module	`phoenix.evals.legacy.utils`
Key Symbols	`download_benchmark_dataset()`, `snap_to_rail()`, `parse_openai_function_call()`, `openai_function_call_kwargs()`, `get_audio_format_from_base64()`, `get_image_format_from_base64()`, `get_tqdm_progress_bar_formatter()`, `printif()`, `emoji_guard()`, `NOT_PARSABLE`
Dependencies	`pandas`, `tqdm`, `base64`, `json`

I/O Contract

download_benchmark_dataset()

Parameter	Type	Description
`task`	`str`	Evaluation task name (used in the storage URL path).
`dataset_name`	`str`	Name of the benchmark dataset.
Returns	`pd.DataFrame`	DataFrame loaded from a JSONL file within a ZIP archive fetched from Google Cloud Storage.
Raises	`ValueError`	If the dataset does not exist at the expected URL.
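
The return value described above comes from a JSONL file stored inside a ZIP archive. The following is a minimal sketch of that unpacking step only, using an archive built in memory so it runs without network access. The helper names `read_jsonl_from_zip` and `make_demo_archive`, the member name `demo.jsonl`, and the sample rows are all hypothetical; the real function fetches the archive from cloud storage and returns a `pd.DataFrame`, whereas this sketch stops at a list of dicts.

```python
import io
import json
import zipfile
from typing import Any, Dict, List


def read_jsonl_from_zip(zip_bytes: bytes, member: str) -> List[Dict[str, Any]]:
    """Parse one JSONL member of an in-memory ZIP archive into a list of rows."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        with archive.open(member) as handle:
            text = handle.read().decode("utf-8")
    # One JSON object per non-empty line.
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def make_demo_archive() -> bytes:
    """Build a tiny archive in memory (hypothetical data, illustration only)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr(
            "demo.jsonl",
            '{"query": "q1", "relevant": true}\n{"query": "q2", "relevant": false}\n',
        )
    return buffer.getvalue()


rows = read_jsonl_from_zip(make_demo_archive(), "demo.jsonl")
```

Passing `rows` to `pandas.DataFrame(rows)` would give the tabular form that the documented function returns.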

snap_to_rail()

Parameter	Type	Description
`raw_string`	`Optional[str]`	The raw LLM output to be matched against rails.
`rails`	`List[str]`	Valid output labels to snap to.
`verbose`	`bool`	If True, prints debug information about snapping.
Returns	`str`	The matched rail string if exactly one rail is found in the input; otherwise `"NOT_PARSABLE"`.

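The function performs case-insensitive matching and checks that exactly one rail is found in the input string. Rails are sorted by length (longest first) to avoid substring conflicts: with rails "relevant" and "irrelevant", for example, the longer rail must be tested first, because every occurrence of "irrelevant" also contains "relevant".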
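
The matching rule can be illustrated with a short sketch. This is not the library's implementation; `snap_to_rail_sketch` is a hypothetical name, and two details are assumptions made for the illustration: each matched rail is removed from the working string before shorter rails are tested, and the rail is returned in lowercase.

```python
from typing import List, Optional

NOT_PARSABLE = "NOT_PARSABLE"


def snap_to_rail_sketch(raw_string: Optional[str], rails: List[str]) -> str:
    """Return the single rail found in raw_string, else NOT_PARSABLE."""
    if not raw_string:
        return NOT_PARSABLE
    remaining = raw_string.lower()
    found = []
    # Longest rails first, so "irrelevant" is consumed before "relevant" is tested.
    for rail in sorted({r.lower() for r in rails}, key=len, reverse=True):
        if rail in remaining:
            found.append(rail)
            # Assumption: strip the match so it cannot also match a shorter rail.
            remaining = remaining.replace(rail, "")
    return found[0] if len(found) == 1 else NOT_PARSABLE


single = snap_to_rail_sketch("This is Irrelevant.", ["relevant", "irrelevant"])
ambiguous = snap_to_rail_sketch("relevant and irrelevant", ["relevant", "irrelevant"])
```

Here `single` resolves to one rail, while `ambiguous` contains both rails and therefore falls back to the sentinel.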

parse_openai_function_call()

Parameter	Type	Description
`raw_output`	`str`	Raw JSON output from an OpenAI function call.
Returns	`Tuple[str, Optional[str]]`	Tuple of (unrailed_label, optional_explanation). Falls back to (raw_output, None) on JSON parse failure.
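
The fallback behavior in the table can be sketched as follows. The function name `parse_function_call_sketch` is hypothetical, and the JSON keys `response` and `explanation` are taken from the usage example on this page rather than from the source file, so treat them as assumptions.

```python
import json
from typing import Optional, Tuple


def parse_function_call_sketch(raw_output: str) -> Tuple[str, Optional[str]]:
    """Extract (label, explanation) from a JSON string, or fall back to the raw text."""
    try:
        payload = json.loads(raw_output)
    except json.JSONDecodeError:
        # Not JSON at all: hand the raw text back as the unrailed label.
        return raw_output, None
    if not isinstance(payload, dict):
        # Assumption: valid JSON that is not an object is treated like a parse failure.
        return raw_output, None
    return payload.get("response", ""), payload.get("explanation")


parsed = parse_function_call_sketch('{"response": "factual", "explanation": "Matches."}')
fallback = parse_function_call_sketch("factual, I think")
```

The label returned here is still unrailed, so a caller would normally pass it through snap_to_rail() afterwards.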

openai_function_call_kwargs()

Parameter	Type	Description
`rails`	`List[str]`	Valid classification labels.
`provide_explanation`	`bool`	Whether to include an explanation field in the function schema.
Returns	`Dict[str, Any]`	Dictionary with `functions` and `function_call` keys for OpenAI API invocation.

The generated function schema is named "record_response" and constrains the response field to the provided rails via an enum.
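
A rough sketch of what such a payload might look like is shown below. Only the `functions` and `function_call` keys, the `record_response` name, and the enum constraint come from the description above; the property names `response` and `explanation` are inferred from the usage example on this page, and the description strings and `required` lists are assumptions. `function_call_kwargs_sketch` is a hypothetical name.

```python
from typing import Any, Dict, List


def function_call_kwargs_sketch(rails: List[str], provide_explanation: bool) -> Dict[str, Any]:
    """Build function-calling kwargs whose response field is limited to the rails."""
    properties: Dict[str, Any] = {
        # The enum restricts the model to one of the valid labels.
        "response": {"type": "string", "enum": list(rails)},
    }
    required = ["response"]
    if provide_explanation:
        properties["explanation"] = {
            "type": "string",
            "description": "Reasoning behind the chosen label.",  # assumed wording
        }
        required = ["explanation", "response"]
    schema = {
        "name": "record_response",
        "description": "Record the classification result.",  # assumed wording
        "parameters": {"type": "object", "properties": properties, "required": required},
    }
    return {"functions": [schema], "function_call": {"name": "record_response"}}


sketch_kwargs = function_call_kwargs_sketch(["factual", "hallucinated"], provide_explanation=True)
```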

get_audio_format_from_base64()

Parameter	Type	Description
`enc_str`	`str`	Base64-encoded audio data.
Returns	`Literal["mp3", "wav", "ogg", "flac", "m4a", "aac"]`	Detected audio format based on file signature (magic bytes).
Raises	`ValueError`	If the format cannot be determined or data is too short.
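
To make the signature check concrete, here is a minimal sketch that recognizes only WAV and MP3. `sniff_audio_format` is a hypothetical name, the 12-byte minimum is an assumption of this sketch, and the real function may use different checks and covers more formats.

```python
import base64


def sniff_audio_format(enc_str: str) -> str:
    """Guess 'wav' or 'mp3' from the leading bytes of base64-encoded audio."""
    header = base64.b64decode(enc_str)[:12]
    if len(header) < 12:
        raise ValueError("audio data is too short to identify")
    # WAV: RIFF container whose form type (bytes 8-11) is WAVE.
    if header[:4] == b"RIFF" and header[8:12] == b"WAVE":
        return "wav"
    # MP3: an ID3v2 tag, or a frame header starting with 11 set sync bits.
    # Caveat: ADTS AAC streams also start with these sync bits, which this sketch ignores.
    if header[:3] == b"ID3" or (header[0] == 0xFF and header[1] & 0xE0 == 0xE0):
        return "mp3"
    raise ValueError("unrecognized audio signature")


wav_b64 = base64.b64encode(b"RIFF\x24\x00\x00\x00WAVEfmt ").decode()
detected_audio = sniff_audio_format(wav_b64)
```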

get_image_format_from_base64()

Parameter	Type	Description
`enc_str`	`str`	Base64-encoded image data.
Returns	`Literal["png", "jpeg", "jpg", "webp", "heic", "heif", "bmp", "gif", "tiff", "ico"]`	Detected image format based on file signature (magic bytes).
Raises	`ValueError`	If the format cannot be determined or data is too short.
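
The same idea applies to images. The sketch below checks a handful of well-known signatures (PNG, JPEG, GIF, BMP, WebP) and is not the library's implementation; `sniff_image_format`, the signature table, and the 12-byte minimum are assumptions made for illustration.

```python
import base64

# Well-known leading byte sequences for a few image formats.
_IMAGE_SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"\xff\xd8\xff", "jpeg"),
    (b"GIF87a", "gif"),
    (b"GIF89a", "gif"),
    (b"BM", "bmp"),
]


def sniff_image_format(enc_str: str) -> str:
    """Guess an image format from the leading bytes of base64-encoded data."""
    data = base64.b64decode(enc_str)
    if len(data) < 12:
        raise ValueError("image data is too short to identify")
    for signature, name in _IMAGE_SIGNATURES:
        if data.startswith(signature):
            return name
    # WebP is a RIFF container whose form type (bytes 8-11) is WEBP.
    if data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return "webp"
    raise ValueError("unrecognized image signature")


png_b64 = base64.b64encode(b"\x89PNG\r\n\x1a\n" + b"\x00" * 8).decode()
detected_image = sniff_image_format(png_b64)
```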

Helper Functions

Function	Description
`get_tqdm_progress_bar_formatter(title)`	Returns a tqdm `bar_format` string with the given title prefix, elapsed/remaining time, and rate display.
`printif(condition, *args, **kwargs)`	Conditionally prints via `tqdm.write()` only when `condition` is True.
`emoji_guard(emoji, fallback)`	Returns the emoji string on non-Windows systems, or the fallback on Windows (to avoid encoding issues).
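
Two of these helpers are small enough to sketch with the standard library alone. The names ending in `_sketch` are hypothetical, the platform check via `platform.system()` is an assumption about how Windows is detected, and the exact layout of the format string is illustrative; only the `{...}` placeholders are standard tqdm `bar_format` fields.

```python
import platform


def emoji_guard_sketch(emoji: str, fallback: str = "") -> str:
    """Return the emoji unless running on Windows, where the fallback is used."""
    return fallback if platform.system() == "Windows" else emoji


def progress_bar_format_sketch(title: str) -> str:
    """Build a tqdm bar_format string prefixed with a title (assumed layout)."""
    return (
        title
        + " |{bar}| {n_fmt}/{total_fmt} ({percentage:3.1f}%) "
        + "| {elapsed}<{remaining} | {rate_fmt}{postfix}"
    )


bar_format = progress_bar_format_sketch("llm_classify")
marker = emoji_guard_sketch("OK", fallback="[ok]")
```

A string like `bar_format` would be passed to tqdm as `tqdm(iterable, bar_format=bar_format)`.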

Constants

Constant	Value	Description
`NOT_PARSABLE`	`"NOT_PARSABLE"`	Sentinel returned when LLM output cannot be snapped to exactly one rail (no rail matches, or more than one does).
`SUPPORTED_AUDIO_FORMATS`	`{"mp3", "wav"}`	Set of supported audio format identifiers.
`SUPPORTED_IMAGE_FORMATS`	`{"png", "jpeg", "jpg", "webp", "heic", "heif", "bmp", "gif", "tiff", "ico"}`	Set of supported image format identifiers.

Usage Examples

from phoenix.evals.legacy.utils import (
    download_benchmark_dataset,
    snap_to_rail,
    parse_openai_function_call,
    openai_function_call_kwargs,
    get_audio_format_from_base64,
)

# Download a benchmark dataset
df = download_benchmark_dataset(
    task="relevance",
    dataset_name="wiki_qa-train",
)

# Snap raw LLM output to a rail
label = snap_to_rail(
    "The answer is clearly relevant to the question.",
    rails=["relevant", "unrelated"],
)
# label = "relevant"

# Handle unparsable output
label = snap_to_rail("I'm not sure", rails=["relevant", "unrelated"])
# label = "NOT_PARSABLE"

# Parse OpenAI function call output
import json
raw = json.dumps({"response": "factual", "explanation": "The facts match."})
label, explanation = parse_openai_function_call(raw)
# label = "factual", explanation = "The facts match."

# Generate function calling kwargs
kwargs = openai_function_call_kwargs(
    rails=["factual", "hallucinated"],
    provide_explanation=True,
)
# kwargs = {"functions": [...], "function_call": {"name": "record_response"}}

Related Pages

Arize_ai_Phoenix_Legacy_Classify - Primary consumer of snap_to_rail() and function call utilities
Arize_ai_Phoenix_Legacy_Generate - Uses progress bar formatter
Arize_ai_Phoenix_Legacy_Evaluators - Uses snap_to_rail() and parse_openai_function_call()
Arize_ai_Phoenix_Legacy_Audio_Templates - Audio templates requiring get_audio_format_from_base64()
