Implementation:Arize ai Phoenix Legacy Utils

From Leeroopedia

LLM_Evaluation Utilities

Overview

The Legacy Utils module provides utility functions for the Phoenix Evals subsystem, covering dataset handling, output parsing, media format detection, and progress tracking. These utilities are consumed throughout the classification and generation pipelines.

The module includes download_benchmark_dataset() for fetching Arize evaluation benchmark datasets from cloud storage and snap_to_rail() for normalizing a raw LLM output string to the single valid classification label (rail) it contains. parse_openai_function_call() and openai_function_call_kwargs() handle the OpenAI function-calling format: the former parses the model's output and the latter builds the request arguments. get_audio_format_from_base64() and get_image_format_from_base64() detect media formats from base64-encoded data by inspecting file signatures, and get_tqdm_progress_bar_formatter() provides consistent progress bar formatting.

The module also defines the NOT_PARSABLE sentinel string, returned when LLM output cannot be mapped to exactly one valid rail, and the SUPPORTED_AUDIO_FORMATS and SUPPORTED_IMAGE_FORMATS constants.

Code Reference

Attribute Details
Source File packages/phoenix-evals/src/phoenix/evals/legacy/utils.py
Repository Arize-ai/phoenix
Lines 330
Module phoenix.evals.legacy.utils
Key Symbols download_benchmark_dataset(), snap_to_rail(), parse_openai_function_call(), openai_function_call_kwargs(), get_audio_format_from_base64(), get_image_format_from_base64(), get_tqdm_progress_bar_formatter(), printif(), emoji_guard(), NOT_PARSABLE
Dependencies pandas, tqdm, base64, json

I/O Contract

download_benchmark_dataset()

Parameter Type Description
task str Evaluation task name (used in the storage URL path).
dataset_name str Name of the benchmark dataset.
Returns pd.DataFrame DataFrame loaded from a JSONL file within a ZIP archive fetched from Google Cloud Storage.
Raises ValueError If the dataset does not exist at the expected URL.
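The "JSONL file within a ZIP archive" step can be illustrated without any network access. The following is a minimal sketch, not the module's code: read_jsonl_from_zip() and make_demo_zip() are hypothetical helpers, the archive is built in memory, and the result is a plain list of dicts rather than a pd.DataFrame (pandas could build a frame from such records).

```python
import io
import json
import zipfile
from typing import Any, Dict, List


def read_jsonl_from_zip(zip_bytes: bytes, member_name: str) -> List[Dict[str, Any]]:
    """Parse one JSONL member of an in-memory ZIP archive into a list of records."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        # Assumption: a missing member is reported as ValueError, mirroring
        # the error type listed in the table above.
        if member_name not in archive.namelist():
            raise ValueError(f"{member_name!r} not found in archive")
        text = archive.read(member_name).decode("utf-8")
    # One JSON object per non-empty line.
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def make_demo_zip() -> bytes:
    """Build a tiny ZIP archive in memory so the sketch needs no download."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr(
            "demo.jsonl",
            '{"query": "q1", "relevant": true}\n{"query": "q2", "relevant": false}\n',
        )
    return buffer.getvalue()


records = read_jsonl_from_zip(make_demo_zip(), "demo.jsonl")
```

Each element of records is one parsed line of the JSONL member.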

snap_to_rail()

Parameter Type Description
raw_string Optional[str] The raw LLM output to be matched against rails.
rails List[str] Valid output labels to snap to.
verbose bool If True, prints debug information about snapping.
Returns str The matched rail string, or "NOT_PARSABLE" unless exactly one rail is found in the input.

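The function matches case-insensitively and succeeds only when exactly one rail occurs in the input string; zero matches or multiple matches both yield "NOT_PARSABLE". Rails are checked longest first so that a rail contained in a longer one (for example, "relevant" inside "irrelevant") does not produce a false match.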
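The matching rule described above can be sketched as follows. This is an illustration under stated assumptions, not the module's implementation: snap_to_rail_sketch() is a hypothetical name, matched text is removed from the string before shorter rails are tested, and the rail is returned in the casing the caller supplied.

```python
from typing import List, Optional

NOT_PARSABLE = "NOT_PARSABLE"


def snap_to_rail_sketch(raw_string: Optional[str], rails: List[str]) -> str:
    """Return the single rail found in raw_string, else NOT_PARSABLE."""
    if not raw_string:
        return NOT_PARSABLE
    remaining = raw_string.lower()
    found = []
    # Longest rails first, so "irrelevant" is consumed before "relevant" is tested.
    for rail in sorted(rails, key=len, reverse=True):
        needle = rail.lower()
        if needle in remaining:
            found.append(rail)
            # Assumption: strip the match so it cannot also satisfy a shorter rail.
            remaining = remaining.replace(needle, "")
    return found[0] if len(found) == 1 else NOT_PARSABLE


print(snap_to_rail_sketch("This is Irrelevant.", ["relevant", "irrelevant"]))
print(snap_to_rail_sketch("relevant or irrelevant", ["relevant", "irrelevant"]))
```

The first call prints irrelevant and the second prints NOT_PARSABLE, because two distinct rails remain after the longest-first pass.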

parse_openai_function_call()

Parameter Type Description
raw_output str Raw JSON output from an OpenAI function call.
Returns Tuple[str, Optional[str]] Tuple of (unrailed_label, optional_explanation). Falls back to (raw_output, None) on JSON parse failure.
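A minimal sketch of this parse-with-fallback contract is shown below. The "response" and "explanation" keys follow the usage example on this page; the function name parse_function_call_sketch() is hypothetical, and the handling of non-object JSON and of a missing "response" key is an assumption rather than documented behavior.

```python
import json
from typing import Optional, Tuple


def parse_function_call_sketch(raw_output: str) -> Tuple[str, Optional[str]]:
    """Extract (label, explanation) from a JSON string, falling back to the raw text."""
    try:
        payload = json.loads(raw_output)
    except json.JSONDecodeError:
        # Documented fallback: return the raw output with no explanation.
        return raw_output, None
    if not isinstance(payload, dict):
        # Assumption: valid JSON that is not an object is treated like a parse failure.
        return raw_output, None
    # Assumption: a missing "response" key yields an empty label.
    return payload.get("response", ""), payload.get("explanation")


print(parse_function_call_sketch('{"response": "factual", "explanation": "ok"}'))
print(parse_function_call_sketch("plain text, not JSON"))
```

The returned label is still unrailed, so callers would typically pass it through snap_to_rail() afterwards.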

openai_function_call_kwargs()

Parameter Type Description
rails List[str] Valid classification labels.
provide_explanation bool Whether to include an explanation field in the function schema.
Returns Dict[str, Any] Dictionary with functions and function_call keys for OpenAI API invocation.

The generated function schema is named "record_response" and constrains the response field to the provided rails via an enum.
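The shape of such a dictionary might look like the sketch below. Only the "record_response" name, the functions and function_call keys, and the enum constraint on the response field come from this page; the description strings, the required list, and the function name function_call_kwargs_sketch() are illustrative assumptions.

```python
from typing import Any, Dict, List


def function_call_kwargs_sketch(rails: List[str], provide_explanation: bool) -> Dict[str, Any]:
    """Build functions/function_call kwargs that restrict the label to the given rails."""
    properties: Dict[str, Any] = {
        # The enum limits the model's answer to the valid rails.
        "response": {"type": "string", "enum": list(rails)},
    }
    required = ["response"]
    if provide_explanation:
        # Assumption: the explanation is a free-form string field.
        properties["explanation"] = {
            "type": "string",
            "description": "Explanation of the reasoning for the response.",
        }
        required.append("explanation")
    schema = {
        "name": "record_response",
        "description": "A function to record the response.",  # assumed wording
        "parameters": {"type": "object", "properties": properties, "required": required},
    }
    return {"functions": [schema], "function_call": {"name": "record_response"}}


kwargs = function_call_kwargs_sketch(["factual", "hallucinated"], provide_explanation=True)
```

Setting function_call to a specific name asks the model to call that function rather than reply with free text, which is what makes the output parseable as JSON.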

get_audio_format_from_base64()

Parameter Type Description
enc_str str Base64-encoded audio data.
Returns Literal["mp3", "wav", "ogg", "flac", "m4a", "aac"] Detected audio format based on file signature (magic bytes).
Raises ValueError If the format cannot be determined or data is too short.

get_image_format_from_base64()

Parameter Type Description
enc_str str Base64-encoded image data.
Returns Literal["png", "jpeg", "jpg", "webp", "heic", "heif", "bmp", "gif", "tiff", "ico"] Detected image format based on file signature (magic bytes).
Raises ValueError If the format cannot be determined or data is too short.
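File-signature detection of this kind can be sketched in a few lines. The example below is a simplified illustration, not the module's code: sniff_format() is a hypothetical helper that mixes a handful of image and audio signatures in one function, and the 12-byte minimum length is an assumption chosen so that RIFF containers can be inspected.

```python
import base64

# Well-known leading byte signatures (a small subset, for illustration only).
_PREFIX_SIGNATURES = (
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"\xff\xd8\xff", "jpeg"),
    (b"GIF87a", "gif"),
    (b"GIF89a", "gif"),
    (b"BM", "bmp"),
    (b"fLaC", "flac"),
    (b"OggS", "ogg"),
)


def sniff_format(enc_str: str) -> str:
    """Guess a media format from the first bytes of base64-encoded data."""
    data = base64.b64decode(enc_str)
    if len(data) < 12:  # assumed minimum; enough to read a RIFF header
        raise ValueError("Data is too short to determine the format")
    for signature, fmt in _PREFIX_SIGNATURES:
        if data.startswith(signature):
            return fmt
    # RIFF containers carry the actual type in bytes 8..11.
    if data[:4] == b"RIFF":
        if data[8:12] == b"WAVE":
            return "wav"
        if data[8:12] == b"WEBP":
            return "webp"
    raise ValueError("Unrecognized file signature")


png_b64 = base64.b64encode(b"\x89PNG\r\n\x1a\n" + b"\x00" * 8).decode("ascii")
print(sniff_format(png_b64))
```

The call prints png. Only the leading bytes are inspected, so the result says nothing about whether the rest of the payload is a valid file.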

Helper Functions

Function Description
get_tqdm_progress_bar_formatter(title) Returns a tqdm bar_format string with the given title prefix, elapsed/remaining time, and rate display.
printif(condition, *args, **kwargs) Conditionally prints via tqdm.write() only when condition is True.
emoji_guard(emoji, fallback) Returns the emoji string on non-Windows systems, or the fallback on Windows (to avoid encoding issues).
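Two of these helpers are small enough to sketch directly. The versions below are hedged illustrations: the _sketch names, the os_name override parameter, the os.name == "nt" check for Windows, and the exact layout of the bar format are assumptions. The placeholders used ({bar}, {n_fmt}, {total_fmt}, {percentage}, {elapsed}, {remaining}, {rate_fmt}, {postfix}) are standard tqdm bar_format fields.

```python
import os
from typing import Optional


def emoji_guard_sketch(emoji: str, fallback: str = "", os_name: Optional[str] = None) -> str:
    """Return the emoji unless running on Windows, where the fallback is used."""
    # os_name is a hypothetical override that makes the check testable anywhere.
    current = os.name if os_name is None else os_name
    return fallback if current == "nt" else emoji


def progress_bar_format_sketch(title: str) -> str:
    """Build a tqdm bar_format string with a title, elapsed/remaining time, and rate."""
    # Plain concatenation keeps the braces literal for tqdm to fill in later.
    return (
        title
        + " |{bar}| {n_fmt}/{total_fmt} ({percentage:3.1f}%) "
        + "| {elapsed}<{remaining} | {rate_fmt}{postfix}"
    )


print(emoji_guard_sketch("✅", "[ok]", os_name="nt"))
print(progress_bar_format_sketch("llm_classify"))
```

The resulting string would be passed to tqdm through its bar_format argument.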

Constants

Constant Value Description
NOT_PARSABLE "NOT_PARSABLE" Sentinel returned when LLM output cannot be snapped to exactly one rail.
SUPPORTED_AUDIO_FORMATS {"mp3", "wav"} Set of supported audio format identifiers.
SUPPORTED_IMAGE_FORMATS {"png", "jpeg", "jpg", "webp", "heic", "heif", "bmp", "gif", "tiff", "ico"} Set of supported image format identifiers.

Usage Examples

from phoenix.evals.legacy.utils import (
    download_benchmark_dataset,
    snap_to_rail,
    parse_openai_function_call,
    openai_function_call_kwargs,
    get_audio_format_from_base64,
)

# Download a benchmark dataset
df = download_benchmark_dataset(
    task="relevance",
    dataset_name="wiki_qa-train",
)

# Snap raw LLM output to a rail
label = snap_to_rail(
    "The answer is clearly relevant to the question.",
    rails=["relevant", "unrelated"],
)
# label = "relevant"

# Handle unparsable output
label = snap_to_rail("I'm not sure", rails=["relevant", "unrelated"])
# label = "NOT_PARSABLE"

# Parse OpenAI function call output
import json
raw = json.dumps({"response": "factual", "explanation": "The facts match."})
label, explanation = parse_openai_function_call(raw)
# label = "factual", explanation = "The facts match."

# Generate function calling kwargs
kwargs = openai_function_call_kwargs(
    rails=["factual", "hallucinated"],
    provide_explanation=True,
)
# kwargs = {"functions": [...], "function_call": {"name": "record_response"}}
