Implementation:Arize ai Phoenix Legacy Audio Templates

Overview

The Legacy Audio Templates module defines EMOTION_PROMPT_TEMPLATE, a multimodal ClassificationTemplate for classifying the primary emotion expressed in an audio file. This is the first multimodal evaluation template in the Phoenix Evals subsystem, and it is designed for LLMs that accept audio input.

The template instructs the LLM to analyze audio characteristics including tone, pitch, pace, volume, and intensity to determine the dominant emotion. It supports a 10-class emotion taxonomy: anger, happiness, excitement, sadness, neutral, frustration, fear, surprise, disgust, and other.

The template is structured as a three-part multimodal prompt: a text preamble with classification instructions, an audio content part containing the {audio} variable (base64-encoded audio data), and a text epilogue with response format instructions. Both the base and the explanation variants follow this three-part structure.
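The three-part layout can be illustrated with a small self-contained sketch. It does not use the Phoenix classes: `PartType`, `Part`, and `render` are hypothetical stand-ins for `PromptPartContentType`, `PromptPartTemplate`, and the template's formatting step, and the prompt wording is a placeholder rather than the text of the real template.

```python
from dataclasses import dataclass
from enum import Enum


class PartType(Enum):
    # Hypothetical stand-in for PromptPartContentType.
    TEXT = "text"
    AUDIO = "audio"


@dataclass(frozen=True)
class Part:
    # Hypothetical stand-in for PromptPartTemplate.
    content_type: PartType
    template: str


def render(parts, variables):
    """Fill each part's placeholders and return (content_type, text) pairs."""
    return [(p.content_type, p.template.format(**variables)) for p in parts]


# Placeholder wording; the real template text is longer.
EMOTION_PARTS = [
    Part(PartType.TEXT, "Classify the primary emotion in the audio below."),
    Part(PartType.AUDIO, "{audio}"),
    Part(PartType.TEXT, "Respond with a single label."),
]

# "UklGRg==" is a dummy base64 payload used only for illustration.
rendered = render(EMOTION_PARTS, {"audio": "UklGRg=="})
```

Keeping the audio in its own part lets a model adapter send it as an audio payload rather than inlining the base64 string into the text prompt.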

Code Reference

Attribute	Details
Source File	`packages/phoenix-evals/src/phoenix/evals/legacy/default_audio_templates.py`
Repository	Arize-ai/phoenix
Lines	133
Module	`phoenix.evals.legacy.default_audio_templates`
Key Symbols	`EMOTION_PROMPT_TEMPLATE`, `EMOTION_AUDIO_RAILS`
Dependencies	`phoenix.evals.legacy.templates.ClassificationTemplate`, `phoenix.evals.legacy.templates.PromptPartContentType`, `phoenix.evals.legacy.templates.PromptPartTemplate`

I/O Contract

EMOTION_PROMPT_TEMPLATE

Attribute	Details
Type	`ClassificationTemplate`
Template Variables	`{audio}` - base64-encoded audio data
Rails	`["anger", "happiness", "excitement", "sadness", "neutral", "frustration", "fear", "surprise", "disgust", "other"]`
Prompt Structure	Three-part multimodal: TEXT (instructions) + AUDIO (data) + TEXT (response format)

Multimodal Template Parts (Base)

Part	Content Type	Description
Part 1	`PromptPartContentType.TEXT`	Task description, emotion taxonomy, and analysis criteria (tone, pitch, pace, volume, intensity).
Part 2	`PromptPartContentType.AUDIO`	The `{audio}` variable placeholder for base64-encoded audio content.
Part 3	`PromptPartContentType.TEXT`	Response format instructions and example response.

Multimodal Template Parts (Explanation)

Part	Content Type	Description
Part 1	`PromptPartContentType.TEXT`	Extended task description requesting step-by-step explanation of audio characteristics.
Part 2	`PromptPartContentType.AUDIO`	The `{audio}` variable placeholder (same as base template).
Part 3	`PromptPartContentType.TEXT`	Explanation response format with EXPLANATION/LABEL structure.
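As a rough illustration of the EXPLANATION/LABEL structure, the sketch below splits a response into its two fields. It assumes the model replies with an `EXPLANATION:` section followed by a `LABEL:` line; the exact wording of the real response format is not shown on this page, and `parse_explanation_response` is a hypothetical helper, not a Phoenix function.

```python
import re


def parse_explanation_response(text):
    """Split a response into (explanation, label); label is None if absent."""
    # Assumed format: "EXPLANATION: ..." followed by a "LABEL: ..." line.
    match = re.search(r"LABEL\s*:\s*(.*)", text, flags=re.IGNORECASE)
    if match is None:
        return text.strip(), None
    # Keep only the rest of the LABEL line, without quotes or case differences.
    label = match.group(1).strip().strip("\"'").lower()
    explanation = text[: match.start()]
    explanation = re.sub(
        r"^\s*EXPLANATION\s*:\s*", "", explanation, flags=re.IGNORECASE
    )
    return explanation.strip(), label
```

The sketch takes the first `LABEL:` occurrence, so an explanation that itself contains that marker would be split early; a stricter parser might search from the end of the response instead.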

EMOTION_AUDIO_RAILS

The 10-class emotion taxonomy:

Index	Emotion Label	Description
0	`anger`	Hostile or aggressive emotional expression
1	`happiness`	Cheerful or joyful emotional expression
2	`excitement`	Enthusiastic or highly energized expression
3	`sadness`	Sorrowful or melancholic expression
4	`neutral`	Absence of strong emotional expression (used only when no other emotion is clear)
5	`frustration`	Annoyed or exasperated expression
6	`fear`	Anxious or frightened expression
7	`surprise`	Startled or astonished expression
8	`disgust`	Revulsion or strong disapproval
9	`other`	Emotion not captured by the above categories
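Rails constrain the model's free-text answer to one of the ten labels. The following is a minimal sketch of how a raw response might be mapped onto the rails; it is not Phoenix's own matching logic, and `snap_to_rail` and its `fallback` parameter are hypothetical names chosen for this example.

```python
import re

# Copied from the Rails row of the I/O contract above.
RAILS = [
    "anger", "happiness", "excitement", "sadness", "neutral",
    "frustration", "fear", "surprise", "disgust", "other",
]


def snap_to_rail(raw, rails, fallback=None):
    """Map a raw model response to exactly one rail, or return fallback."""
    cleaned = raw.strip().strip("\"'.").lower()
    if cleaned in rails:
        return cleaned
    # Otherwise accept the response only if it mentions a single rail.
    hits = [r for r in rails if re.search(rf"\b{re.escape(r)}\b", cleaned)]
    return hits[0] if len(hits) == 1 else fallback
```

Returning the fallback for ambiguous responses (for example, one that names two emotions) avoids silently picking whichever label happens to appear first.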

Audio Analysis Criteria

The template instructs the LLM to evaluate five acoustic characteristics:

Criterion	Description	Example Values
Tone	General tone of the speaker	cheerful, tense, calm
Pitch	Level and variability of the pitch	high, low, monotone
Pace	Speed of speech	fast, slow, steady
Volume	Loudness of the speech	loud, soft, moderate
Intensity	Emotional strength or expression	subdued, sharp, exaggerated

Usage Examples

```python
import base64
import pandas as pd
from phoenix.evals.legacy.default_audio_templates import EMOTION_PROMPT_TEMPLATE
from phoenix.evals.legacy.classify import llm_classify
from phoenix.evals.legacy.models import OpenAIModel

model = OpenAIModel(model="gpt-4o-audio-preview")

# Load audio file and encode to base64
with open("customer_call.wav", "rb") as f:
    audio_b64 = base64.b64encode(f.read()).decode("utf-8")

# The column name must match the template variable {audio}
df = pd.DataFrame({"audio": [audio_b64]})

result = llm_classify(
    data=df,
    model=model,
    template=EMOTION_PROMPT_TEMPLATE,
    rails=[
        "anger", "happiness", "excitement", "sadness", "neutral",
        "frustration", "fear", "surprise", "disgust", "other",
    ],
    provide_explanation=True,
)
# result is a DataFrame with one row per input row
# result["label"].iloc[0] might be "frustration"
# result["explanation"].iloc[0] describes the tone, pitch, pace, volume, and intensity analysis
```
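Before sending audio to a model, it can help to check that the encoded payload is what it claims to be. The sketch below wraps the encoding step from the example above and adds a WAV header check based on the RIFF container layout (bytes 0 to 3 are `RIFF`, bytes 8 to 11 are `WAVE`). Both `encode_audio` and `looks_like_wav` are hypothetical helpers written for this page; they are not part of Phoenix and are not the `get_audio_format_from_base64()` utility.

```python
import base64
import binascii
from pathlib import Path


def encode_audio(path):
    """Read an audio file and return its contents as a base64 string."""
    return base64.b64encode(Path(path).read_bytes()).decode("utf-8")


def looks_like_wav(audio_b64):
    """Return True if the decoded payload starts with a RIFF/WAVE header."""
    if len(audio_b64) < 16:
        return False
    try:
        # 16 base64 characters decode to exactly 12 bytes: the RIFF header.
        header = base64.b64decode(audio_b64[:16])
    except (binascii.Error, ValueError):
        return False
    return header[:4] == b"RIFF" and header[8:12] == b"WAVE"
```

A check like this only inspects the container header; it does not validate the sample data or confirm that a given model accepts the format.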

Related Pages

Arize_ai_Phoenix_Legacy_Templates - ClassificationTemplate, PromptPartTemplate, and PromptPartContentType base classes
Arize_ai_Phoenix_Legacy_Classify - llm_classify() function that processes multimodal templates
Arize_ai_Phoenix_Legacy_Default_Templates - Text-only classification templates for other evaluation tasks
Arize_ai_Phoenix_Legacy_Utils - get_audio_format_from_base64() for audio format detection
