Overview
The Legacy Audio Templates module defines EMOTION_PROMPT_TEMPLATE, a multimodal ClassificationTemplate that classifies the primary emotion expressed in an audio file. It is the first multimodal evaluation template in the Phoenix Evals subsystem. It requires an LLM that accepts audio input, such as the gpt-4o-audio-preview model used in the example below.
The template instructs the LLM to analyze audio characteristics including tone, pitch, pace, volume, and intensity to determine the dominant emotion. It supports a 10-class emotion taxonomy: anger, happiness, excitement, sadness, neutral, frustration, fear, surprise, disgust, and other.
The template is structured as a three-part multimodal prompt: a text preamble with classification instructions, an audio content part containing the {audio} variable (base64-encoded audio data), and a text epilogue with response format instructions. Both base and explanation variants follow this three-part structure.
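The three-part layout can be illustrated with a small, self-contained sketch. Some names here are hypothetical stand-ins written only to show the shape of the prompt: ContentType, PromptPart, EMOTION_PARTS, and render_parts. They are not the Phoenix PromptPartTemplate or PromptPartContentType API. The instruction wording is also abbreviated rather than copied from the real template.

```python
from dataclasses import dataclass
from enum import Enum


class ContentType(Enum):
    # Hypothetical stand-in for PromptPartContentType
    TEXT = "text"
    AUDIO = "audio"


@dataclass(frozen=True)
class PromptPart:
    # Hypothetical stand-in for PromptPartTemplate: a content type plus a template string
    content_type: ContentType
    template: str


# Illustrative, abbreviated wording; the real template text is longer
EMOTION_PARTS = [
    PromptPart(
        ContentType.TEXT,
        "Classify the primary emotion in the audio. "
        "Consider tone, pitch, pace, volume, and intensity.",
    ),
    PromptPart(ContentType.AUDIO, "{audio}"),  # filled with base64 audio per row
    PromptPart(ContentType.TEXT, "Respond with a single word from the allowed labels."),
]


def render_parts(parts, variables):
    """Fill each part's template with the row variables, keeping each part's content type."""
    return [(part.content_type, part.template.format(**variables)) for part in parts]


rendered = render_parts(EMOTION_PARTS, {"audio": "UklGRg=="})
for content_type, content in rendered:
    print(content_type.value, content[:40])
```

Keeping the audio in its own typed part, rather than pasting base64 into the text, lets a client send it to the model as an audio attachment instead of as prompt text.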
Code Reference
| Attribute | Details |
|---|---|
| Source File | packages/phoenix-evals/src/phoenix/evals/legacy/default_audio_templates.py |
| Repository | Arize-ai/phoenix |
| Line Count | 133 |
| Module | phoenix.evals.legacy.default_audio_templates |
| Key Symbols | EMOTION_PROMPT_TEMPLATE, EMOTION_AUDIO_RAILS |
| Dependencies | phoenix.evals.legacy.templates.ClassificationTemplate, phoenix.evals.legacy.templates.PromptPartContentType, phoenix.evals.legacy.templates.PromptPartTemplate |
I/O Contract
EMOTION_PROMPT_TEMPLATE
| Attribute | Details |
|---|---|
| Type | ClassificationTemplate |
| Template Variables | {audio}: base64-encoded audio data |
| Rails | ["anger", "happiness", "excitement", "sadness", "neutral", "frustration", "fear", "surprise", "disgust", "other"] |
| Prompt Structure | Three-part multimodal: TEXT (instructions) + AUDIO (data) + TEXT (response format) |
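Per the I/O contract, the only input is a base64 string in the {audio} variable. A pre-flight check can catch malformed rows before they are sent to the model. The sketch below uses only the standard library. The helpers decode_audio_b64 and looks_like_wav are hypothetical, and the WAV header check assumes your inputs are WAV files. The template itself does not require WAV.

```python
import base64
import binascii


def decode_audio_b64(value: str) -> bytes:
    """Decode a base64 string, raising ValueError if it is not valid base64."""
    try:
        # validate=True rejects characters outside the base64 alphabet
        return base64.b64decode(value, validate=True)
    except binascii.Error as exc:
        raise ValueError(f"audio value is not valid base64: {exc}") from exc


def looks_like_wav(data: bytes) -> bool:
    """Check for a RIFF/WAVE header; only meaningful if you expect WAV input (assumption)."""
    return len(data) >= 12 and data[:4] == b"RIFF" and data[8:12] == b"WAVE"


# A bare 12-byte RIFF/WAVE header, enough to exercise the check
header = b"RIFF" + (36).to_bytes(4, "little") + b"WAVE"
encoded = base64.b64encode(header).decode("utf-8")
print(looks_like_wav(decode_audio_b64(encoded)))  # True
```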
Multimodal Template Parts (Base)
| Part | Content Type | Description |
|---|---|---|
| Part 1 | PromptPartContentType.TEXT | Task description, emotion taxonomy, and analysis criteria (tone, pitch, pace, volume, intensity). |
| Part 2 | PromptPartContentType.AUDIO | The {audio} variable placeholder for base64-encoded audio content. |
| Part 3 | PromptPartContentType.TEXT | Response format instructions and example response. |
Multimodal Template Parts (Explanation)
| Part | Content Type | Description |
|---|---|---|
| Part 1 | PromptPartContentType.TEXT | Extended task description requesting a step-by-step explanation of the audio characteristics. |
| Part 2 | PromptPartContentType.AUDIO | The {audio} variable placeholder (same as the base template). |
| Part 3 | PromptPartContentType.TEXT | Explanation response format with an EXPLANATION/LABEL structure. |
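If you call an audio model directly rather than through llm_classify, you need to split the explanation variant's output yourself. llm_classify already handles this when provide_explanation=True. The parser below is a hypothetical sketch. It assumes the response contains "EXPLANATION:" followed later by "LABEL:" and a single-word label. The exact wording the real template requests may differ.

```python
import re
from typing import Optional, Tuple

# Assumed response shape: "EXPLANATION: <text>" followed by "LABEL: <label>"
_RESPONSE_RE = re.compile(
    r"EXPLANATION:\s*(?P<explanation>.*?)\s*LABEL:\s*(?P<label>\S+)",
    re.IGNORECASE | re.DOTALL,
)


def parse_explanation_response(text: str) -> Tuple[Optional[str], Optional[str]]:
    """Return (explanation, label), or (None, None) if the text does not match the shape."""
    match = _RESPONSE_RE.search(text)
    if match is None:
        return None, None
    # Normalize the label: drop surrounding whitespace and periods, lowercase it
    label = match.group("label").strip().strip(".").lower()
    return match.group("explanation"), label


print(parse_explanation_response("EXPLANATION: Loud and fast speech.\nLABEL: Anger."))
```

The returned label still has to be checked against the rails, because a model can produce a word outside the taxonomy.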
EMOTION_AUDIO_RAILS
The 10-class emotion taxonomy:
| Index | Emotion Label | Description |
|---|---|---|
| 0 | anger | Hostile or aggressive emotional expression |
| 1 | happiness | Cheerful or joyful emotional expression |
| 2 | excitement | Enthusiastic or highly energized expression |
| 3 | sadness | Sorrowful or melancholic expression |
| 4 | neutral | Absence of strong emotional expression (used only when no other emotion is clear) |
| 5 | frustration | Annoyed or exasperated expression |
| 6 | fear | Anxious or frightened expression |
| 7 | surprise | Startled or astonished expression |
| 8 | disgust | Revulsion or strong disapproval |
| 9 | other | Emotion not captured by the above categories |
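The rails form a closed label set, so raw model output must be mapped onto exactly one of the ten labels or flagged as unusable. The function below is a minimal sketch of that idea and is not Phoenix's internal matching logic. It first tries an exact match after trimming. Otherwise, it accepts the output only when exactly one rail appears as a whole word. The NOT_PARSABLE sentinel name and the local EMOTION_RAILS copy are assumptions made for this sketch.

```python
import re
from typing import Sequence

# Local copy of the ten labels from the table above (assumption: same order and spelling)
EMOTION_RAILS = [
    "anger", "happiness", "excitement", "sadness", "neutral",
    "frustration", "fear", "surprise", "disgust", "other",
]
NOT_PARSABLE = "NOT_PARSABLE"  # sentinel name chosen for this sketch


def snap_to_rail(raw: str, rails: Sequence[str]) -> str:
    """Map raw model output to exactly one rail, or NOT_PARSABLE if absent or ambiguous."""
    text = raw.strip().lower()
    # Exact match after removing a trailing period, e.g. "Frustration." -> "frustration"
    candidate = text.rstrip(".")
    if candidate in rails:
        return candidate
    # Otherwise accept only if exactly one rail occurs as a whole word
    found = [rail for rail in rails if re.search(rf"\b{re.escape(rail)}\b", text)]
    return found[0] if len(found) == 1 else NOT_PARSABLE


print(snap_to_rail("Frustration.", EMOTION_RAILS))   # frustration
print(snap_to_rail("anger or fear", EMOTION_RAILS))  # NOT_PARSABLE
```

Rejecting outputs that mention two labels, instead of picking the first one, avoids silently biasing results toward whichever rail appears earlier in the list.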
Audio Analysis Criteria
The template instructs the LLM to evaluate five acoustic characteristics:
| Criterion | Description | Example Values |
|---|---|---|
| Tone | General tone of the speaker | cheerful, tense, calm |
| Pitch | Level and variability of the pitch | high, low, monotone |
| Pace | Speed of speech | fast, slow, steady |
| Volume | Loudness of the speech | loud, soft, moderate |
| Intensity | Emotional strength or expression | subdued, sharp, exaggerated |
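When reviewing outputs from the explanation variant, one rough check is whether each explanation mentions the five criteria the template asks about. The helper below, missing_criteria, is a hypothetical keyword heuristic. It only detects whether each criterion name appears as a word and says nothing about whether the analysis is correct.

```python
import re
from typing import List

# The five criteria named in the template instructions
CRITERIA = ["tone", "pitch", "pace", "volume", "intensity"]


def missing_criteria(explanation: str) -> List[str]:
    """Return the criteria never mentioned as whole words in an explanation (keyword heuristic)."""
    text = explanation.lower()
    return [c for c in CRITERIA if not re.search(rf"\b{c}\b", text)]


print(missing_criteria("The tone is tense, pitch is high, and the pace is fast."))
# ['volume', 'intensity']
```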
Usage Examples
```python
import base64

import pandas as pd

from phoenix.evals.legacy.classify import llm_classify
from phoenix.evals.legacy.default_audio_templates import (
    EMOTION_AUDIO_RAILS,
    EMOTION_PROMPT_TEMPLATE,
)
from phoenix.evals.legacy.models import OpenAIModel

# The template contains an AUDIO part, so the model must accept audio input
model = OpenAIModel(model="gpt-4o-audio-preview")

# Load the audio file and encode it as base64 for the {audio} template variable
with open("customer_call.wav", "rb") as f:
    audio_b64 = base64.b64encode(f.read()).decode("utf-8")

# The DataFrame column name must match the template variable name ("audio")
df = pd.DataFrame({"audio": [audio_b64]})

result = llm_classify(
    data=df,
    model=model,
    template=EMOTION_PROMPT_TEMPLATE,
    rails=EMOTION_AUDIO_RAILS,  # the 10-class emotion taxonomy
    provide_explanation=True,
)

# result is a DataFrame with one row per input row
# result["label"].iloc[0] might be "frustration"
# result["explanation"].iloc[0] describes the tone, pitch, pace, volume, and intensity analysis
```
Related Pages