Implementation:Arize ai Phoenix Legacy Audio Templates

From Leeroopedia
Revision as of 12:04, 16 February 2026 by Admin (talk | contribs) (Auto-imported from implementations/Arize_ai_Phoenix_Legacy_Audio_Templates.md)

LLM_Evaluation Audio_Analysis

Overview

The Legacy Audio Templates module defines EMOTION_PROMPT_TEMPLATE, a multimodal ClassificationTemplate that classifies the primary emotion expressed in an audio file. It is the first multimodal evaluation template in the Phoenix Evals subsystem and is designed for LLMs that accept audio input.

The template instructs the LLM to analyze five audio characteristics (tone, pitch, pace, volume, and intensity) to determine the dominant emotion. It supports a 10-class emotion taxonomy: anger, happiness, excitement, sadness, neutral, frustration, fear, surprise, disgust, and other.

The template is structured as a three-part multimodal prompt: a text preamble with classification instructions, an audio content part containing the {audio} variable (base64-encoded audio data), and a text epilogue with response format instructions. Both base and explanation variants follow this three-part structure.
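
The three-part layout described above can be sketched in plain Python. This is an illustration only: `ContentType`, `PromptPart`, and `render` are hypothetical stand-ins for the Phoenix classes, and the instruction strings are placeholders rather than the wording used in the real template.

```python
from dataclasses import dataclass
from enum import Enum


class ContentType(Enum):
    """Hypothetical stand-in for PromptPartContentType."""

    TEXT = "text"
    AUDIO = "audio"


@dataclass(frozen=True)
class PromptPart:
    """Hypothetical stand-in for PromptPartTemplate: one typed chunk of a prompt."""

    content_type: ContentType
    template: str


# Placeholder wording; the real instruction text lives in the Phoenix source file.
THREE_PART_TEMPLATE = [
    PromptPart(ContentType.TEXT, "Classify the primary emotion in the audio below."),
    PromptPart(ContentType.AUDIO, "{audio}"),
    PromptPart(ContentType.TEXT, "Respond with a single emotion label."),
]


def render(parts, variables):
    """Fill template variables in each part, preserving order and content type."""
    return [(p.content_type, p.template.format(**variables)) for p in parts]


# "UklGRg==" is the base64 encoding of the four bytes "RIFF".
rendered = render(THREE_PART_TEMPLATE, {"audio": "UklGRg=="})
```

Keeping the audio in its own typed part, rather than splicing it into one long string, lets a model adapter send that part as audio content and the surrounding parts as text.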

Code Reference

| Attribute | Details |
|---|---|
| Source File | packages/phoenix-evals/src/phoenix/evals/legacy/default_audio_templates.py |
| Repository | Arize-ai/phoenix |
| Lines | 133 |
| Module | phoenix.evals.legacy.default_audio_templates |
| Key Symbols | EMOTION_PROMPT_TEMPLATE, EMOTION_AUDIO_RAILS |
| Dependencies | phoenix.evals.legacy.templates.ClassificationTemplate, phoenix.evals.legacy.templates.PromptPartContentType, phoenix.evals.legacy.templates.PromptPartTemplate |

I/O Contract

EMOTION_PROMPT_TEMPLATE

| Attribute | Details |
|---|---|
| Type | ClassificationTemplate |
| Template Variables | {audio}: base64-encoded audio data |
| Rails | ["anger", "happiness", "excitement", "sadness", "neutral", "frustration", "fear", "surprise", "disgust", "other"] |
| Prompt Structure | Three-part multimodal: TEXT (instructions) + AUDIO (data) + TEXT (response format) |

Multimodal Template Parts (Base)

| Part | Content Type | Description |
|---|---|---|
| Part 1 | PromptPartContentType.TEXT | Task description, emotion taxonomy, and analysis criteria (tone, pitch, pace, volume, intensity). |
| Part 2 | PromptPartContentType.AUDIO | The {audio} variable placeholder for base64-encoded audio content. |
| Part 3 | PromptPartContentType.TEXT | Response format instructions and example response. |

Multimodal Template Parts (Explanation)

| Part | Content Type | Description |
|---|---|---|
| Part 1 | PromptPartContentType.TEXT | Extended task description requesting step-by-step explanation of audio characteristics. |
| Part 2 | PromptPartContentType.AUDIO | The {audio} variable placeholder (same as base template). |
| Part 3 | PromptPartContentType.TEXT | Explanation response format with EXPLANATION/LABEL structure. |
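
A response in the EXPLANATION/LABEL structure could be split along these lines. This page does not reproduce the exact response wording, so the sketch assumes the model answers with `EXPLANATION: ...` followed by `LABEL: ...`. `parse_explanation_response` is a hypothetical helper, not the parser Phoenix uses.

```python
import re

# Assumed layout: "EXPLANATION: <free text>" followed by "LABEL: <label>".
_RESPONSE_PATTERN = re.compile(
    r"EXPLANATION:\s*(.*?)\s*LABEL:\s*(.+?)\s*$",
    flags=re.DOTALL | re.IGNORECASE,
)


def parse_explanation_response(text):
    """Return (explanation, label), or (None, None) if the layout is not matched."""
    match = _RESPONSE_PATTERN.search(text)
    if match is None:
        return None, None
    explanation, label = match.groups()
    # Normalize the label: drop surrounding quotes and lowercase it.
    return explanation.strip(), label.strip().strip('"').lower()


sample = 'EXPLANATION: The speaker talks fast with a tense tone.\nLABEL: "frustration"'
explanation, label = parse_explanation_response(sample)
```

Returning `(None, None)` on a mismatch keeps malformed responses distinguishable from a valid label, so callers can decide whether to retry or discard the row.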

EMOTION_AUDIO_RAILS

The 10-class emotion taxonomy:

| Index | Emotion Label | Description |
|---|---|---|
| 0 | anger | Hostile or aggressive emotional expression |
| 1 | happiness | Cheerful or joyful emotional expression |
| 2 | excitement | Enthusiastic or highly energized expression |
| 3 | sadness | Sorrowful or melancholic expression |
| 4 | neutral | Absence of strong emotional expression (used only when no other emotion is clear) |
| 5 | frustration | Annoyed or exasperated expression |
| 6 | fear | Anxious or frightened expression |
| 7 | surprise | Startled or astonished expression |
| 8 | disgust | Revulsion or strong disapproval |
| 9 | other | Emotion not captured by the above categories |
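
Rails constrain the classifier to a closed label set. The following sketch shows one way a raw model output could be mapped onto the ten rails. `match_rail` and the `"unparseable"` sentinel are hypothetical names used for illustration, and the matching rules are an assumption rather than Phoenix's own logic.

```python
import re

# The ten labels listed in the table above.
EMOTION_RAILS = [
    "anger", "happiness", "excitement", "sadness", "neutral",
    "frustration", "fear", "surprise", "disgust", "other",
]


def match_rail(raw_output, rails, unparseable="unparseable"):
    """Map free-form model output to exactly one rail, or to the sentinel."""
    # Trim whitespace, then surrounding quotes and periods, then lowercase.
    cleaned = raw_output.strip().strip(".\"'").lower()
    if cleaned in rails:
        return cleaned
    # Fallback: accept the output only if exactly one rail appears as a whole word.
    hits = [r for r in rails if re.search(rf"\b{re.escape(r)}\b", cleaned)]
    return hits[0] if len(hits) == 1 else unparseable
```

Rejecting outputs that mention two rails (for example "anger or fear") avoids silently picking one of them.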

Audio Analysis Criteria

The template instructs the LLM to evaluate five acoustic characteristics:

| Criterion | Description | Example Values |
|---|---|---|
| Tone | General tone of the speaker | cheerful, tense, calm |
| Pitch | Level and variability of the pitch | high, low, monotone |
| Pace | Speed of speech | fast, slow, steady |
| Volume | Loudness of the speech | loud, soft, moderate |
| Intensity | Emotional strength or expression | subdued, sharp, exaggerated |

Usage Examples

import base64

import pandas as pd
from phoenix.evals.legacy.classify import llm_classify
from phoenix.evals.legacy.default_audio_templates import (
    EMOTION_AUDIO_RAILS,
    EMOTION_PROMPT_TEMPLATE,
)
from phoenix.evals.legacy.models import OpenAIModel

# The model must accept audio input
model = OpenAIModel(model="gpt-4o-audio-preview")

# Load the audio file and encode it to base64
with open("customer_call.wav", "rb") as f:
    audio_b64 = base64.b64encode(f.read()).decode("utf-8")

# The column name must match the template variable {audio}
df = pd.DataFrame({"audio": [audio_b64]})

result = llm_classify(
    data=df,
    model=model,
    template=EMOTION_PROMPT_TEMPLATE,
    rails=EMOTION_AUDIO_RAILS,  # the 10-class taxonomy exported by the module
    provide_explanation=True,
)
# result is a DataFrame with one row per input row
# result["label"].iloc[0] might be "frustration"
# result["explanation"].iloc[0] describes the tone, pitch, pace, volume, and intensity analysis
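
To classify several recordings in one call, the encoding step can be factored into a helper. This is a minimal sketch using only the standard library. `encode_audio_files` and the extra `file` key are hypothetical, added here so that results can be traced back to their source files.

```python
import base64
from pathlib import Path


def encode_audio_files(paths):
    """Return one record per file, with the base64 text under the "audio" key."""
    records = []
    for path in map(Path, paths):
        # Read raw bytes and encode them as ASCII-safe base64 text.
        encoded = base64.b64encode(path.read_bytes()).decode("utf-8")
        records.append({"file": path.name, "audio": encoded})
    return records
```

Passing the returned list to `pd.DataFrame(...)` yields a frame whose `audio` column matches the `{audio}` template variable. Whether extra columns such as `file` are ignored by the classifier is an assumption worth verifying against the installed Phoenix version.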
