Principle:Elevenlabs Elevenlabs python Audio Interface

Knowledge Sources	ElevenLabs Python, ElevenLabs ConvAI Docs
Domains	Audio_Processing, Conversational_AI, Hardware_Abstraction
Last Updated	2026-02-15 00:00 GMT

Overview

An abstraction layer that provides bidirectional audio I/O (microphone input and speaker output) for real-time conversational AI sessions, with support for buffered output and interruption handling.

Description

Audio Interface is the hardware abstraction layer for conversational AI. It manages two audio streams simultaneously:

Input stream: Captures microphone audio as 16-bit PCM mono at 16 kHz and delivers it to the conversation handler via callbacks
Output stream: Receives synthesized agent audio and plays it through speakers with buffering and interruption support

The interface follows the Abstract Base Class pattern, defining four methods that any audio implementation must provide: start, stop, output, and interrupt. This decouples the conversation logic from the specific audio hardware or library used, allowing custom implementations for different platforms (e.g., browser WebRTC, mobile audio APIs, virtual audio devices).

The default implementation uses PyAudio for cross-platform microphone and speaker access, with a dedicated output thread for non-blocking playback and a queue-based buffer for interruption support.
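
The thread-plus-queue arrangement described above can be sketched without any audio library. This is a minimal illustration of the pattern, not the default implementation's actual code: BufferedOutput and its write parameter are hypothetical names, and write stands in for whatever blocking call pushes bytes to the speaker.

```python
import queue
import threading
from typing import Callable


class BufferedOutput:
    """Queue-backed playback buffer drained by a worker thread (illustrative)."""

    def __init__(self, write: Callable[[bytes], None]) -> None:
        self._write = write  # assumed blocking sink, e.g. a speaker stream write
        self._queue: "queue.Queue[bytes]" = queue.Queue()
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def output(self, audio: bytes) -> None:
        # Returns immediately; the worker thread performs the blocking write.
        self._queue.put(audio)

    def interrupt(self) -> None:
        # Discard everything still buffered. A chunk already handed to the
        # sink cannot be recalled, so smaller chunks interrupt more promptly.
        try:
            while True:
                self._queue.get_nowait()
        except queue.Empty:
            pass

    def stop(self) -> None:
        # Ask the worker to exit and wait for it to finish.
        self._stop.set()
        self._thread.join()

    def _run(self) -> None:
        # Poll with a short timeout so the stop flag is checked regularly.
        while not self._stop.is_set():
            try:
                chunk = self._queue.get(timeout=0.05)
            except queue.Empty:
                continue
            self._write(chunk)
```

Because output() only enqueues, the caller is never blocked by playback, and interrupt() drops queued audio by draining the same queue the worker reads from.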

Usage

Use this principle when building a real-time voice conversation with an ElevenLabs agent. The audio interface must be provided to the Conversation constructor and handles all audio I/O for the session. Use the default implementation for standard microphone/speaker setups, or implement the ABC for custom audio sources and sinks.

Theoretical Basis

The Audio Interface follows the Strategy pattern with an ABC defining the contract:

# Abstract interface contract
from abc import ABC, abstractmethod

class AudioInterface(ABC):
    @abstractmethod
    def start(self, input_callback): ...  # Begin capture; call input_callback(audio_bytes)
    @abstractmethod
    def stop(self): ...                   # Clean up resources
    @abstractmethod
    def output(self, audio_bytes): ...    # Play agent audio (non-blocking)
    @abstractmethod
    def interrupt(self): ...              # Stop current playback immediately
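
As an illustration of the Strategy idea, a custom implementation of the contract above might look like the following sketch of a virtual audio device that needs no hardware. The AudioInterface class here is a local stand-in so the example runs on its own; in real use you would subclass the library's own base class. InMemoryAudioInterface, feed, pending, and interruptions are hypothetical names introduced only for this example.

```python
from abc import ABC, abstractmethod
from typing import Callable, List, Optional


class AudioInterface(ABC):
    """Local stand-in mirroring the four-method contract (assumption)."""

    @abstractmethod
    def start(self, input_callback: Callable[[bytes], None]) -> None: ...

    @abstractmethod
    def stop(self) -> None: ...

    @abstractmethod
    def output(self, audio: bytes) -> None: ...

    @abstractmethod
    def interrupt(self) -> None: ...


class InMemoryAudioInterface(AudioInterface):
    """Hypothetical virtual device: input is pushed in, output is recorded."""

    def __init__(self) -> None:
        self._callback: Optional[Callable[[bytes], None]] = None
        self.pending: List[bytes] = []  # agent audio queued but not yet "played"
        self.interruptions = 0

    def start(self, input_callback: Callable[[bytes], None]) -> None:
        # Remember where captured audio should be delivered.
        self._callback = input_callback

    def stop(self) -> None:
        # Release the callback and drop any queued audio.
        self._callback = None
        self.pending.clear()

    def output(self, audio: bytes) -> None:
        # Non-blocking by construction: just append to the buffer.
        self.pending.append(audio)

    def interrupt(self) -> None:
        # Discard all buffered agent audio at once.
        self.pending.clear()
        self.interruptions += 1

    def feed(self, audio: bytes) -> None:
        """Simulate captured microphone audio (helper, not part of the contract)."""
        if self._callback is not None:
            self._callback(audio)
```

Code that depends only on the four abstract methods can accept this class in place of a hardware-backed one, which is useful for tests or headless environments.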

Key design considerations:

Non-blocking output: output() must return quickly; actual playback happens in a separate thread
Interruption support: When the user speaks over the agent, all buffered audio must be discarded immediately
PCM format: audio is fixed at 16-bit signed integer samples, one channel (mono), 16 kHz sample rate
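
Since the format is fixed, buffer sizes follow from simple arithmetic: one second of audio is 16,000 samples of 2 bytes each. The helper names below (chunk_bytes, bytes_to_ms) are hypothetical and only illustrate the conversion; the 250 ms figure is an arbitrary example, not a documented setting.

```python
SAMPLE_RATE_HZ = 16_000  # 16 kHz, from the format above
BYTES_PER_SAMPLE = 2     # 16-bit signed integer
CHANNELS = 1             # mono


def chunk_bytes(duration_ms: int) -> int:
    """Number of bytes needed to hold duration_ms of audio in this format."""
    frames = SAMPLE_RATE_HZ * duration_ms // 1000
    return frames * BYTES_PER_SAMPLE * CHANNELS


def bytes_to_ms(num_bytes: int) -> float:
    """Playback duration in milliseconds of num_bytes of audio in this format."""
    frames = num_bytes / (BYTES_PER_SAMPLE * CHANNELS)
    return 1000 * frames / SAMPLE_RATE_HZ


# A 250 ms chunk of silence: 4,000 frames, 8,000 bytes.
silence = bytes(chunk_bytes(250))
```

The same conversion indicates how much audio an interruption discards: dividing the buffered byte count by 32 gives the duration in milliseconds.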

Related Pages

Implemented By

Implementation:Elevenlabs_Elevenlabs_python_DefaultAudioInterface

Uses Heuristic

Heuristic:Elevenlabs_Elevenlabs_python_Audio_Buffer_Sizes
