
Implementation:Elevenlabs Elevenlabs python RealtimeTextToSpeechClient Convert Realtime

From Leeroopedia
Knowledge Sources
Domains Speech_Synthesis, Streaming, WebSocket
Last Updated 2026-02-15 00:00 GMT

Overview

Concrete tool, provided by the elevenlabs-python SDK, for streaming text-to-speech synthesis over a WebSocket.

Description

The RealtimeTextToSpeechClient.convert_realtime method opens a synchronous WebSocket connection to the ElevenLabs streaming TTS endpoint and processes text chunks in real time. RealtimeTextToSpeechClient extends TextToSpeechClient to add this WebSocket capability. Internally, the method passes the input through the text_chunker utility, which buffers text until a punctuation or whitespace boundary before sending it. After each send, it attempts a short receive with a 10 ms timeout so that audio is yielded as early as possible.
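The boundary buffering can be illustrated with a minimal sketch. The function name chunk_text and the SPLITTERS tuple below are hypothetical, and the exact splitter set and rules are assumptions that may differ from the SDK's own text_chunker. The point is that incoming fragments are held back until a boundary is reached, so no word is sent split across two messages, and every emitted piece ends with a space.

```python
from typing import Iterable, Iterator

# Assumed split points for this sketch (punctuation and space);
# the SDK's text_chunker may use a different set.
SPLITTERS = (".", ",", "?", "!", ";", ":", "-", "(", ")", "[", "]", "}", " ")


def chunk_text(chunks: Iterable[str]) -> Iterator[str]:
    """Hypothetical chunker: buffer text and emit it only at boundaries."""
    buffer = ""
    for text in chunks:
        if buffer.endswith(SPLITTERS):
            # Buffer already ends on a boundary: emit it, start a new buffer.
            yield buffer if buffer.endswith(" ") else buffer + " "
            buffer = text
        elif text.startswith(SPLITTERS):
            # New fragment opens with a boundary: emit up to and including it.
            output = buffer + text[0]
            yield output if output.endswith(" ") else output + " "
            buffer = text[1:]
        else:
            # Still mid-word: keep accumulating.
            buffer += text
    if buffer:
        # Source exhausted: flush whatever remains.
        yield buffer + " "


pieces = list(chunk_text(["Hel", "lo", ", wor", "ld"]))
```

Here the fragments "Hel" and "lo" are merged before anything is sent, so pieces holds "Hello, " and " world " rather than partial words.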

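The method is a Python generator that yields base64-decoded audio byte chunks. After all text has been sent, it sends an empty-text message as a flush signal, then performs blocking receives to drain the remaining audio until the server closes the connection. A close with code 1000 (normal closure) ends the generator normally; any other close code is raised as an ApiError.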
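The send, short-receive, flush, and drain cycle described above can be sketched against an in-memory fake socket, so it runs without network access. FakeSocket, ConnectionClosed, and synthesize are hypothetical names, and the message fields are simplified assumptions ({"text": ...} out, {"audio": base64} back). The fake makes each chunk's audio readable only one send later, which mimics generation latency.

```python
import base64
import json
from typing import Iterable, Iterator, List, Optional


class ConnectionClosed(Exception):
    """Hypothetical stand-in for a WebSocket close event with a close code."""

    def __init__(self, code: int) -> None:
        super().__init__(f"connection closed with code {code}")
        self.code = code


class FakeSocket:
    """Hypothetical socket: audio for a chunk becomes readable one send
    later, and the socket closes once drained after the flush."""

    def __init__(self, close_code: int = 1000) -> None:
        self.sent: List[dict] = []
        self.inbox: List[str] = []    # frames ready to be received
        self.pending: List[str] = []  # frames still being "generated"
        self.flushed = False
        self.close_code = close_code

    def send(self, message: str) -> None:
        payload = json.loads(message)
        self.sent.append(payload)
        self.inbox.extend(self.pending)  # earlier audio is now ready
        self.pending = []
        if payload["text"] == "":
            self.flushed = True  # end of input: nothing new to generate
        else:
            audio = base64.b64encode(payload["text"].encode()).decode()
            self.pending.append(json.dumps({"audio": audio}))

    def recv(self, timeout: Optional[float] = None) -> str:
        # The fake never blocks, so `timeout` is accepted but ignored.
        if self.inbox:
            return self.inbox.pop(0)
        if self.flushed:
            raise ConnectionClosed(self.close_code)
        raise TimeoutError("no audio ready yet")


def synthesize(ws: FakeSocket, chunks: Iterable[str]) -> Iterator[bytes]:
    """Send text chunks, yield audio early, then flush and drain."""
    for chunk in chunks:
        ws.send(json.dumps({"text": chunk}))
        try:
            # Short (10 ms) receive so ready audio is yielded without stalling.
            data = json.loads(ws.recv(0.01))
            if data.get("audio"):
                yield base64.b64decode(data["audio"])
        except TimeoutError:
            pass  # nothing ready yet; keep sending text
    ws.send(json.dumps({"text": ""}))  # empty text = flush signal
    try:
        while True:
            # No timeout: on a real socket this receive would block.
            data = json.loads(ws.recv())
            if data.get("audio"):
                yield base64.b64decode(data["audio"])
    except ConnectionClosed as closed:
        if closed.code != 1000:  # 1000 = normal closure
            raise RuntimeError(f"stream closed abnormally ({closed.code})")


sock = FakeSocket()
out = list(synthesize(sock, ["Hello, ", "world. "]))
```

In this run the first chunk's audio is picked up only while the second chunk is being sent, and the second chunk's audio arrives during the drain. Both phases are therefore needed to receive everything.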

Usage

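Use this method when you have a streaming text source (Iterator[str]), such as output from an LLM, and need audio generated in real time. The returned Iterator[bytes] can be passed directly to the play(), save(), or stream() utilities. Note that play() and save() collect every chunk before playing or writing, whereas stream() plays audio progressively (it requires mpv to be installed).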
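To persist audio while it is still arriving, rather than after the stream ends, you can consume the iterator yourself. The helper below, write_progressively, is hypothetical and not part of the SDK. It is a minimal sketch that writes and flushes each chunk as soon as it is yielded.

```python
import io
from typing import BinaryIO, Iterable


def write_progressively(audio: Iterable[bytes], out: BinaryIO) -> int:
    """Hypothetical helper: write each audio chunk as soon as it arrives.

    Returns the total number of bytes written."""
    total = 0
    for chunk in audio:
        out.write(chunk)
        out.flush()  # push bytes to the file or pipe right away
        total += len(chunk)
    return total


# Demo: an in-memory buffer stands in for open("out.mp3", "wb") or a pipe.
buffer = io.BytesIO()
written = write_progressively(iter([b"ID3", b"\x00\x01"]), buffer)
```

With the SDK, you would pass the iterator returned by convert_realtime in place of the demo list, for example inside `with open("out.mp3", "wb") as f:`.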

Code Reference

Source Location

Signature

class RealtimeTextToSpeechClient(TextToSpeechClient):
    def convert_realtime(
        self,
        voice_id: str,
        *,
        text: typing.Iterator[str],
        model_id: typing.Optional[str] = OMIT,
        output_format: typing.Optional[OutputFormat] = "mp3_44100_128",
        voice_settings: typing.Optional[VoiceSettings] = OMIT,
        request_options: typing.Optional[RequestOptions] = None,
    ) -> typing.Iterator[bytes]:
        """
        Converts streaming text into speech using a voice and returns
        audio chunks via WebSocket.

        Args:
            voice_id: Voice ID to use for synthesis.
            text: Iterator of text strings to synthesize.
            model_id: TTS model identifier.
            output_format: Audio format (default "mp3_44100_128").
            voice_settings: Override stability/similarity/style settings.
            request_options: Per-request options; additional_headers are sent with the WebSocket handshake.
        """

Import

from elevenlabs import ElevenLabs

client = ElevenLabs()  # API key is read from the ELEVENLABS_API_KEY environment variable
# Access via: client.text_to_speech.convert_realtime(...)

I/O Contract

Inputs

Name | Type | Required | Description
voice_id | str | Yes | Voice ID to use for synthesis
text | Iterator[str] | Yes | Streaming text input (generator or iterator)
model_id | Optional[str] | No | TTS model identifier
output_format | Optional[OutputFormat] | No | Audio encoding format (default "mp3_44100_128")
voice_settings | Optional[VoiceSettings] | No | Override stability, similarity_boost, style, use_speaker_boost
request_options | Optional[RequestOptions] | No | Per-request options such as additional headers

Outputs

Name | Type | Description
(return) | Iterator[bytes] | Streaming base64-decoded audio byte chunks

Usage Examples

Basic Realtime TTS

from elevenlabs import ElevenLabs, stream

client = ElevenLabs()

def text_stream():
    yield "Hello, "
    yield "how are you "
    yield "doing today?"

audio = client.text_to_speech.convert_realtime(
    voice_id="JBFqnCBsd6RMkjVDRZzb",
    text=text_stream(),
    model_id="eleven_multilingual_v2",
)

# stream() plays chunks progressively as they arrive (requires mpv)
stream(audio)

With LLM Output Stream

from elevenlabs import ElevenLabs, VoiceSettings, stream

client = ElevenLabs()

def get_llm_response():
    """Simulate streaming LLM output."""
    # In practice, yield chunks from OpenAI/Anthropic streaming API
    for word in "The quick brown fox jumps over the lazy dog".split():
        yield word + " "

audio = client.text_to_speech.convert_realtime(
    voice_id="JBFqnCBsd6RMkjVDRZzb",
    text=get_llm_response(),
    model_id="eleven_turbo_v2_5",
    voice_settings=VoiceSettings(
        stability=0.5,
        similarity_boost=0.75,
        style=0.0,
        use_speaker_boost=True,
    ),
)

full_audio = stream(audio)  # plays progressively, then returns all audio as bytes
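When the text comes from a real LLM stream rather than a simulated one, some streamed events may carry no text at all (for example, a chunk that only sets the role, or a final stop chunk). The adapter below, text_deltas, is a hypothetical helper. It is a minimal sketch that forwards only non-empty strings, so that convert_realtime receives a clean Iterator[str].

```python
from typing import Iterable, Iterator, Optional


def text_deltas(deltas: Iterable[Optional[str]]) -> Iterator[str]:
    """Hypothetical adapter: forward only non-empty text deltas."""
    for delta in deltas:
        if delta:  # skips both None and ""
            yield delta


# Demo with a hand-written delta stream instead of a live LLM call.
cleaned = list(text_deltas(["The quick ", None, "", "brown fox "]))
```

Assuming an OpenAI-style chat completion stream, you might pass `text=text_deltas(c.choices[0].delta.content for c in completion)`, where `completion` is your streaming response. Check your LLM client's documentation for the exact field name.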

