Implementation: elevenlabs-python RealtimeTextToSpeechClient.convert_realtime
| Knowledge Sources | |
|---|---|
| Domains | Speech_Synthesis, Streaming, WebSocket |
| Last Updated | 2026-02-15 00:00 GMT |
Overview
Concrete tool for streaming text-to-speech synthesis over WebSocket provided by the elevenlabs-python SDK.
Description
The RealtimeTextToSpeechClient.convert_realtime method opens a synchronous WebSocket connection to the ElevenLabs streaming TTS endpoint and processes text chunks in real time. It extends TextToSpeechClient to add WebSocket capabilities. Internally, it uses the text_chunker utility to buffer text at sentence boundaries before sending, and performs non-blocking receives (10ms timeout) between sends to yield audio as early as possible.
The method is a Python generator that yields base64-decoded audio byte chunks. After all text is sent, it sends an empty-text flush signal and performs a blocking drain to collect remaining audio until the server closes the connection (code 1000).
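The sentence-boundary buffering described above can be sketched as follows. This is a simplified illustration of what a chunker like text_chunker does, not the SDK's actual implementation:

```python
from typing import Iterator


def chunk_text(chunks: Iterator[str]) -> Iterator[str]:
    """Buffer incoming text and yield it at sentence boundaries.

    Simplified sketch: the real text_chunker in elevenlabs-python may
    use a different splitter set and buffering policy.
    """
    splitters = (".", "!", "?", ";", ":")
    buffer = ""
    for text in chunks:
        buffer += text
        # Flush everything up to the last sentence boundary seen so far.
        last = max(buffer.rfind(s) for s in splitters)
        if last != -1:
            yield buffer[: last + 1]
            buffer = buffer[last + 1:]
    if buffer:
        yield buffer  # final flush of any trailing text
```

Buffering this way keeps each WebSocket send prosodically coherent, which generally yields more natural-sounding synthesis than sending arbitrary fragments.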
Usage
Use this method when you have a streaming text source (Iterator[str]) such as output from an LLM, and need real-time audio generation. The method returns an Iterator[bytes] that can be passed directly to play(), save(), or stream() utilities.
Code Reference
Source Location
- Repository: elevenlabs-python
- File: src/elevenlabs/realtime_tts.py
- Lines: L42-145
Signature
class RealtimeTextToSpeechClient(TextToSpeechClient):
    def convert_realtime(
        self,
        voice_id: str,
        *,
        text: typing.Iterator[str],
        model_id: typing.Optional[str] = OMIT,
        output_format: typing.Optional[OutputFormat] = "mp3_44100_128",
        voice_settings: typing.Optional[VoiceSettings] = OMIT,
        request_options: typing.Optional[RequestOptions] = None,
    ) -> typing.Iterator[bytes]:
        """
        Converts streaming text into speech using a voice and returns
        audio chunks via WebSocket.

        Args:
            voice_id: Voice ID to use for synthesis.
            text: Iterator of text strings to synthesize.
            model_id: TTS model identifier.
            output_format: Audio format (default "mp3_44100_128").
            voice_settings: Override stability/similarity/style settings.
            request_options: Additional request options.
        """
Import
from elevenlabs import ElevenLabs
client = ElevenLabs()
# Access via: client.text_to_speech.convert_realtime(...)
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| voice_id | str | Yes | Voice ID to use for synthesis |
| text | Iterator[str] | Yes | Streaming text input (generator or iterator) |
| model_id | Optional[str] | No | TTS model identifier |
| output_format | Optional[OutputFormat] | No | Audio encoding format (default "mp3_44100_128") |
| voice_settings | Optional[VoiceSettings] | No | Override stability, similarity_boost, style, use_speaker_boost |
| request_options | Optional[RequestOptions] | No | Additional request options (e.g., extra headers) |
Outputs
| Name | Type | Description |
|---|---|---|
| (return) | Iterator[bytes] | Streaming base64-decoded audio byte chunks |
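The decoding step implied by the output contract can be illustrated with a small sketch. The {"audio": "<base64>"} message shape used here is an assumption for illustration only; convert_realtime handles the actual server schema internally and yields already-decoded bytes:

```python
import base64
import json


def decode_audio_message(raw: str):
    """Decode one WebSocket text frame into raw audio bytes.

    NOTE: the {"audio": "<base64>"} shape is an illustrative
    assumption, not the documented ElevenLabs schema. Frames without
    an "audio" field (e.g., a final/close marker) return None.
    """
    msg = json.loads(raw)
    data = msg.get("audio")
    return base64.b64decode(data) if data else None
```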
Usage Examples
Basic Realtime TTS
from elevenlabs import ElevenLabs, stream

client = ElevenLabs()

def text_stream():
    yield "Hello, "
    yield "how are you "
    yield "doing today?"

audio = client.text_to_speech.convert_realtime(
    voice_id="JBFqnCBsd6RMkjVDRZzb",
    text=text_stream(),
    model_id="eleven_multilingual_v2",
)

# stream() plays audio progressively as it is generated
stream(audio)
With LLM Output Stream
from elevenlabs import ElevenLabs, VoiceSettings, stream

client = ElevenLabs()

def get_llm_response():
    """Simulate streaming LLM output."""
    # In practice, yield chunks from OpenAI/Anthropic streaming API
    for word in "The quick brown fox jumps over the lazy dog".split():
        yield word + " "

audio = client.text_to_speech.convert_realtime(
    voice_id="JBFqnCBsd6RMkjVDRZzb",
    text=get_llm_response(),
    model_id="eleven_turbo_v2_5",
    voice_settings=VoiceSettings(
        stability=0.5,
        similarity_boost=0.75,
        style=0.0,
        use_speaker_boost=True,
    ),
)

full_audio = stream(audio)
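When no local audio player is available for stream(), the returned Iterator[bytes] can instead be drained straight to a file. A minimal sketch, with the ElevenLabs call stood in for by any byte-chunk iterator:

```python
from typing import Iterator


def save_audio(chunks: Iterator[bytes], path: str) -> int:
    """Write streaming audio chunks to a file; returns total bytes written.

    Works with any Iterator[bytes], including the iterator returned by
    convert_realtime(...).
    """
    total = 0
    with open(path, "wb") as f:
        for chunk in chunks:
            f.write(chunk)
            total += len(chunk)
    return total
```

Because the iterator is consumed lazily, audio is written to disk as it arrives rather than after synthesis completes.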