Implementation:Elevenlabs Elevenlabs python TextToSpeechClient Convert
| Knowledge Sources | |
|---|---|
| Domains | Speech_Synthesis, NLP |
| Last Updated | 2026-02-15 00:00 GMT |
Overview
A concrete tool, provided by the elevenlabs-python SDK, for converting text to speech audio.
Description
The TextToSpeechClient.convert method sends text to the ElevenLabs TTS API and returns the synthesized audio as an iterator of byte chunks. It wraps the POST /v1/text-to-speech/{voice_id} endpoint and handles authentication, request serialization, and response streaming through the Fern-generated HTTP client. Because the audio arrives chunk by chunk, playback or saving can begin before the full response has been received.
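Since the return value is an ordinary iterator of byte chunks, it can be consumed incrementally. The following is a minimal sketch rather than SDK code: `write_chunks` and `fake_audio_stream` are hypothetical names, and the fake generator stands in for the iterator returned by `client.text_to_speech.convert(...)` so that the example runs without network access or an API key.

```python
import os
import tempfile
from typing import Iterable, Iterator


def write_chunks(chunks: Iterable[bytes], path: str) -> int:
    """Write byte chunks to `path` as they arrive; return total bytes written."""
    total = 0
    with open(path, "wb") as f:
        for chunk in chunks:
            if chunk:  # skip empty chunks
                f.write(chunk)
                total += len(chunk)
    return total


def fake_audio_stream() -> Iterator[bytes]:
    """Hypothetical stand-in for the iterator returned by convert()."""
    yield b"ID3"
    yield b""
    yield b"\x00\x01\x02\x03"


out_path = os.path.join(tempfile.mkdtemp(), "output.mp3")
written = write_chunks(fake_audio_stream(), out_path)
```

With a real client, the same helper would presumably be called as `write_chunks(client.text_to_speech.convert(...), "output.mp3")`.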
Usage
Use this method to generate speech audio from text. This is the standard (non-WebSocket) TTS method suitable for batch generation where you have the complete text upfront. For streaming text input (e.g., from an LLM), use convert_realtime instead.
Code Reference
Source Location
- Repository: elevenlabs-python
- File: src/elevenlabs/text_to_speech/client.py
- Lines: L48-178
Signature
def convert(
    self,
    voice_id: str,
    *,
    text: str,
    enable_logging: typing.Optional[bool] = None,
    optimize_streaming_latency: typing.Optional[int] = None,
    output_format: typing.Optional[TextToSpeechConvertRequestOutputFormat] = None,
    model_id: typing.Optional[str] = OMIT,
    language_code: typing.Optional[str] = OMIT,
    voice_settings: typing.Optional[VoiceSettings] = OMIT,
    pronunciation_dictionary_locators: typing.Optional[
        typing.Sequence[PronunciationDictionaryVersionLocator]
    ] = OMIT,
    seed: typing.Optional[int] = OMIT,
    previous_text: typing.Optional[str] = OMIT,
    next_text: typing.Optional[str] = OMIT,
    previous_request_ids: typing.Optional[typing.Sequence[str]] = OMIT,
    next_request_ids: typing.Optional[typing.Sequence[str]] = OMIT,
    use_pvc_as_ivc: typing.Optional[bool] = OMIT,
    apply_text_normalization: typing.Optional[BodyTextToSpeechFullApplyTextNormalization] = OMIT,
    apply_language_text_normalization: typing.Optional[bool] = OMIT,
    request_options: typing.Optional[RequestOptions] = None,
) -> typing.Iterator[bytes]:
    """Converts text into speech using a voice of your choice and returns audio."""
Import
from elevenlabs import ElevenLabs
client = ElevenLabs()
# Access via: client.text_to_speech.convert(...)
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| voice_id | str | Yes | Voice ID to use for synthesis |
| text | str | Yes | Text to convert to speech |
| model_id | Optional[str] | No | TTS model identifier (e.g., "eleven_multilingual_v2", "eleven_turbo_v2_5") |
| output_format | Optional[TextToSpeechConvertRequestOutputFormat] | No | Audio format string: mp3_44100_128, mp3_22050_32, pcm_16000, pcm_44100, ulaw_8000, etc. |
| language_code | Optional[str] | No | ISO language code that forces the model to use the given language |
| voice_settings | Optional[VoiceSettings] | No | Override stability, similarity_boost, style, use_speaker_boost |
| optimize_streaming_latency | Optional[int] | No | Latency optimization level (0-4) |
| seed | Optional[int] | No | Seed for best-effort deterministic sampling (0-4294967295) |
| previous_text | Optional[str] | No | Context text for continuity stitching |
| next_text | Optional[str] | No | Look-ahead text for continuity stitching |
| previous_request_ids | Optional[Sequence[str]] | No | Previous generation IDs for stitching (max 3) |
| next_request_ids | Optional[Sequence[str]] | No | Next generation IDs for stitching (max 3) |
| pronunciation_dictionary_locators | Optional[Sequence[PronunciationDictionaryVersionLocator]] | No | Pronunciation overrides (max 3) |
| apply_text_normalization | Optional[BodyTextToSpeechFullApplyTextNormalization] | No | Text normalization mode: 'auto', 'on', or 'off' |
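The limits listed in the table can be checked on the client side before a request is sent. The sketch below is illustrative only: `check_convert_kwargs` is a hypothetical helper, not part of the SDK, and it encodes only the constraints stated in the table above (required fields, the seed range, latency levels 0-4, the three-item caps, and the normalization modes).

```python
from typing import Any, Dict, List

MAX_SEED = 4294967295  # upper bound taken from the Inputs table
MAX_ITEMS = 3          # cap for request IDs and dictionary locators


def check_convert_kwargs(kwargs: Dict[str, Any]) -> List[str]:
    """Return a list of problems found; an empty list means no issue was detected."""
    problems: List[str] = []
    for required in ("voice_id", "text"):
        if not kwargs.get(required):
            problems.append(f"{required} is required")
    seed = kwargs.get("seed")
    if seed is not None and not 0 <= seed <= MAX_SEED:
        problems.append("seed must be in 0..4294967295")
    level = kwargs.get("optimize_streaming_latency")
    if level is not None and level not in range(0, 5):
        problems.append("optimize_streaming_latency must be in 0..4")
    for name in (
        "previous_request_ids",
        "next_request_ids",
        "pronunciation_dictionary_locators",
    ):
        value = kwargs.get(name)
        if value is not None and len(value) > MAX_ITEMS:
            problems.append(f"{name} accepts at most {MAX_ITEMS} items")
    mode = kwargs.get("apply_text_normalization")
    if mode is not None and mode not in ("auto", "on", "off"):
        problems.append("apply_text_normalization must be 'auto', 'on', or 'off'")
    return problems
```

The API remains the authority on what is valid; a check like this only surfaces obvious mistakes before a request is spent.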
Outputs
| Name | Type | Description |
|---|---|---|
| (return) | Iterator[bytes] | Streaming audio byte chunks in the requested output format |
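When a raw PCM output format is requested, the chunks carry no container header, so most players will not open them directly. The sketch below wraps such chunks in a WAV container using only the standard library. It assumes that pcm_16000 means 16-bit mono samples at 16 kHz, which should be verified against the API reference; `pcm_chunks_to_wav` is a hypothetical helper, and the silent chunks stand in for the iterator returned by `convert`.

```python
import io
import wave
from typing import Iterable


def pcm_chunks_to_wav(chunks: Iterable[bytes], sample_rate: int = 16000) -> bytes:
    """Wrap raw PCM chunks in a WAV container and return the file bytes.

    Assumes 16-bit mono samples; adjust if the API reference says otherwise.
    """
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)   # assumption: mono
        wav.setsampwidth(2)   # assumption: 2 bytes (16 bits) per sample
        wav.setframerate(sample_rate)
        for chunk in chunks:
            wav.writeframes(chunk)
    return buffer.getvalue()


# Hypothetical stand-in for convert(..., output_format="pcm_16000"):
# two chunks of silence, 1600 samples in total (0.1 s at 16 kHz).
silence = [b"\x00\x00" * 800, b"\x00\x00" * 800]
wav_bytes = pcm_chunks_to_wav(silence)
```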
Usage Examples
Basic Text to Speech
from elevenlabs import ElevenLabs, play

client = ElevenLabs()
audio = client.text_to_speech.convert(
    voice_id="JBFqnCBsd6RMkjVDRZzb",
    text="The first move is what sets everything in motion.",
    model_id="eleven_multilingual_v2",
    output_format="mp3_44100_128",
)
play(audio)
With Voice Settings Override
from elevenlabs import ElevenLabs, VoiceSettings, save

client = ElevenLabs()
audio = client.text_to_speech.convert(
    voice_id="JBFqnCBsd6RMkjVDRZzb",
    text="Hello, welcome to the presentation.",
    model_id="eleven_multilingual_v2",
    voice_settings=VoiceSettings(
        stability=0.7,
        similarity_boost=0.8,
        style=0.3,
        use_speaker_boost=True,
    ),
)
save(audio, "output.mp3")
Continuity Stitching
from elevenlabs import ElevenLabs, save

client = ElevenLabs()

# Generate the first segment; next_text tells the model what follows
audio1 = client.text_to_speech.convert(
    voice_id="JBFqnCBsd6RMkjVDRZzb",
    text="This is the first part of the story.",
    model_id="eleven_multilingual_v2",
    next_text="And this is what happens next.",
)
# Generate the second segment; previous_text tells the model what came before
audio2 = client.text_to_speech.convert(
    voice_id="JBFqnCBsd6RMkjVDRZzb",
    text="And this is what happens next.",
    model_id="eleven_multilingual_v2",
    previous_text="This is the first part of the story.",
)
# The returned iterators must be consumed to obtain the audio
save(audio1, "part1.mp3")
save(audio2, "part2.mp3")
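For more than two segments, the previous_text and next_text values follow a simple pattern: each segment receives its neighbours as context. The sketch below builds those keyword arguments for a list of segments. `stitching_kwargs` is a hypothetical helper, not part of the SDK, and the third sentence is invented for illustration.

```python
from typing import Dict, List


def stitching_kwargs(segments: List[str]) -> List[Dict[str, str]]:
    """Build per-segment keyword arguments with neighbouring text as context."""
    result: List[Dict[str, str]] = []
    for i, text in enumerate(segments):
        kwargs = {"text": text}
        if i > 0:
            kwargs["previous_text"] = segments[i - 1]
        if i < len(segments) - 1:
            kwargs["next_text"] = segments[i + 1]
        result.append(kwargs)
    return result


segments = [
    "This is the first part of the story.",
    "And this is what happens next.",
    "Finally, the story ends.",  # illustrative extra segment
]
plan = stitching_kwargs(segments)
```

Each dictionary in `plan` could then be unpacked into a call such as `client.text_to_speech.convert(voice_id=..., model_id=..., **kwargs)`. The first segment carries no previous_text and the last carries no next_text.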