Implementation:Elevenlabs Elevenlabs python TextToSpeechClient Convert

Knowledge Sources	ElevenLabs Python, ElevenLabs TTS API
Domains	Speech_Synthesis, NLP
Last Updated	2026-02-15 00:00 GMT

Overview

A concrete tool, provided by the elevenlabs-python SDK, for converting text to speech audio.

Description

The TextToSpeechClient.convert method sends text to the ElevenLabs TTS API and returns streaming audio bytes. It wraps the POST /v1/text-to-speech/{voice_id} endpoint, handling authentication, request serialization, and response streaming via the Fern-generated HTTP client. The response is yielded as an iterator of bytes, allowing progressive playback or saving.
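To make the wrapped endpoint more concrete, here is a minimal sketch of the kind of HTTP request the method is described as sending. It only builds the request and never sends it. The base URL, the xi-api-key header name, and the placement of output_format in the query string are assumptions not stated on this page, and build_tts_request is a hypothetical helper, not part of the SDK.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional

# Assumed default API host; this page only gives the path.
API_BASE = "https://api.elevenlabs.io"


def build_tts_request(
    voice_id: str,
    text: str,
    api_key: str,
    model_id: Optional[str] = None,
    output_format: Optional[str] = None,
) -> urllib.request.Request:
    """Build (but do not send) a POST request for /v1/text-to-speech/{voice_id}."""
    url = f"{API_BASE}/v1/text-to-speech/{urllib.parse.quote(voice_id, safe='')}"
    if output_format is not None:
        # Assumption: output_format travels as a query parameter.
        url += "?" + urllib.parse.urlencode({"output_format": output_format})
    body = {"text": text}
    if model_id is not None:
        body["model_id"] = model_id
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        # Assumption: the API key is sent in the xi-api-key header.
        headers={"xi-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


req = build_tts_request(
    "JBFqnCBsd6RMkjVDRZzb", "Hello", api_key="YOUR_API_KEY",
    output_format="mp3_44100_128",
)
print(req.get_method(), req.full_url)
```

In practice the SDK method handles all of this, so the sketch is useful mainly for debugging or for understanding what goes over the wire.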

Usage

Use this method to generate speech audio from text. This is the standard (non-WebSocket) TTS method suitable for batch generation where you have the complete text upfront. For streaming text input (e.g., from an LLM), use convert_realtime instead.
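As an illustration of the batch use case, the following sketch converts a list of complete texts one by one and collects the audio bytes for each. batch_convert is a hypothetical helper; it takes the convert callable as an argument so it can be exercised without network access, and it assumes only that the callable accepts voice_id and text keywords and returns an iterator of bytes.

```python
from typing import Callable, Dict, Iterable, Iterator

# Any callable shaped like client.text_to_speech.convert.
ConvertFn = Callable[..., Iterator[bytes]]


def batch_convert(
    convert: ConvertFn,
    voice_id: str,
    texts: Iterable[str],
    **kwargs,
) -> Dict[str, bytes]:
    """Convert each text in turn and return a mapping of text to audio bytes."""
    results: Dict[str, bytes] = {}
    for text in texts:
        # Joining the chunks consumes the iterator and buffers the whole clip.
        results[text] = b"".join(convert(voice_id=voice_id, text=text, **kwargs))
    return results


# With a real client this might be called as:
# batch_convert(client.text_to_speech.convert, "JBFqnCBsd6RMkjVDRZzb",
#               ["First line.", "Second line."], model_id="eleven_multilingual_v2")
```

Buffering each clip in memory keeps the sketch simple; for long texts, writing chunks to disk as they arrive is likely the better choice.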

Code Reference

Source Location

Repository: elevenlabs-python
File: src/elevenlabs/text_to_speech/client.py
Lines: L48-178

Signature

def convert(
    self,
    voice_id: str,
    *,
    text: str,
    enable_logging: typing.Optional[bool] = None,
    optimize_streaming_latency: typing.Optional[int] = None,
    output_format: typing.Optional[TextToSpeechConvertRequestOutputFormat] = None,
    model_id: typing.Optional[str] = OMIT,
    language_code: typing.Optional[str] = OMIT,
    voice_settings: typing.Optional[VoiceSettings] = OMIT,
    pronunciation_dictionary_locators: typing.Optional[
        typing.Sequence[PronunciationDictionaryVersionLocator]
    ] = OMIT,
    seed: typing.Optional[int] = OMIT,
    previous_text: typing.Optional[str] = OMIT,
    next_text: typing.Optional[str] = OMIT,
    previous_request_ids: typing.Optional[typing.Sequence[str]] = OMIT,
    next_request_ids: typing.Optional[typing.Sequence[str]] = OMIT,
    use_pvc_as_ivc: typing.Optional[bool] = OMIT,
    apply_text_normalization: typing.Optional[BodyTextToSpeechFullApplyTextNormalization] = OMIT,
    apply_language_text_normalization: typing.Optional[bool] = OMIT,
    request_options: typing.Optional[RequestOptions] = None,
) -> typing.Iterator[bytes]:
    """Converts text into speech using a voice of your choice and returns audio."""

Import

from elevenlabs import ElevenLabs

# With no api_key argument, the client falls back to the ELEVENLABS_API_KEY environment variable
client = ElevenLabs()
# Access via: client.text_to_speech.convert(...)

I/O Contract

Inputs

Name	Type	Required	Description
voice_id	str	Yes	Voice ID to use for synthesis
text	str	Yes	Text to convert to speech
model_id	Optional[str]	No	TTS model identifier (e.g., "eleven_multilingual_v2", "eleven_turbo_v2_5")
output_format	Optional[TextToSpeechConvertRequestOutputFormat]	No	Audio format string, e.g., mp3_44100_128, mp3_22050_32, pcm_16000, pcm_44100, ulaw_8000
language_code	Optional[str]	No	ISO 639-1 language code used to enforce the output language
voice_settings	Optional[VoiceSettings]	No	Override stability, similarity_boost, style, use_speaker_boost
optimize_streaming_latency	Optional[int]	No	Latency optimization level (0-4)
seed	Optional[int]	No	Deterministic seed (0-4294967295)
previous_text	Optional[str]	No	Context text for continuity stitching
next_text	Optional[str]	No	Look-ahead text for continuity stitching
previous_request_ids	Optional[Sequence[str]]	No	Previous generation IDs for stitching (max 3)
next_request_ids	Optional[Sequence[str]]	No	Next generation IDs for stitching (max 3)
pronunciation_dictionary_locators	Optional[Sequence[PronunciationDictionaryVersionLocator]]	No	Pronunciation overrides (max 3)
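apply_text_normalization	Optional[BodyTextToSpeechFullApplyTextNormalization]	No	Text normalization mode: 'auto', 'on', or 'off'
apply_language_text_normalization	Optional[bool]	No	Enable language-specific text normalization
enable_logging	Optional[bool]	No	Whether request logging is enabled
use_pvc_as_ivc	Optional[bool]	No	Use the IVC version of a voice instead of its PVC version
request_options	Optional[RequestOptions]	No	Per-request client configuration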
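The numeric limits in the table above can be checked locally before a request is made. The sketch below is a hypothetical pre-flight helper (check_convert_args is not part of the SDK) that encodes only the ranges listed in the table; whether the SDK or the API enforces them in the same way is not confirmed here.

```python
from typing import Optional, Sequence

MAX_SEED = 4294967295      # upper bound listed for seed
MAX_LATENCY_LEVEL = 4      # upper bound listed for optimize_streaming_latency
MAX_LIST_ITEMS = 3         # limit listed for request IDs and dictionary locators


def check_convert_args(
    *,
    seed: Optional[int] = None,
    optimize_streaming_latency: Optional[int] = None,
    previous_request_ids: Optional[Sequence[str]] = None,
    next_request_ids: Optional[Sequence[str]] = None,
    pronunciation_dictionary_locators: Optional[Sequence[object]] = None,
) -> None:
    """Raise ValueError if an argument falls outside the documented limits."""
    if seed is not None and not 0 <= seed <= MAX_SEED:
        raise ValueError(f"seed must be in 0..{MAX_SEED}")
    if optimize_streaming_latency is not None and not (
        0 <= optimize_streaming_latency <= MAX_LATENCY_LEVEL
    ):
        raise ValueError(f"optimize_streaming_latency must be in 0..{MAX_LATENCY_LEVEL}")
    limited = {
        "previous_request_ids": previous_request_ids,
        "next_request_ids": next_request_ids,
        "pronunciation_dictionary_locators": pronunciation_dictionary_locators,
    }
    for name, value in limited.items():
        if value is not None and len(value) > MAX_LIST_ITEMS:
            raise ValueError(f"{name} accepts at most {MAX_LIST_ITEMS} items")


check_convert_args(seed=42, optimize_streaming_latency=2)  # passes silently
```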

Outputs

Name	Type	Description
(return)	Iterator[bytes]	Streaming audio byte chunks in the requested output format
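Because the return value is an iterator of byte chunks, it can be written to disk incrementally instead of being buffered in memory. The following is a minimal sketch of that pattern; write_audio is a hypothetical helper that works with any iterable of bytes, including the one returned by convert.

```python
from typing import Iterable


def write_audio(chunks: Iterable[bytes], path: str) -> int:
    """Write audio chunks to path as they arrive and return the byte count."""
    total = 0
    with open(path, "wb") as f:
        for chunk in chunks:
            if chunk:  # skip empty chunks defensively
                f.write(chunk)
                total += len(chunk)
    return total


# With a real client this might be called as:
# write_audio(client.text_to_speech.convert(voice_id="...", text="Hello"), "hello.mp3")
```

Note that an iterator can be consumed only once, so the same result cannot be both played and saved without first collecting the bytes.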

Usage Examples

Basic Text to Speech

from elevenlabs import ElevenLabs, play

client = ElevenLabs()

audio = client.text_to_speech.convert(
    voice_id="JBFqnCBsd6RMkjVDRZzb",
    text="The first move is what sets everything in motion.",
    model_id="eleven_multilingual_v2",
    output_format="mp3_44100_128",
)

play(audio)

With Voice Settings Override

from elevenlabs import ElevenLabs, VoiceSettings, save

client = ElevenLabs()

audio = client.text_to_speech.convert(
    voice_id="JBFqnCBsd6RMkjVDRZzb",
    text="Hello, welcome to the presentation.",
    model_id="eleven_multilingual_v2",
    voice_settings=VoiceSettings(
        stability=0.7,
        similarity_boost=0.8,
        style=0.3,
        use_speaker_boost=True,
    ),
)

save(audio, "output.mp3")

Continuity Stitching

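from elevenlabs import ElevenLabs

client = ElevenLabs()

# Generate the first segment, passing the text that follows it as context
audio1 = client.text_to_speech.convert(
    voice_id="JBFqnCBsd6RMkjVDRZzb",
    text="This is the first part of the story.",
    model_id="eleven_multilingual_v2",
    next_text="And this is what happens next.",
)

# Generate the second segment, passing the text that preceded it as context
audio2 = client.text_to_speech.convert(
    voice_id="JBFqnCBsd6RMkjVDRZzb",
    text="And this is what happens next.",
    model_id="eleven_multilingual_v2",
    previous_text="This is the first part of the story.",
)

# Each result is an iterator of byte chunks; consume it to obtain the audio
for index, segment in enumerate((audio1, audio2), start=1):
    with open(f"part{index}.mp3", "wb") as f:
        for chunk in segment:
            f.write(chunk)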
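The pattern above generalizes to any number of segments: each segment receives its predecessor as previous_text and its successor as next_text. The sketch below computes those arguments for a list of segments. stitching_plan is a hypothetical helper, and using only the immediately adjacent segment as context is an assumption made for simplicity.

```python
from typing import Dict, List


def stitching_plan(segments: List[str]) -> List[Dict[str, str]]:
    """Return keyword arguments (text, previous_text, next_text) per segment."""
    plan: List[Dict[str, str]] = []
    for i, text in enumerate(segments):
        kwargs = {"text": text}
        if i > 0:
            kwargs["previous_text"] = segments[i - 1]
        if i < len(segments) - 1:
            kwargs["next_text"] = segments[i + 1]
        plan.append(kwargs)
    return plan


for kwargs in stitching_plan(["Part one.", "Part two.", "Part three."]):
    # With a real client: client.text_to_speech.convert(voice_id="...", **kwargs)
    print(kwargs)
```

Omitting the key entirely for the first and last segments, rather than passing None, leaves those parameters at their defaults.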
