Implementation:Elevenlabs Elevenlabs python TextToSpeechClient Convert

From Leeroopedia
Knowledge Sources
Domains: Speech_Synthesis, NLP
Last Updated: 2026-02-15 00:00 GMT

Overview

A concrete tool, provided by the elevenlabs-python SDK, for converting text to speech audio.

Description

The TextToSpeechClient.convert method sends text to the ElevenLabs TTS API and returns the synthesized audio as a stream of bytes. It wraps the POST /v1/text-to-speech/{voice_id} endpoint and handles authentication, request serialization, and response streaming through the Fern-generated HTTP client. The return value is an iterator of byte chunks, so the audio can be played or written to disk as it arrives.
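A minimal sketch of the "saving as it arrives" case described above. The helper name write_audio_chunks and the stand-in generator fake_audio_stream are hypothetical and not part of the SDK; in real use the iterable would be the value returned by client.text_to_speech.convert(...).

```python
import os
import tempfile
from typing import Iterable, Iterator


def write_audio_chunks(chunks: Iterable[bytes], path: str) -> int:
    """Write each chunk to `path` as it arrives; return total bytes written."""
    total = 0
    with open(path, "wb") as f:
        for chunk in chunks:
            if chunk:  # ignore empty chunks
                f.write(chunk)
                total += len(chunk)
    return total


def fake_audio_stream() -> Iterator[bytes]:
    """Hypothetical stand-in for the iterator returned by convert (not real audio)."""
    yield b"ID3"
    yield b""
    yield b"\x00\x01\x02"


out_path = os.path.join(tempfile.mkdtemp(), "speech.mp3")
written = write_audio_chunks(fake_audio_stream(), out_path)
```

Because the helper only depends on an iterable of bytes, it never holds more than one chunk in memory at a time.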

Usage

Use this method to generate speech audio from text. It is the standard (non-WebSocket) TTS method, suited to batch generation where the complete text is available up front. When the text itself arrives incrementally (e.g., from an LLM), use convert_realtime instead.
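The batch use case can be sketched as a loop that issues one call per complete text. This is an illustration under stated assumptions: synthesize_batch and fake_convert are hypothetical names, and fake_convert only mimics the call shape of convert (positional voice_id, keyword-only text) so the example runs without network access.

```python
from typing import Callable, Dict, Iterator, List


def synthesize_batch(
    convert: Callable[..., Iterator[bytes]],
    voice_id: str,
    texts: List[str],
) -> Dict[int, bytes]:
    """Call `convert` once per text and collect each result into bytes."""
    results: Dict[int, bytes] = {}
    for index, text in enumerate(texts):
        # Joining the chunks consumes the iterator for this text
        results[index] = b"".join(convert(voice_id, text=text))
    return results


def fake_convert(voice_id: str, *, text: str) -> Iterator[bytes]:
    """Hypothetical stand-in for convert; yields the inputs back as bytes."""
    yield voice_id.encode() + b":"
    yield text.encode()


clips = synthesize_batch(fake_convert, "voice123", ["Hello.", "Goodbye."])
```

With the real SDK, the first argument would presumably be client.text_to_speech.convert, and each value in the result would be one complete audio clip.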

Code Reference

Source Location

  • Repository: elevenlabs-python
  • File: src/elevenlabs/text_to_speech/client.py
  • Lines: L48-178

Signature

def convert(
    self,
    voice_id: str,
    *,
    text: str,
    enable_logging: typing.Optional[bool] = None,
    optimize_streaming_latency: typing.Optional[int] = None,
    output_format: typing.Optional[TextToSpeechConvertRequestOutputFormat] = None,
    model_id: typing.Optional[str] = OMIT,
    language_code: typing.Optional[str] = OMIT,
    voice_settings: typing.Optional[VoiceSettings] = OMIT,
    pronunciation_dictionary_locators: typing.Optional[
        typing.Sequence[PronunciationDictionaryVersionLocator]
    ] = OMIT,
    seed: typing.Optional[int] = OMIT,
    previous_text: typing.Optional[str] = OMIT,
    next_text: typing.Optional[str] = OMIT,
    previous_request_ids: typing.Optional[typing.Sequence[str]] = OMIT,
    next_request_ids: typing.Optional[typing.Sequence[str]] = OMIT,
    use_pvc_as_ivc: typing.Optional[bool] = OMIT,
    apply_text_normalization: typing.Optional[BodyTextToSpeechFullApplyTextNormalization] = OMIT,
    apply_language_text_normalization: typing.Optional[bool] = OMIT,
    request_options: typing.Optional[RequestOptions] = None,
) -> typing.Iterator[bytes]:
    """Converts text into speech using a voice of your choice and returns audio."""
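Several parameters in the signature default to OMIT rather than None. The sketch below shows the general sentinel pattern such a default usually stands for: a value that means "argument not supplied", kept distinct from an explicit None. It is an illustration of the pattern only, not the SDK's actual code; the OMIT definition and build_body helper here are assumptions.

```python
import typing

# Assumed sentinel: Ellipsis stands for "argument not supplied"
OMIT = typing.cast(typing.Any, ...)


def build_body(**fields: typing.Any) -> typing.Dict[str, typing.Any]:
    """Drop every field still set to the sentinel; keep explicit values, including None."""
    return {key: value for key, value in fields.items() if value is not OMIT}


# model_id was never supplied, seed was explicitly set to None
body = build_body(text="Hi", model_id=OMIT, seed=None)
```

Under this pattern an omitted field is left out of the request body entirely, so the server-side default applies.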

Import

from elevenlabs import ElevenLabs

# With no api_key argument, the client reads the ELEVENLABS_API_KEY environment variable
client = ElevenLabs()
# Access via: client.text_to_speech.convert(...)

I/O Contract

Inputs

Name | Type | Required | Description
voice_id | str | Yes | Voice ID to use for synthesis
text | str | Yes | Text to convert to speech
model_id | Optional[str] | No | TTS model identifier (e.g., "eleven_multilingual_v2", "eleven_turbo_v2_5")
output_format | Optional[str] | No | Audio format: mp3_44100_128, mp3_22050_32, pcm_16000, pcm_44100, ulaw_8000, etc.
language_code | Optional[str] | No | ISO language code to enforce language
voice_settings | Optional[VoiceSettings] | No | Override stability, similarity_boost, style, use_speaker_boost
optimize_streaming_latency | Optional[int] | No | Latency optimization level (0-4)
seed | Optional[int] | No | Deterministic seed (0-4294967295)
previous_text | Optional[str] | No | Context text for continuity stitching
next_text | Optional[str] | No | Look-ahead text for continuity stitching
previous_request_ids | Optional[Sequence[str]] | No | Previous generation IDs for stitching (max 3)
next_request_ids | Optional[Sequence[str]] | No | Next generation IDs for stitching (max 3)
pronunciation_dictionary_locators | Optional[Sequence[PronunciationDictionaryVersionLocator]] | No | Pronunciation overrides (max 3)
apply_text_normalization | Optional[str] | No | 'auto', 'on', or 'off'
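The ranges and limits listed in the table can be checked on the client before a request is sent. The following is a hedged sketch: check_convert_limits is a hypothetical helper, not an SDK function, and it encodes only the limits stated in the table above.

```python
from typing import List, Optional, Sequence

MAX_SEED = 4294967295  # upper bound listed for seed
MAX_ITEMS = 3  # "max 3" limit listed for the sequence parameters
NORMALIZATION_MODES = ("auto", "on", "off")


def check_convert_limits(
    *,
    optimize_streaming_latency: Optional[int] = None,
    seed: Optional[int] = None,
    previous_request_ids: Optional[Sequence[str]] = None,
    next_request_ids: Optional[Sequence[str]] = None,
    pronunciation_dictionary_locators: Optional[Sequence[object]] = None,
    apply_text_normalization: Optional[str] = None,
) -> List[str]:
    """Return a list of human-readable problems; an empty list means no limit is violated."""
    problems: List[str] = []
    if optimize_streaming_latency is not None and not 0 <= optimize_streaming_latency <= 4:
        problems.append("optimize_streaming_latency must be in 0-4")
    if seed is not None and not 0 <= seed <= MAX_SEED:
        problems.append("seed must be in 0-4294967295")
    sized = {
        "previous_request_ids": previous_request_ids,
        "next_request_ids": next_request_ids,
        "pronunciation_dictionary_locators": pronunciation_dictionary_locators,
    }
    for name, value in sized.items():
        if value is not None and len(value) > MAX_ITEMS:
            problems.append(f"{name} accepts at most 3 items")
    if apply_text_normalization is not None and apply_text_normalization not in NORMALIZATION_MODES:
        problems.append("apply_text_normalization must be 'auto', 'on', or 'off'")
    return problems


ok = check_convert_limits(seed=42, optimize_streaming_latency=2)
bad = check_convert_limits(seed=-1, previous_request_ids=["a", "b", "c", "d"])
```

Returning a list of problems rather than raising on the first one lets a caller report every violation at once.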

Outputs

Name | Type | Description
(return) | Iterator[bytes] | Streaming audio byte chunks in the requested output format
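Because the return value is an iterator, it can be consumed only once. A small sketch of the consequence, using a hypothetical stand-in generator (fake_audio_stream) in place of a real convert call: collect the chunks into a bytes object first if the audio is needed more than once.

```python
from typing import Iterator


def fake_audio_stream() -> Iterator[bytes]:
    """Hypothetical stand-in for the iterator returned by convert."""
    yield b"abc"
    yield b"def"


stream = fake_audio_stream()
audio_bytes = b"".join(stream)  # first pass consumes every chunk
leftover = b"".join(stream)  # second pass finds the iterator exhausted
```

The audio_bytes value can then be reused freely, for example written to a file and also passed to a player.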

Usage Examples

Basic Text to Speech

from elevenlabs import ElevenLabs
from elevenlabs.play import play  # playback helper; by default it plays through ffmpeg's ffplay

client = ElevenLabs()

audio = client.text_to_speech.convert(
    voice_id="JBFqnCBsd6RMkjVDRZzb",
    text="The first move is what sets everything in motion.",
    model_id="eleven_multilingual_v2",
    output_format="mp3_44100_128",
)

play(audio)

With Voice Settings Override

from elevenlabs import ElevenLabs, VoiceSettings
from elevenlabs.play import save  # helper that writes the audio chunks to a file

client = ElevenLabs()

audio = client.text_to_speech.convert(
    voice_id="JBFqnCBsd6RMkjVDRZzb",
    text="Hello, welcome to the presentation.",
    model_id="eleven_multilingual_v2",
    voice_settings=VoiceSettings(
        stability=0.7,
        similarity_boost=0.8,
        style=0.3,
        use_speaker_boost=True,
    ),
)

save(audio, "output.mp3")

Continuity Stitching

from elevenlabs import ElevenLabs

client = ElevenLabs()

# Generate first segment; next_text tells the model what follows it
audio1 = client.text_to_speech.convert(
    voice_id="JBFqnCBsd6RMkjVDRZzb",
    text="This is the first part of the story.",
    model_id="eleven_multilingual_v2",
    next_text="And this is what happens next.",
)
# The returned iterator must be consumed to obtain the audio
part1 = b"".join(audio1)

# Generate second segment with continuity; previous_text repeats the first segment
audio2 = client.text_to_speech.convert(
    voice_id="JBFqnCBsd6RMkjVDRZzb",
    text="And this is what happens next.",
    model_id="eleven_multilingual_v2",
    previous_text="This is the first part of the story.",
)
part2 = b"".join(audio2)
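For more than two segments, the previous_text and next_text arguments can be derived mechanically from the neighbouring segments. A minimal sketch, assuming a hypothetical helper named stitching_kwargs that only builds keyword arguments and makes no API calls:

```python
from typing import Dict, List


def stitching_kwargs(segments: List[str]) -> List[Dict[str, str]]:
    """For each segment, build the text plus its neighbours as stitching context."""
    plans: List[Dict[str, str]] = []
    for i, text in enumerate(segments):
        kwargs = {"text": text}
        if i > 0:
            kwargs["previous_text"] = segments[i - 1]
        if i < len(segments) - 1:
            kwargs["next_text"] = segments[i + 1]
        plans.append(kwargs)
    return plans


plans = stitching_kwargs(["One.", "Two.", "Three."])
```

Each dictionary could then be unpacked into a call such as client.text_to_speech.convert(voice_id=..., model_id=..., **kwargs), with the first segment carrying no previous_text and the last carrying no next_text.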
