Heuristic:Elevenlabs Elevenlabs python Text Chunking Splitter Characters

From Leeroopedia
Knowledge Sources
Domains Optimization, Realtime_TTS
Last Updated 2026-02-15 12:00 GMT

Overview

Text chunking strategy that buffers streaming text and releases it at splitter characters (punctuation, dashes, brackets, and spaces) to improve real-time TTS quality.

Description

When streaming text to the real-time TTS WebSocket, the `text_chunker` function buffers incoming text fragments and yields a chunk each time it reaches a splitter character, so chunks end at word, clause, or sentence boundaries rather than mid-word. The splitter set contains punctuation marks (`. , ? ! ; :`), dashes (`— -`), brackets (`( ) [ ] }`; note that `{` is not included), and the space character. Every yielded chunk ends with a space; the chunker appends one if necessary to signal a word boundary to the TTS engine.
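The buffering behavior described above can be traced with a small, self-contained sketch. The function body is copied from the SDK excerpt quoted under Code Evidence; the sample fragments are invented for illustration and only mimic the sub-word pieces an LLM token stream might deliver.

```python
import typing


def text_chunker(chunks: typing.Iterator[str]) -> typing.Iterator[str]:
    """Copy of the SDK logic quoted under Code Evidence."""
    splitters = (".", ",", "?", "!", ";", ":", "—", "-", "(", ")", "[", "]", "}", " ")
    buffer = ""
    for text in chunks:
        if buffer.endswith(splitters):
            # Buffer already ends at a boundary: flush it, restart with the new text.
            yield buffer if buffer.endswith(" ") else buffer + " "
            buffer = text
        elif text.startswith(splitters):
            # New text opens with a boundary: flush the buffer plus that one character.
            output = buffer + text[0]
            yield output if output.endswith(" ") else output + " "
            buffer = text[1:]
        else:
            # No boundary seen yet: keep accumulating.
            buffer += text
    if buffer != "":
        yield buffer + " "


# Hypothetical sub-word fragments (an assumption, not taken from the SDK).
fragments = ["Hel", "lo", ", wor", "ld", "! How", " are", " you", "?"]
emitted = list(text_chunker(iter(fragments)))
print(emitted)
# ['Hello, ', ' world! ', ' How ', 'are ', 'you? ']
```

In this trace, "Hel" and "lo" are held back until the comma arrives, and a chunk can begin with a space that was carried over from the incoming fragment.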

Usage

Use this heuristic when implementing real-time text-to-speech streaming with the ElevenLabs SDK. The `text_chunker` is automatically applied inside `convert_realtime()`, but understanding its behavior is important for:

  • Designing your text iterator to yield at appropriate granularity
  • Understanding why very short text fragments may be buffered before being sent
  • Debugging latency issues in streaming TTS output

The Insight (Rule of Thumb)

  • Action: Use the built-in `text_chunker` (or replicate its logic) when streaming text to the TTS WebSocket.
  • Splitter characters: `(".", ",", "?", "!", ";", ":", "—", "-", "(", ")", "[", "]", "}", " ")`
  • Trailing space: Every yielded chunk is guaranteed to end with a space, ensuring the TTS engine treats it as a complete word boundary.
  • Trade-off: Buffering until a splitter character arrives adds a small delay but avoids sending partial words, which improves TTS output quality and prosody.
  • Guideline: Yield text from your iterator at word or phrase granularity; the chunker handles the rest.
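To illustrate the granularity guideline, the following sketch feeds the same sentence to the chunker twice: once as word-level fragments and once character by character. The chunker is repeated here only so the snippet runs on its own; the sample sentence and variable names are assumptions made for this example.

```python
import typing


def text_chunker(chunks: typing.Iterator[str]) -> typing.Iterator[str]:
    """Copy of the SDK logic quoted under Code Evidence."""
    splitters = (".", ",", "?", "!", ";", ":", "—", "-", "(", ")", "[", "]", "}", " ")
    buffer = ""
    for text in chunks:
        if buffer.endswith(splitters):
            yield buffer if buffer.endswith(" ") else buffer + " "
            buffer = text
        elif text.startswith(splitters):
            output = buffer + text[0]
            yield output if output.endswith(" ") else output + " "
            buffer = text[1:]
        else:
            buffer += text
    if buffer != "":
        yield buffer + " "


sentence = "Hi there, friend."
word_level = ["Hi ", "there, ", "friend."]  # one word (plus punctuation) per fragment
char_level = list(sentence)                 # one character per fragment

from_words = list(text_chunker(iter(word_level)))
from_chars = list(text_chunker(iter(char_level)))

print(from_words)  # ['Hi ', 'there, ', 'friend. ']
print(from_chars)  # ['Hi ', 'there, ', ' ', 'friend. ']
```

With word-level input, each fragment passes through unchanged but is released only when the next fragment arrives, which is where the buffering delay comes from. With character-level input the chunker still reassembles whole words, but the space after the comma is emitted as a separate whitespace-only chunk. In both runs every chunk ends with a space.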

Reasoning

TTS engines produce significantly better prosody and naturalness when they receive text at sentence or clause boundaries rather than character-by-character. Sending partial words can cause mispronunciation and robotic-sounding output. The chunker balances latency (small buffer) against quality (complete phrases). The trailing space convention signals to the server that the preceding word is complete and can be synthesized.

The `chunk_length_schedule=[50]` setting in the initial WebSocket message additionally controls server-side buffering: it sets the minimum number of characters the server accumulates before it starts generating audio.
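As a rough illustration of where this setting lives, the sketch below serializes an opening message that carries a `generation_config`. The helper name `build_initial_message` is hypothetical, and the single-space `text` field is an assumption about how the stream is opened; the message sent by the SDK carries additional fields that are omitted here.

```python
import json
import typing


def build_initial_message(chunk_length_schedule: typing.Sequence[int] = (50,)) -> str:
    """Hypothetical helper: serialize an opening message with a generation_config."""
    payload = {
        # Assumption: the stream is opened with a single-space text field.
        "text": " ",
        # Server-side buffering threshold, as in the SDK excerpt under Code Evidence.
        "generation_config": {"chunk_length_schedule": list(chunk_length_schedule)},
    }
    return json.dumps(payload)


initial_message = build_initial_message()
print(initial_message)
# {"text": " ", "generation_config": {"chunk_length_schedule": [50]}}
```

Keeping the schedule as a parameter makes it easy to experiment with the latency and quality trade-off without touching the rest of the message.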

Code Evidence

Splitter characters and chunking logic from `realtime_tts.py:24-39`:

def text_chunker(chunks: typing.Iterator[str]) -> typing.Iterator[str]:
    """Used during input streaming to chunk text blocks and set last char to space"""
    splitters = (".", ",", "?", "!", ";", ":", "—", "-", "(", ")", "[", "]", "}", " ")
    buffer = ""
    for text in chunks:
        if buffer.endswith(splitters):
            yield buffer if buffer.endswith(" ") else buffer + " "
            buffer = text
        elif text.startswith(splitters):
            output = buffer + text[0]
            yield output if output.endswith(" ") else output + " "
            buffer = text[1:]
        else:
            buffer += text
    if buffer != "":
        yield buffer + " "

Server-side chunk length configuration from `realtime_tts.py:114-116`:

generation_config=dict(
    chunk_length_schedule=[50],
),
