Principle:Elevenlabs Elevenlabs python Text Chunking

From Leeroopedia
Knowledge Sources
Domains: NLP, Streaming, Text_Processing
Last Updated: 2026-02-15 00:00 GMT

Overview

A buffering algorithm that splits a stream of text fragments into sentence-boundary-aligned chunks suitable for speech synthesis, ensuring natural prosody in generated audio.

Description

Text Chunking addresses a fundamental challenge in streaming TTS: text from sources like LLMs arrives in arbitrary fragments (words, partial words, tokens) that don't align with natural speech boundaries. Sending these fragments directly to TTS would produce unnatural prosody because the synthesis model needs sentence-level context to generate proper intonation.

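The chunking algorithm buffers incoming text fragments and emits a chunk when it detects a boundary at the edge of a fragment: either the buffer already ends with a splitter character, or the next fragment begins with one. The splitter characters are periods, commas, question marks, exclamation marks, semicolons, colons, dashes (em dash and hyphen), bracket characters, and the space character, so boundaries can fall between clauses or words as well as between sentences. Each emitted chunk ends with a space to ensure clean concatenation.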

This preprocessing step is critical for maintaining audio quality in realtime TTS pipelines.

Usage

Use this principle whenever streaming text to a TTS system. The text chunker should sit between the text source (LLM, user input, etc.) and the WebSocket TTS endpoint. It is automatically applied inside convert_realtime but can also be used independently for custom streaming pipelines.
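The placement described above can be sketched as a small wiring function. This is an illustration only, not the library's API: `pipe_chunks`, `chunker`, and `send` are hypothetical names, and `send` stands in for whatever writes a chunk to the WebSocket TTS endpoint.

```python
from typing import Callable, Iterable, Iterator


def pipe_chunks(
    fragments: Iterable[str],
    chunker: Callable[[Iterable[str]], Iterator[str]],
    send: Callable[[str], None],
) -> int:
    """Run fragments through a chunker and hand each chunk to a sink.

    Returns the number of chunks sent.
    """
    sent = 0
    # The chunker is consumed lazily, so each chunk is forwarded as soon
    # as it is complete rather than after the whole stream has arrived.
    for chunk in chunker(fragments):
        send(chunk)  # hypothetical sink, e.g. a WebSocket write
        sent += 1
    return sent


if __name__ == "__main__":
    # Trivial stand-in chunker so the sketch runs on its own; a real
    # pipeline would pass a boundary-aware chunker here instead.
    def passthrough(frags: Iterable[str]) -> Iterator[str]:
        for frag in frags:
            yield frag

    pipe_chunks(["Hello, ", "world! "], passthrough, print)
```

Keeping the chunker as an injected callable means the same wiring works with any text source and any sink, which is the sense in which the chunker "sits between" the two.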

Theoretical Basis

The algorithm maintains a buffer and applies a greedy split-at-boundary strategy:

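# Abstract algorithm, written as a runnable Python generator
splitters = (".", ",", "?", "!", ";", ":", "—", "-", "(", ")", "[", "]", "}", " ")

def chunk_text(text_stream):
    buffer = ""
    for fragment in text_stream:
        if buffer.endswith(splitters):
            # Buffer already ends at a boundary: emit it, restart with the new fragment
            yield buffer if buffer.endswith(" ") else buffer + " "
            buffer = fragment
        elif fragment.startswith(splitters):
            # Fragment opens with a boundary character: attach it to the buffered text
            chunk = buffer + fragment[0]
            yield chunk if chunk.endswith(" ") else chunk + " "
            buffer = fragment[1:]
        else:
            buffer += fragment  # No boundary yet: continue buffering

    if buffer:
        # Flush whatever remains when the stream ends
        yield buffer if buffer.endswith(" ") else buffer + " "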

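This ensures that every chunk boundary falls immediately after a splitter character (or at the end of the stream), so chunks do not end mid-word as long as words are delimited by splitters. Because the space is itself a splitter, a chunk may be as short as a single word or as long as a full clause or sentence, depending on how the fragments arrive. Buffering up to a boundary still gives the TTS model more context for prosody than raw fragments would.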
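As a worked example, the following self-contained sketch runs the strategy on a short fragment stream. It is an illustration under stated assumptions, not necessarily the library's exact implementation: the names `SPLITTERS`, `_spaced`, and `chunk_text` are hypothetical, and the trailing-space handling follows the description on this page.

```python
from typing import Iterable, Iterator

# Boundary characters, as listed in the abstract algorithm on this page.
SPLITTERS = (".", ",", "?", "!", ";", ":", "—", "-", "(", ")", "[", "]", "}", " ")


def _spaced(chunk: str) -> str:
    """Return the chunk with a trailing space, adding one if it is missing."""
    return chunk if chunk.endswith(" ") else chunk + " "


def chunk_text(fragments: Iterable[str]) -> Iterator[str]:
    """Buffer fragments and emit a chunk whenever a fragment edge hits a boundary."""
    buffer = ""
    for fragment in fragments:
        if buffer.endswith(SPLITTERS):
            # The buffered text already ends at a boundary.
            yield _spaced(buffer)
            buffer = fragment
        elif fragment.startswith(SPLITTERS):
            # The boundary character arrives at the start of the next fragment.
            yield _spaced(buffer + fragment[0])
            buffer = fragment[1:]
        else:
            buffer += fragment
    if buffer:
        yield _spaced(buffer)  # flush the remainder at end of stream


if __name__ == "__main__":
    stream = ["Hel", "lo, ", "wor", "ld! ", "How are", " you?"]
    print(list(chunk_text(stream)))
    # ['Hello, ', 'world! ', 'How are ', 'you? ']
```

Boundaries are only checked at fragment edges. If the same text arrived as the single fragment "Hello, world!", it would be emitted as one chunk when the stream ends, since no fragment edge coincides with a splitter before that point.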

Related Pages

Implemented By

Uses Heuristic
