Principle:Elevenlabs Elevenlabs python Text Chunking

Knowledge Sources	ElevenLabs Python
Domains	NLP, Streaming, Text_Processing
Last Updated	2026-02-15 00:00 GMT

Overview

A buffering algorithm that splits a stream of text fragments into sentence-boundary-aligned chunks suitable for speech synthesis, ensuring natural prosody in generated audio.

Description

Text Chunking addresses a fundamental challenge in streaming TTS: text from sources like LLMs arrives in arbitrary fragments (words, partial words, tokens) that don't align with natural speech boundaries. Sending these fragments directly to TTS would produce unnatural prosody because the synthesis model needs sentence-level context to generate proper intonation.

The chunking algorithm buffers incoming text fragments and emits a chunk when a boundary is detected at the edge of a fragment. Boundaries are identified by a set of splitter characters: periods, commas, question marks, exclamation marks, semicolons, colons, dashes (both the em dash and the hyphen), parentheses, square brackets, the closing brace, and the space character. Each emitted chunk ends with a space (one is appended if it is not already present) to ensure clean concatenation.
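The boundary test and the trailing-space rule described above can be sketched in a few lines of Python. The names `SPLITTERS`, `ends_at_boundary`, and `with_trailing_space` are illustrative assumptions, not identifiers from the ElevenLabs library; the splitter set is the one listed under Theoretical Basis.

```python
# Illustrative helpers; the names are hypothetical, not part of the ElevenLabs SDK.
SPLITTERS = (".", ",", "?", "!", ";", ":", "—", "-", "(", ")", "[", "]", "}", " ")


def ends_at_boundary(text: str) -> bool:
    """Return True if the text ends with any splitter character."""
    # str.endswith accepts a tuple and tries each suffix in turn.
    return text.endswith(SPLITTERS)


def with_trailing_space(chunk: str) -> str:
    """Append a single space unless the chunk already ends with one."""
    return chunk if chunk.endswith(" ") else chunk + " "


print(ends_at_boundary("Hello, world"))            # False
print(ends_at_boundary("Hello, world!"))           # True
print(repr(with_trailing_space("Hello, world!")))  # 'Hello, world! '
```

Passing the whole tuple to `str.endswith` keeps the check to a single call per fragment, which matters little for correctness but keeps the per-token overhead small.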

This preprocessing step is critical for maintaining audio quality in realtime TTS pipelines.

Usage

Use this principle whenever streaming text to a TTS system. The text chunker should sit between the text source (LLM, user input, etc.) and the WebSocket TTS endpoint. It is automatically applied inside convert_realtime but can also be used independently for custom streaming pipelines.
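As a rough sketch of where the chunker sits in a custom pipeline, the following wires a text source to a sink through a chunking step. Everything here is an assumption for illustration: `stream_to_tts`, the `chunker` parameter, and the `send` callable (a stand-in for whatever writes to the WebSocket TTS endpoint) are hypothetical names, and no network call is made.

```python
from typing import Callable, Iterable, Iterator

# Hypothetical type aliases used only in this sketch.
Chunker = Callable[[Iterable[str]], Iterator[str]]
Sender = Callable[[str], None]


def stream_to_tts(source: Iterable[str], chunker: Chunker, send: Sender) -> int:
    """Pull fragments from source, chunk them, and forward each chunk.

    Returns the number of chunks forwarded.
    """
    count = 0
    for chunk in chunker(source):
        send(chunk)  # stand-in for a write to the TTS WebSocket
        count += 1
    return count
```

Because the chunker is consumed lazily, each chunk is forwarded as soon as it is produced instead of after the whole text stream has ended, which is the property a real-time pipeline depends on.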

Theoretical Basis

The algorithm maintains a buffer and applies a greedy split-at-boundary strategy:

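# Abstract algorithm, written as a Python generator
splitters = (".", ",", "?", "!", ";", ":", "—", "-", "(", ")", "[", "]", "}", " ")

def text_chunker(text_stream):
    buffer = ""
    for fragment in text_stream:
        if buffer.endswith(splitters):
            # Buffer already ends at a boundary: emit it and restart from the fragment
            yield buffer if buffer.endswith(" ") else buffer + " "
            buffer = fragment
        elif fragment.startswith(splitters):
            # Fragment opens with a boundary char: attach it to the buffer and emit
            chunk = buffer + fragment[0]
            yield chunk if chunk.endswith(" ") else chunk + " "
            buffer = fragment[1:]
        else:
            buffer += fragment  # No boundary yet: continue buffering
    if buffer:
        yield buffer + " "  # Flush remaining text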

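This ensures that every chunk, except possibly the final flush, ends at a splitter character rather than at an arbitrary fragment edge. Because the space is itself a splitter, a chunk can be as short as a single word; where punctuation is present, chunks align with complete clauses or sentences, allowing the TTS model to apply appropriate prosody.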
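To make the three branches concrete, the sketch below replays the same buffering logic over a made-up fragment stream and records which branch handled each fragment. The function name `trace_chunker`, the branch labels, and the sample fragments are assumptions chosen for illustration; the trailing-space handling follows the Description section.

```python
from typing import Iterable, List, Tuple

SPLITTERS = (".", ",", "?", "!", ";", ":", "—", "-", "(", ")", "[", "]", "}", " ")


def _spaced(chunk: str) -> str:
    """Ensure the chunk ends with a space."""
    return chunk if chunk.endswith(" ") else chunk + " "


def trace_chunker(fragments: Iterable[str]) -> List[Tuple[str, str, str]]:
    """Return (fragment, branch, emitted) rows; emitted is '' when nothing was emitted."""
    rows: List[Tuple[str, str, str]] = []
    buffer = ""
    for fragment in fragments:
        if buffer.endswith(SPLITTERS):
            rows.append((fragment, "buffer-ends-at-boundary", _spaced(buffer)))
            buffer = fragment
        elif fragment.startswith(SPLITTERS):
            emitted = _spaced(buffer + fragment[0])
            rows.append((fragment, "fragment-starts-with-boundary", emitted))
            buffer = fragment[1:]
        else:
            rows.append((fragment, "keep-buffering", ""))
            buffer += fragment
    if buffer:
        rows.append(("", "final-flush", buffer + " "))
    return rows


# Hypothetical LLM-style fragments that do not align with word boundaries.
for row in trace_chunker(["Hel", "lo,", " wor", "ld", "!", "Bye"]):
    print(row)
```

For this input the emitted chunks are `"Hello, "`, `" world! "`, and `"Bye "`. Note that the comma at the end of `"lo,"` is only acted on when the next fragment arrives, since the buffer is inspected at the start of each iteration.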

Related Pages

Implemented By

Implementation:Elevenlabs_Elevenlabs_python_Text_Chunker

Uses Heuristic

Heuristic:Elevenlabs_Elevenlabs_python_Text_Chunking_Splitter_Characters
