Implementation: elevenlabs-python Text Chunker
| Knowledge Sources | |
|---|---|
| Domains | NLP, Streaming, Text_Processing |
| Last Updated | 2026-02-15 00:00 GMT |
Overview
A concrete tool from the elevenlabs-python SDK for buffering streaming text and splitting it at sentence boundaries.
Description
The text_chunker function is a Python generator that takes an iterator of text fragments and yields sentence-boundary-aligned chunks. It accumulates incoming text in a buffer and emits a chunk when it detects any of 14 splitter characters. Each emitted chunk is guaranteed to end with a space character for clean concatenation.
The function is used internally by convert_realtime to preprocess text before sending over WebSocket, but is also exported for standalone use in custom streaming pipelines.
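The buffering approach can be sketched as follows. This is a simplified, self-contained stand-in for illustration, not the SDK's exact source: the function name local_text_chunker is hypothetical, and the splitter set is copied from the docstring below.

```python
from typing import Iterator

# Splitter set as listed in the SDK docstring (14 characters).
SPLITTERS = (".", ",", "?", "!", ";", ":", "—", "-", "(", ")", "[", "]", "}", " ")


def local_text_chunker(chunks: Iterator[str]) -> Iterator[str]:
    """Simplified stand-in: buffer fragments, flush at splitter boundaries."""
    buffer = ""
    for text in chunks:
        if buffer.endswith(SPLITTERS):
            # Buffer already ends at a boundary: flush it, start a new one.
            yield buffer if buffer.endswith(" ") else buffer + " "
            buffer = text
        elif text.startswith(SPLITTERS):
            # Incoming fragment begins with a boundary char: attach it and flush.
            output = buffer + text[0]
            yield output if output.endswith(" ") else output + " "
            buffer = text[1:]
        else:
            buffer += text
    if buffer:
        # Flush whatever remains, padded with the guaranteed trailing space.
        yield buffer + " "
```

Note the invariant this sketch preserves: every yielded chunk ends with a space, so a consumer can concatenate chunks directly without re-inserting separators.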
Usage
Use this function when building a custom streaming TTS pipeline where you need to preprocess text fragments into sentence-aligned chunks before sending to the WebSocket API.
Code Reference
Source Location
- Repository: elevenlabs-python
- File: src/elevenlabs/realtime_tts.py
- Lines: L24-39
Signature
```python
def text_chunker(chunks: typing.Iterator[str]) -> typing.Iterator[str]:
    """Used during input streaming to chunk text blocks and set last char to space.

    Splits text at sentence boundaries defined by:
    (".", ",", "?", "!", ";", ":", "—", "-", "(", ")", "[", "]", "}", " ")

    Args:
        chunks: Iterator of text fragments (e.g., from LLM stream).

    Yields:
        str: Sentence-boundary-aligned text chunks, each ending with a space.
    """
```
Import
```python
from elevenlabs.realtime_tts import text_chunker
```
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| chunks | Iterator[str] | Yes | Stream of text fragments (words, tokens, partial sentences) |
Outputs
| Name | Type | Description |
|---|---|---|
| (yields) | Iterator[str] | Sentence-boundary-aligned chunks, each ending with a space |
Usage Examples
Basic Usage
```python
from elevenlabs.realtime_tts import text_chunker

def llm_tokens():
    """Simulate an LLM token stream."""
    tokens = ["Hello", ", ", "how ", "are ", "you", "? ", "I'm ", "fine", "."]
    for token in tokens:
        yield token

for chunk in text_chunker(llm_tokens()):
    print(repr(chunk))
# Each printed chunk is split at a splitter character and ends with a space,
# e.g. 'Hello, '
```
Custom Streaming Pipeline
```python
from elevenlabs.realtime_tts import text_chunker

def get_llm_stream():
    for word in "The weather is nice today. Let's go outside.".split():
        yield word + " "

# Use text_chunker as a preprocessing step before a manual WebSocket send.
for chunk in text_chunker(get_llm_stream()):
    # chunk is aligned to sentence boundaries; send it over the socket here
    print(f"Sending: {chunk!r}")
```
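To complete such a pipeline, each chunk can be wrapped in a JSON message before the WebSocket send. The message shape below (a "text" field plus "try_trigger_generation", and an empty-text message to signal end of stream) is modeled on the ElevenLabs stream-input convention but is an assumption here, not taken from the SDK source; the helper frame_chunks is hypothetical.

```python
import json
from typing import Iterable, List


def frame_chunks(chunks: Iterable[str]) -> List[str]:
    """Wrap text chunks in JSON payloads for a streaming-TTS WebSocket.

    The message schema is an assumption modeled on the ElevenLabs
    stream-input API; verify field names against the current API reference.
    """
    messages = [
        json.dumps({"text": chunk, "try_trigger_generation": True})
        for chunk in chunks
    ]
    # An empty "text" message conventionally signals end-of-stream.
    messages.append(json.dumps({"text": ""}))
    return messages


for msg in frame_chunks(["Hello, ", "world. "]):
    print(msg)
```

Framing is kept separate from the socket code so the message construction can be unit-tested without opening a connection.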