Implementation:Elevenlabs Elevenlabs python Alignment

From Leeroopedia
Attribute | Value
Sources | src/elevenlabs/types/alignment.py
Domains | Audio Alignment, Text-to-Speech, Timing
Last Updated | 2026-02-15

Overview

Description

The Alignment model provides alignment information that maps generated audio to its input text sequence in the ElevenLabs SDK. It contains three parallel lists: character start times, character durations, and the characters themselves. Together, these arrays enable precise character-level timing synchronization between text and audio output. This is particularly useful for applications requiring lip-sync, subtitle generation, or karaoke-style text highlighting.
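For subtitle or karaoke-style use, the three parallel lists are often regrouped into word-level spans. The following is a minimal sketch under that assumption; the helper name words_with_timing and the sample timings are hypothetical and not part of the SDK. It works on plain lists, so it can be fed from the fields of an Alignment object once they are known to be present.

```python
from typing import List, Tuple


def words_with_timing(
    chars: List[str],
    start_times_ms: List[int],
    durations_ms: List[int],
) -> List[Tuple[str, int, int]]:
    """Group parallel character lists into (word, start_ms, end_ms) tuples."""
    words: List[Tuple[str, int, int]] = []
    current = ""
    word_start = 0
    word_end = 0
    for char, start, duration in zip(chars, start_times_ms, durations_ms):
        if char.isspace():
            # Whitespace closes the word in progress, if any
            if current:
                words.append((current, word_start, word_end))
                current = ""
            continue
        if not current:
            word_start = start  # first character of a new word
        current += char
        word_end = start + duration
    if current:
        words.append((current, word_start, word_end))
    return words


# Made-up timings, for illustration only
sample_chars = list("Hi you")
sample_starts = [0, 80, 150, 200, 290, 360]
sample_durations = [80, 70, 50, 90, 70, 110]
print(words_with_timing(sample_chars, sample_starts, sample_durations))
# [('Hi', 0, 150), ('you', 200, 470)]
```

Each word starts at the start time of its first character and ends when its last character finishes, so pauses between words stay visible as gaps.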

Note: The timing fields use camelCase JSON aliases (charStartTimesMs, charDurationsMs) mapped to snake_case Python attributes via FieldMetadata(alias=...).
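The effect of the alias mapping can be illustrated without the SDK. The sketch below uses a plain dataclass named AlignmentSketch, which is a hypothetical stand-in and not the SDK class; it does not reproduce how FieldMetadata works internally, only the resulting key-to-attribute correspondence.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional

# JSON key -> Python attribute; chars has no alias, so it maps to itself
_ALIASES = {
    "charStartTimesMs": "char_start_times_ms",
    "charDurationsMs": "char_durations_ms",
    "chars": "chars",
}


@dataclass
class AlignmentSketch:
    """Hypothetical stand-in for Alignment; not the SDK class."""

    char_start_times_ms: Optional[List[int]] = None
    char_durations_ms: Optional[List[int]] = None
    chars: Optional[List[str]] = None

    @classmethod
    def from_json(cls, payload: Dict[str, Any]) -> "AlignmentSketch":
        # Rename known keys and ignore anything unexpected
        kwargs = {_ALIASES[k]: v for k, v in payload.items() if k in _ALIASES}
        return cls(**kwargs)


payload = {
    "charStartTimesMs": [0, 90],
    "charDurationsMs": [90, 60],
    "chars": ["H", "i"],
}
alignment_sketch = AlignmentSketch.from_json(payload)
print(alignment_sketch.char_start_times_ms)  # [0, 90]
```

Python code reads the snake_case attribute, while the wire format keeps the camelCase key.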

Usage

The Alignment model is returned as part of text-to-speech responses when alignment information is requested. The timing values are relative to the returned audio chunk from the model, not the full audio response. All three lists (char_start_times_ms, char_durations_ms, chars) share the same length and are positionally correlated.
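Because the times are chunk-relative, placing characters on the timeline of the full audio requires adding each chunk's offset. The sketch below assumes the caller already knows the duration of every chunk (for example by measuring the decoded audio); the Alignment model itself does not carry that value, and the names to_absolute_start_times and chunk_lengths_ms are hypothetical.

```python
from typing import List


def to_absolute_start_times(
    char_start_times_ms: List[int],
    chunk_offset_ms: int,
) -> List[int]:
    """Shift chunk-relative start times onto the full audio timeline."""
    return [chunk_offset_ms + t for t in char_start_times_ms]


# Two made-up chunks; the second begins 400 ms into the full audio
chunk_starts = [[0, 120, 250], [0, 90]]
chunk_lengths_ms = [400, 300]  # hypothetical, measured by the caller

absolute: List[int] = []
offset_ms = 0
for starts, length_ms in zip(chunk_starts, chunk_lengths_ms):
    absolute.extend(to_absolute_start_times(starts, offset_ms))
    offset_ms += length_ms  # the next chunk starts where this one ends

print(absolute)  # [0, 120, 250, 400, 490]
```

Durations need no adjustment, since they are lengths rather than positions.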

Code Reference

Source Location

src/elevenlabs/types/alignment.py

Class Signature

class Alignment(UncheckedBaseModel):
    """
    Alignment information for the generated audio given the input text sequence.
    """
    ...

Import Statement

from elevenlabs.types import Alignment

Base Class

UncheckedBaseModel (from elevenlabs.core.unchecked_base_model)

I/O Contract

Field | Type | Required | JSON Alias | Description
char_start_times_ms | Optional[List[int]] | No | charStartTimesMs | A list of starting times (in milliseconds) for each character in the text as it corresponds to the audio. Times are relative to the returned chunk.
char_durations_ms | Optional[List[int]] | No | charDurationsMs | A list of durations (in milliseconds) for each character in the text as it corresponds to the audio. Times are relative to the returned chunk.
chars | Optional[List[str]] | No | (none) | A list of characters in the text sequence. May contain spaces, punctuation, and special characters. Length matches charStartTimesMs and charDurationsMs.
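Given these fields, a playback position can be mapped back to the character being spoken, which is the core lookup behind text highlighting. The sketch below is one possible approach using binary search; it assumes the start times are sorted in ascending order, which the field descriptions do not state explicitly, and the helper name active_char_index is hypothetical.

```python
from bisect import bisect_right
from typing import List, Optional


def active_char_index(
    char_start_times_ms: List[int],
    char_durations_ms: List[int],
    position_ms: int,
) -> Optional[int]:
    """Return the index of the character spoken at position_ms, or None."""
    if len(char_start_times_ms) != len(char_durations_ms):
        raise ValueError("start times and durations must have the same length")
    # Index of the last character whose start time is <= position_ms
    # (assumes char_start_times_ms is sorted ascending)
    i = bisect_right(char_start_times_ms, position_ms) - 1
    if i < 0:
        return None  # position is before the first character
    if position_ms < char_start_times_ms[i] + char_durations_ms[i]:
        return i
    return None  # position falls in a gap or after the last character


# Made-up timings, for illustration only
demo_starts = [0, 80, 150]
demo_durations = [80, 70, 50]
print(active_char_index(demo_starts, demo_durations, 100))  # 1
print(active_char_index(demo_starts, demo_durations, 500))  # None
```

The returned index can be used directly with chars, since all three lists are positionally correlated.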

Usage Examples

from elevenlabs import ElevenLabs

client = ElevenLabs(api_key="your_api_key")

# Generate speech with alignment data.
# Assumption: the with_timestamps flag and the response.alignment attribute
# are used as shown on this page; some SDK versions expose timestamps through
# a separate method (e.g. convert_with_timestamps), so check your version.
response = client.text_to_speech.convert(
    voice_id="voice_abc123",
    text="Hello world",
    output_format="mp3_44100_128",
    with_timestamps=True,
)

# The alignment object may be absent, and each of its fields is optional
alignment = getattr(response, "alignment", None)
if alignment and alignment.chars and alignment.char_start_times_ms and alignment.char_durations_ms:
    # The three lists are parallel: index i describes the i-th character
    for char, start_ms, duration_ms in zip(
        alignment.chars,
        alignment.char_start_times_ms,
        alignment.char_durations_ms,
    ):
        end_ms = start_ms + duration_ms
        print(f"'{char}': {start_ms}ms - {end_ms}ms")
