Implementation:Elevenlabs Elevenlabs python Alignment

Attribute	Value
Sources	`src/elevenlabs/types/alignment.py`
Domains	Audio Alignment, Text-to-Speech, Timing
Last Updated	2026-02-15

Overview

Description

The Alignment model maps generated audio to its input text sequence in the ElevenLabs SDK. It contains three parallel lists: character start times, character durations, and the characters themselves. Together, these lists give character-level timing between the text and the audio output, which is useful for lip-sync, subtitle generation, or karaoke-style text highlighting.

Note: The timing fields use camelCase JSON aliases (charStartTimesMs, charDurationsMs) mapped to snake_case Python attributes via FieldMetadata(alias=...).
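
The alias mapping in the note above can be illustrated with a minimal, standard-library sketch. `AlignmentSketch`, `JSON_ALIASES`, and `alignment_from_json` are hypothetical names introduced only for illustration; they are not part of the SDK, which performs this mapping through its own model machinery.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


# Hypothetical stand-in that mirrors the three documented fields.
# It is NOT the real elevenlabs.types.Alignment class.
@dataclass
class AlignmentSketch:
    char_start_times_ms: Optional[List[int]] = None
    char_durations_ms: Optional[List[int]] = None
    chars: Optional[List[str]] = None


# JSON key -> Python attribute, following the aliases named in the note.
JSON_ALIASES = {
    "charStartTimesMs": "char_start_times_ms",
    "charDurationsMs": "char_durations_ms",
    "chars": "chars",  # no alias: same name on both sides
}


def alignment_from_json(payload: Dict[str, Any]) -> AlignmentSketch:
    """Build the stand-in model from a camelCase JSON payload."""
    kwargs = {attr: payload.get(key) for key, attr in JSON_ALIASES.items()}
    return AlignmentSketch(**kwargs)


example = alignment_from_json(
    {"charStartTimesMs": [0, 120], "charDurationsMs": [120, 95], "chars": ["H", "i"]}
)
print(example.char_start_times_ms)
```

Missing keys become `None`, which matches the optional typing of all three fields.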

Usage

The Alignment model is returned as part of text-to-speech responses when alignment information is requested. The timing values are relative to the returned audio chunk from the model, not the full audio response. When present, all three lists (char_start_times_ms, char_durations_ms, chars) share the same length and are positionally correlated: index i in each list describes the same character.
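
Because the timings are chunk-relative, placing them on a full-audio timeline needs an offset for each chunk. The sketch below assumes the caller already knows where a chunk starts in the full audio; `chunk_offset_ms` and `to_absolute_start_times` are hypothetical names, and how the offset is obtained is not specified by this page.

```python
from typing import List, Optional


def to_absolute_start_times(
    char_start_times_ms: Optional[List[int]],
    chunk_offset_ms: int,
) -> List[int]:
    """Shift chunk-relative start times onto a full-audio timeline."""
    if not char_start_times_ms:
        return []
    return [chunk_offset_ms + t for t in char_start_times_ms]


# Two chunks; the second is ASSUMED to begin 400 ms into the full audio.
first = to_absolute_start_times([0, 110, 230], chunk_offset_ms=0)
second = to_absolute_start_times([0, 90], chunk_offset_ms=400)
print(first + second)
```

Durations need no adjustment, since they are lengths rather than positions.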

Code Reference

Source Location

src/elevenlabs/types/alignment.py

Class Signature

class Alignment(UncheckedBaseModel):
    """
    Alignment information for the generated audio given the input text sequence.
    """
    ...

Import Statement

from elevenlabs.types import Alignment

Base Class

UncheckedBaseModel (from elevenlabs.core.unchecked_base_model)

I/O Contract

Field	Type	Required	JSON Alias	Description
`char_start_times_ms`	`Optional[List[int]]`	No	`charStartTimesMs`	A list of starting times (in milliseconds) for each character in the text as it corresponds to the audio. Times are relative to the returned chunk.
`char_durations_ms`	`Optional[List[int]]`	No	`charDurationsMs`	A list of durations (in milliseconds) for each character in the text as it corresponds to the audio. Times are relative to the returned chunk.
`chars`	`Optional[List[str]]`	No	(none)	A list of characters in the text sequence. May contain spaces, punctuation, and special characters. Length matches `charStartTimesMs` and `charDurationsMs`.
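
Since every field is optional and the lists are expected to match in length, a consumer may want to check both conditions before pairing values. The following is a sketch under those assumptions; `CharTiming` and `char_timings` are hypothetical helpers, not SDK functions.

```python
from typing import List, NamedTuple, Optional


class CharTiming(NamedTuple):
    char: str
    start_ms: int
    end_ms: int


def char_timings(
    chars: Optional[List[str]],
    starts: Optional[List[int]],
    durations: Optional[List[int]],
) -> List[CharTiming]:
    """Pair the three parallel lists into per-character records."""
    # Any field may be absent, in which case there is nothing to pair.
    if chars is None or starts is None or durations is None:
        return []
    # The lists are documented as equal-length; fail loudly if they are not.
    if not (len(chars) == len(starts) == len(durations)):
        raise ValueError("alignment lists differ in length")
    return [CharTiming(c, s, s + d) for c, s, d in zip(chars, starts, durations)]


print(char_timings(["a", "b"], [0, 50], [50, 40]))
```

Raising on a length mismatch is a design choice for this sketch; silently truncating with `zip` would hide malformed data.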

Usage Examples

from elevenlabs import ElevenLabs

client = ElevenLabs(api_key="your_api_key")

# Generate speech and request alignment data.
# Assumption: this SDK version accepts with_timestamps on convert();
# verify against your installed version before relying on it.
response = client.text_to_speech.convert(
    voice_id="voice_abc123",
    text="Hello world",
    output_format="mp3_44100_128",
    with_timestamps=True,
)

# The response may not carry alignment data, so read it defensively.
alignment = getattr(response, "alignment", None)

# All three fields are Optional; iterate only when every list is present.
if alignment and alignment.chars and alignment.char_start_times_ms and alignment.char_durations_ms:
    for char, start_ms, duration_ms in zip(
        alignment.chars,
        alignment.char_start_times_ms,
        alignment.char_durations_ms,
    ):
        # End time is start plus duration, both in milliseconds.
        end_ms = start_ms + duration_ms
        print(f"'{char}': {start_ms}ms - {end_ms}ms")
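
For subtitle or karaoke-style highlighting, character timings are often grouped into words. The sketch below is one possible approach that splits on whitespace characters; `words_with_timing` is a hypothetical helper, and the sample timings are made up for illustration rather than taken from a real response.

```python
from typing import List, Tuple


def words_with_timing(
    chars: List[str],
    starts: List[int],
    durations: List[int],
) -> List[Tuple[str, int, int]]:
    """Group per-character timings into (word, start_ms, end_ms) tuples."""
    words: List[Tuple[str, int, int]] = []
    current = ""
    word_start = 0
    word_end = 0
    for ch, start, dur in zip(chars, starts, durations):
        if ch.isspace():
            # Whitespace closes the word collected so far, if any.
            if current:
                words.append((current, word_start, word_end))
                current = ""
            continue
        if not current:
            word_start = start  # first character of a new word
        current += ch
        word_end = start + dur  # end of the latest character
    if current:
        words.append((current, word_start, word_end))
    return words


# Illustrative, made-up timings for the text "Hi you".
sample = words_with_timing(
    list("Hi you"),
    [0, 80, 150, 200, 270, 340],
    [80, 70, 50, 70, 70, 90],
)
print(sample)
```

Punctuation stays attached to the neighbouring word in this version; a real subtitle pipeline might treat it separately.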

Related Pages

ForcedAlignmentCharacterResponseModel - Character-level forced alignment timing
ForcedAlignmentWordResponseModel - Word-level forced alignment timing
