Implementation:Elevenlabs Elevenlabs python SpeechToTextWordResponseModel

From Leeroopedia
| Field | Value |
| source | Elevenlabs_Elevenlabs_python |
| domains | Speech-to-Text, Transcription, Timestamps |
| last_updated | 2026-02-15 |

Overview

Description

SpeechToTextWordResponseModel is a Pydantic model that represents one word-level entry of a transcription, together with its timing information. Each instance corresponds to a single transcribed word or audio event (such as laughter or footsteps) and carries its start and end timestamps, type classification, speaker identifier, log probability (logprob, which can serve as a confidence measure), and an optional character-level breakdown. The model is auto-generated by Fern from the ElevenLabs API definition and extends UncheckedBaseModel.

Usage

This model is returned as part of speech-to-text transcription responses from the ElevenLabs API. It provides granular word-level timing and confidence data, which is useful for applications requiring precise transcript alignment, speaker diarization, or confidence filtering.
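As a rough illustration of the confidence filtering mentioned above, the sketch below flags words whose predicted probability falls under a threshold. It assumes a hypothetical stand-in dataclass named Word that mirrors the documented fields, so that it runs without the SDK installed; the helper low_confidence_words and the 0.5 threshold are illustrative choices, not part of the ElevenLabs library.

```python
import math
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Word:
    """Hypothetical stand-in mirroring the documented fields of the model."""
    text: str
    type: str
    logprob: float
    start: Optional[float] = None
    end: Optional[float] = None
    speaker_id: Optional[str] = None


def low_confidence_words(words: List[Word], min_probability: float = 0.5) -> List[Word]:
    """Return spoken words whose predicted probability is below the threshold."""
    return [
        w for w in words
        # Skip audio events; exp(logprob) converts the log probability back to [0, 1].
        if w.type == "word" and math.exp(w.logprob) < min_probability
    ]


words = [
    Word(text="hello", type="word", logprob=-0.05, start=0.0, end=0.4),
    Word(text="wurld", type="word", logprob=-1.6, start=0.5, end=0.9),
    Word(text="(laughter)", type="audio_event", logprob=-2.0, start=1.0, end=1.8),
]
flagged = low_confidence_words(words)
print([w.text for w in flagged])
```

With the real SDK, the same function should work on model instances unchanged, since it only reads the text, type, and logprob attributes listed in the I/O Contract.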

Code Reference

Source Location

src/elevenlabs/types/speech_to_text_word_response_model.py

Class Signature

class SpeechToTextWordResponseModel(UncheckedBaseModel):
    """
    Word-level detail of the transcription with timing information.
    """
    ...

Import Statement

from elevenlabs.types import SpeechToTextWordResponseModel

I/O Contract

| Field | Type | Required | Description |
| text | str | Yes | The word or sound that was transcribed. |
| start | Optional[float] | No | The start time of the word or sound in seconds. |
| end | Optional[float] | No | The end time of the word or sound in seconds. |
| type | SpeechToTextWordResponseModelType | Yes | The type of the word or sound. 'audio_event' is used for non-word sounds like laughter or footsteps. |
| speaker_id | Optional[str] | No | Unique identifier for the speaker of this word. |
| logprob | float | Yes | The log of the probability with which this word was predicted. Logprobs are in range [-infinity, 0]; higher logprobs indicate higher confidence. |
| characters | Optional[List[SpeechToTextCharacterResponseModel]] | No | The characters that make up the word and their timing information. |

Usage Examples

import math

from elevenlabs.types import SpeechToTextWordResponseModel

# Typically received as part of a transcription response
word = SpeechToTextWordResponseModel(
    text="hello",
    start=0.5,
    end=0.9,
    type="word",
    speaker_id="speaker_1",
    logprob=-0.12,
)

# Check confidence level: exp(logprob) recovers the predicted probability
confidence = math.exp(word.logprob)
print(f"Word: '{word.text}', Confidence: {confidence:.2%}")

# Access timing information (start and end are optional, so guard against None)
if word.start is not None and word.end is not None:
    duration = word.end - word.start
    print(f"Duration: {duration:.3f}s")
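
The speaker_id and timing fields can also be combined to build speaker turns for diarized transcripts. The following is a minimal sketch under stated assumptions: Word is a hypothetical stand-in dataclass mirroring the documented fields (so the example runs without the SDK), and speaker_turns is an illustrative helper, not an ElevenLabs API.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Word:
    """Hypothetical stand-in mirroring the documented fields of the model."""
    text: str
    type: str
    logprob: float
    start: Optional[float] = None
    end: Optional[float] = None
    speaker_id: Optional[str] = None


Turn = Tuple[Optional[str], Optional[float], Optional[float], str]


def speaker_turns(words: List[Word]) -> List[Turn]:
    """Merge consecutive words by the same speaker into (speaker_id, start, end, text)."""
    turns: List[Turn] = []
    for w in words:
        if w.type != "word":
            continue  # ignore audio events such as laughter
        if turns and turns[-1][0] == w.speaker_id:
            speaker, start, prev_end, text = turns[-1]
            # Extend the current turn; keep the previous end if this word has none.
            end = w.end if w.end is not None else prev_end
            turns[-1] = (speaker, start, end, f"{text} {w.text}")
        else:
            turns.append((w.speaker_id, w.start, w.end, w.text))
    return turns


words = [
    Word(text="hi", type="word", logprob=-0.1, start=0.0, end=0.3, speaker_id="speaker_1"),
    Word(text="there", type="word", logprob=-0.2, start=0.35, end=0.7, speaker_id="speaker_1"),
    Word(text="(laughter)", type="audio_event", logprob=-1.0, start=0.7, end=1.0),
    Word(text="hello", type="word", logprob=-0.1, start=1.0, end=1.4, speaker_id="speaker_2"),
]
turns = speaker_turns(words)
for speaker, start, end, text in turns:
    print(f"{speaker} [{start}-{end}]: {text}")
```

Because start, end, and speaker_id are all optional in the model, the sketch tolerates None values rather than assuming every word carries timing or speaker data.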
