Implementation:Openai Whisper SubtitlesWriter Iterate Result

From Leeroopedia
Revision as of 13:42, 16 February 2026 by Admin (talk | contribs) (Auto-imported from implementations/Openai_Whisper_SubtitlesWriter_Iterate_Result.md)

Overview

SubtitlesWriter.iterate_result() is a method that generates subtitle entries from a transcription result, supporting word-level timing, configurable line breaking, and karaoke-style word highlighting. It is the core iteration engine used by both WriteVTT (WebVTT format) and WriteSRT (SRT format) subtitle writers.

Source

  • File: whisper/utils.py, lines 119-228
  • Repository: https://github.com/openai/whisper
  • Import: from whisper.utils import WriteVTT, WriteSRT (SubtitlesWriter is the base class)

Signature

def iterate_result(
    self,
    result: dict,
    options: Optional[dict] = None,
    *,
    max_line_width: Optional[int] = None,
    max_line_count: Optional[int] = None,
    highlight_words: bool = False,
    max_words_per_line: Optional[int] = None,
) -> Iterator[Tuple[str, str, str]]:

Parameters

| Parameter | Type | Description |
|---|---|---|
| result | dict | Transcription result containing "segments" with optional "words" arrays |
| options | Optional[dict] | Dictionary that can contain max_line_width, max_line_count, highlight_words, and max_words_per_line |
| max_line_width | Optional[int] | Maximum characters per subtitle line |
| max_line_count | Optional[int] | Maximum number of lines per subtitle block |
| highlight_words | bool | Enable word-level underline highlighting using <u> tags (default False) |
| max_words_per_line | Optional[int] | Maximum words per subtitle line |
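
Because each setting can arrive either through options or as a keyword-only argument, it helps to see one way the two could be merged. The sketch below is a standalone illustration, not the library's code: resolve_options is a hypothetical helper, and the rule that a truthy keyword argument takes precedence over the options entry is an assumption drawn from reading whisper/utils.py.

```python
from typing import Optional


def resolve_options(
    options: Optional[dict] = None,
    *,
    max_line_width: Optional[int] = None,
    max_line_count: Optional[int] = None,
    highlight_words: bool = False,
    max_words_per_line: Optional[int] = None,
) -> dict:
    """Hypothetical helper: merge keyword arguments with an options dict."""
    options = options or {}
    return {
        # Assumption: a truthy keyword argument wins; otherwise use the dict.
        "max_line_width": max_line_width or options.get("max_line_width"),
        "max_line_count": max_line_count or options.get("max_line_count"),
        "highlight_words": highlight_words or options.get("highlight_words", False),
        "max_words_per_line": max_words_per_line or options.get("max_words_per_line"),
    }


resolved = resolve_options(
    {"max_line_width": 42, "highlight_words": True}, max_line_count=2
)
print(resolved)
```

With this merging rule, a value of 0 or False passed as a keyword argument cannot override a truthy entry in options, which is worth keeping in mind when mixing the two styles.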

Return Value

Yields Tuple[str, str, str] containing:

  • start_timestamp: Formatted start time string
  • end_timestamp: Formatted end time string
  • subtitle_text: The subtitle text content (may include HTML tags when highlighting)
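
To make the shape of the yielded tuples concrete, the following sketch turns a sequence of (start, end, text) tuples into SRT-style text. render_srt and the sample cues are hypothetical and exist only for illustration; the real writers stream to a file rather than building a string.

```python
from typing import Iterable, Tuple


def render_srt(cues: Iterable[Tuple[str, str, str]]) -> str:
    """Hypothetical helper: number each (start, end, text) cue SRT-style."""
    blocks = []
    for index, (start, end, text) in enumerate(cues, start=1):
        blocks.append(f"{index}\n{start} --> {end}\n{text}\n")
    # A blank line separates consecutive cues.
    return "\n".join(blocks)


# Illustrative data shaped like the tuples described above.
sample_cues = [
    ("00:00:00,000", "00:00:02,400", "Hello world"),
    ("00:00:02,400", "00:00:04,800", "how are you"),
]
srt_text = render_srt(sample_cues)
print(srt_text)
```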

Behavior

Word-Level Mode

When segments contain "words" arrays (from word_timestamps=True):

  1. Word iteration: Iterates word-by-word through all segments.
  2. Line width management: Tracks accumulated line width. When adding a word would exceed max_line_width, starts a new line.
  3. Line count management: When a new line is needed and the current block already has max_line_count lines, yields the current block and starts a new one.
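  4. Words per line: If max_words_per_line is set, each segment's words are processed in chunks of at most that many words, and each chunk starts a new subtitle block. This applies only when max_line_width or max_line_count is unset.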
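  5. Pause detection: When both max_line_width and max_line_count are set, a gap of more than 3 seconds between the start times of consecutive words yields the current block and starts a new one.
  6. Segment boundaries: When max_line_width or max_line_count is unset, segment boundaries are preserved, and the current block is yielded when a new segment begins. When both are set, a block may span segment boundaries.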
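
The line-width, line-count, and pause rules above can be sketched as a small generator. This is a simplified illustration under stated assumptions, not the method itself: wrap_words is a hypothetical name, it takes a flat list of word timings, it assumes both limits are set, and it ignores max_words_per_line and segment boundaries. Each word dict is assumed to carry "word" (with a leading space, as in Whisper's word timings) and "start".

```python
from typing import Dict, Iterator, List


def wrap_words(
    words: List[Dict], max_line_width: int, max_line_count: int, pause: float = 3.0
) -> Iterator[List[Dict]]:
    """Simplified sketch: group word timings into subtitle blocks."""
    block: List[Dict] = []
    line_len = 0
    line_count = 1
    last_start = words[0]["start"] if words else 0.0
    for original in words:
        timing = dict(original)  # copy so the caller's data is not mutated
        long_pause = timing["start"] - last_start > pause
        has_room = line_len + len(timing["word"]) <= max_line_width
        if line_len > 0 and has_room and not long_pause:
            line_len += len(timing["word"])  # continue the current line
        else:
            timing["word"] = timing["word"].strip()
            if block and (long_pause or line_count >= max_line_count):
                yield block  # block is full, or a long pause occurred
                block, line_count = [], 1
            elif line_len > 0:
                line_count += 1  # wrap onto a new line in the same block
                timing["word"] = "\n" + timing["word"]
            line_len = len(timing["word"].strip())
        block.append(timing)
        last_start = timing["start"]
    if block:
        yield block


sample = [(" Hello", 0.0), (" world", 0.5), (" how", 1.0),
          (" are", 1.5), (" you", 2.0), (" today", 2.5)]
words = [{"word": w, "start": s} for w, s in sample]
blocks = ["".join(t["word"] for t in b) for b in wrap_words(words, 12, 2)]
print(blocks)
```

With a 12-character width and a 2-line limit, the first five words fill two lines of one block and the sixth word opens a new block.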

Word Highlighting Mode

When highlight_words=True:

  1. For each word in a subtitle block, generates a separate cue.
  2. In each cue, the currently-spoken word is wrapped in <u>...</u> tags.
  3. All other words appear without formatting.
  4. Each cue spans the highlighted word's own start and end times. If a word starts after the previous cue ended, an extra cue with the block's un-highlighted text fills the gap.
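
A minimal sketch of the highlighting step follows. highlight_cues is a hypothetical helper that works on raw float seconds instead of formatted timestamps; it assumes each cue ends at the highlighted word's own end time and leaves out any handling of gaps between words. The regular expression keeps leading whitespace (including a line break) outside the <u> tag.

```python
import re
from typing import Dict, Iterator, List, Tuple


def highlight_cues(block: List[Dict]) -> Iterator[Tuple[float, float, str]]:
    """Sketch: yield one cue per word, underlining the word being spoken."""
    all_words = [timing["word"] for timing in block]
    for i, current in enumerate(block):
        text = "".join(
            # Wrap only the i-th word; leading whitespace stays outside the tag.
            re.sub(r"^(\s*)(.*)$", r"\1<u>\2</u>", word) if j == i else word
            for j, word in enumerate(all_words)
        )
        yield current["start"], current["end"], text


# Illustrative block of two word timings.
block = [
    {"word": "Hello", "start": 0.0, "end": 0.5},
    {"word": " world", "start": 0.5, "end": 1.0},
]
cues = list(highlight_cues(block))
print(cues)
```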

Segment-Level Fallback

When segments do not contain word-level data:

  1. Falls back to segment-level timing.
  2. Each segment becomes a single subtitle entry with the segment's start and end times.
  3. Line breaking options have no effect in this mode.
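
The choice between the two modes and the fallback path can be sketched as follows. Both function names are hypothetical, the times are left as raw floats, and checking only the first segment for a "words" key is an assumption about how the mode is detected.

```python
from typing import Dict, Iterator, List, Tuple


def has_word_timings(result: dict) -> bool:
    """Hypothetical check: assume the first segment decides the mode."""
    segments = result.get("segments", [])
    return len(segments) > 0 and "words" in segments[0]


def iterate_segments(segments: List[Dict]) -> Iterator[Tuple[float, float, str]]:
    """Fallback sketch: one cue per segment, using the segment's own times."""
    for segment in segments:
        yield segment["start"], segment["end"], segment["text"].strip()


# Illustrative result without word-level data.
result = {"segments": [{"start": 0.0, "end": 2.4, "text": " Hello world "}]}
cues = [] if has_word_timings(result) else list(iterate_segments(result["segments"]))
print(cues)
```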

Writer Subclasses

| Class | Format | File Extension | Time Format |
|---|---|---|---|
| WriteVTT | WebVTT | .vtt | MM:SS.mmm (hours are prepended only when nonzero) |
| WriteSRT | SubRip | .srt | HH:MM:SS,mmm |

Both subclasses use iterate_result() for content generation. WriteVTT prepends the WEBVTT header, and WriteSRT adds sequential cue numbers.
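
The difference between the two time formats can be illustrated with a small standalone formatter. sketch_format_timestamp is a hypothetical reimplementation written for this page; the parameter names always_include_hours and decimal_marker, and the assumption that the VTT writer omits a zero hours field while the SRT writer always includes it, are based on reading whisper/utils.py.

```python
def sketch_format_timestamp(
    seconds: float, always_include_hours: bool = False, decimal_marker: str = "."
) -> str:
    """Hypothetical formatter: seconds -> [HH:]MM:SS<marker>mmm."""
    if seconds < 0:
        raise ValueError("timestamp must be non-negative")
    milliseconds = round(seconds * 1000.0)
    hours, milliseconds = divmod(milliseconds, 3_600_000)
    minutes, milliseconds = divmod(milliseconds, 60_000)
    secs, milliseconds = divmod(milliseconds, 1_000)
    # Hours appear when requested or when the value is at least one hour.
    hours_marker = f"{hours:02d}:" if always_include_hours or hours > 0 else ""
    return f"{hours_marker}{minutes:02d}:{secs:02d}{decimal_marker}{milliseconds:03d}"


vtt_style = sketch_format_timestamp(2.4)
srt_style = sketch_format_timestamp(2.4, always_include_hours=True, decimal_marker=",")
print(vtt_style, srt_style)
```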

Example Usage

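import os

import whisper
from whisper.utils import get_writer

# Assumed setup: "audio.mp3" and the "base" model are placeholders.
# word_timestamps=True provides the per-word timings that highlighting needs.
audio_path = "audio.mp3"
model = whisper.load_model("base")
result = model.transcribe(audio_path, word_timestamps=True)
os.makedirs("./output", exist_ok=True)  # the writer opens a file in this directory

# VTT with word highlighting
writer = get_writer("vtt", "./output")
writer(result, audio_path, options={"highlight_words": True, "max_line_width": 42})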

# Generates VTT like:
# WEBVTT
#
# 00:00.000 --> 00:00.500
# <u>Hello</u> world how are you
#
# 00:00.500 --> 00:01.000
# Hello <u>world</u> how are you

SRT Output Example

writer = get_writer("srt", "./output")
writer(result, audio_path, options={"max_line_width": 42, "max_line_count": 2})

# Generates SRT like:
# 1
# 00:00:00,000 --> 00:00:02,400
# Hello world how are you
# today is a good day
#
# 2
# 00:00:02,400 --> 00:00:04,800
# and I hope you are
# doing well

Links

Principle:Openai_Whisper_Word_Level_Subtitle_Output

Metadata

2025-06-25 00:00 GMT
