Implementation:Openai Whisper SubtitlesWriter Iterate Result
Overview
SubtitlesWriter.iterate_result() is a method that generates subtitle entries from a transcription result, supporting word-level timing, configurable line breaking, and karaoke-style word highlighting. It is the core iteration engine used by both WriteVTT (WebVTT format) and WriteSRT (SRT format) subtitle writers.
Source
- File: whisper/utils.py, lines 119-228
- Repository: https://github.com/openai/whisper
- Import: from whisper.utils import WriteVTT, WriteSRT (SubtitlesWriter is the base class)
Signature
```python
def iterate_result(
    self,
    result: dict,
    options: Optional[dict] = None,
    *,
    max_line_width: Optional[int] = None,
    max_line_count: Optional[int] = None,
    highlight_words: bool = False,
    max_words_per_line: Optional[int] = None,
) -> Iterator[Tuple[str, str, str]]:
```
Parameters
| Parameter | Type | Description |
|---|---|---|
| result | dict | Transcription result containing "segments", each with an optional "words" array |
| options | Optional[dict] | Dictionary that can contain max_line_width, max_line_count, highlight_words, and max_words_per_line; a truthy keyword argument takes precedence over the matching key |
| max_line_width | Optional[int] | Maximum characters per subtitle line |
| max_line_count | Optional[int] | Maximum number of lines per subtitle block |
| highlight_words | bool | Enable word-level underline highlighting using <u> tags (default False) |
| max_words_per_line | Optional[int] | Maximum words per subtitle line |
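The keyword arguments and the options dictionary are merged before iteration. The sketch below shows one way to express that merge; resolve_options is a hypothetical helper, and the `or`-based precedence plus the preserve_segments flag are modeled on a reading of the upstream source rather than on a public API.

```python
from typing import Optional


def resolve_options(
    options: Optional[dict] = None,
    *,
    max_line_width: Optional[int] = None,
    max_line_count: Optional[int] = None,
    highlight_words: bool = False,
    max_words_per_line: Optional[int] = None,
) -> dict:
    """Hypothetical helper: merge keyword arguments with the options dict."""
    options = options or {}
    resolved = {
        # A truthy keyword argument wins; otherwise fall back to the dict entry.
        "max_line_width": max_line_width or options.get("max_line_width"),
        "max_line_count": max_line_count or options.get("max_line_count"),
        "highlight_words": highlight_words or options.get("highlight_words", False),
        "max_words_per_line": max_words_per_line or options.get("max_words_per_line"),
    }
    # Assumption: segments stay separate unless both width and count limits are set.
    resolved["preserve_segments"] = (
        resolved["max_line_count"] is None or resolved["max_line_width"] is None
    )
    return resolved
```

For example, `resolve_options({"max_line_width": 42}, max_line_width=30)` yields a width of 30, and segments are preserved because no line count was given.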
Return Value
Yields Tuple[str, str, str] containing:
- start_timestamp: Formatted start time string
- end_timestamp: Formatted end time string
- subtitle_text: The subtitle text content (may include HTML tags when highlighting)
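The timestamp strings depend on the writer subclass (see Writer Subclasses). The standalone sketch below reproduces the two formats for illustration. It is modeled on whisper.utils.format_timestamp but written here without importing Whisper, so treat the exact signature as an assumption.

```python
def format_timestamp(
    seconds: float, always_include_hours: bool = False, decimal_marker: str = "."
) -> str:
    """Sketch: render seconds as [HH:]MM:SS<marker>mmm."""
    if seconds < 0:
        raise ValueError("non-negative timestamp expected")
    milliseconds = round(seconds * 1000.0)
    hours, milliseconds = divmod(milliseconds, 3_600_000)
    minutes, milliseconds = divmod(milliseconds, 60_000)
    secs, milliseconds = divmod(milliseconds, 1_000)
    # The hours field is omitted when zero unless the format always requires it.
    hours_marker = f"{hours:02d}:" if always_include_hours or hours > 0 else ""
    return f"{hours_marker}{minutes:02d}:{secs:02d}{decimal_marker}{milliseconds:03d}"
```

With VTT-style settings, 2.4 seconds becomes `00:02.400`. With SRT-style settings (`always_include_hours=True`, `decimal_marker=","`), it becomes `00:00:02,400`.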
Behavior
Word-Level Mode
When segments contain "words" arrays (produced by transcribing with word_timestamps=True; only the first segment is checked):
- Word iteration: Iterates word-by-word through all segments.
- Line width management: Tracks the accumulated line width. When adding a word would exceed max_line_width, starts a new line.
- Line count management: When starting a new line would exceed max_line_count lines, yields the current block and starts a new one.
- Words per line: If max_words_per_line is set, forces a break after the specified number of words.
- Pause detection: If the gap between consecutive words exceeds 3 seconds, yields the current block and starts a new one. This check applies only when both max_line_width and max_line_count are set.
- Segment boundaries: When max_line_width or max_line_count is unset, segment boundaries are preserved and the current block is yielded whenever a new segment begins. When both are set, words from adjacent segments may share a block.
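The width, line-count, and pause rules above can be illustrated with a simplified grouping routine. The sketch below is not Whisper's implementation. It assumes words are dicts with "word", "start", and "end" keys, ignores segment boundaries and max_words_per_line, and returns raw float times instead of formatted strings. The function names group_words and _to_cue are hypothetical.

```python
from typing import List, Tuple

Cue = Tuple[float, float, str]  # (start_seconds, end_seconds, text)


def _to_cue(lines: List[List[dict]]) -> Cue:
    # A block runs from its first word's start to its last word's end.
    text = "\n".join(" ".join(w["word"].strip() for w in line) for line in lines)
    return lines[0][0]["start"], lines[-1][-1]["end"], text


def group_words(
    words: List[dict], max_line_width: int = 42, max_line_count: int = 2, pause: float = 3.0
) -> List[Cue]:
    cues: List[Cue] = []
    lines: List[List[dict]] = []  # current block: a list of lines, each a list of words
    line_len = 0
    last_start = None
    for word in words:
        text = word["word"].strip()
        long_pause = last_start is not None and word["start"] - last_start > pause
        if lines and not long_pause and line_len + 1 + len(text) <= max_line_width:
            lines[-1].append(word)  # the word fits on the current line
            line_len += 1 + len(text)
        else:
            # A new line is needed; close the block on a pause or when it is full.
            if lines and (long_pause or len(lines) >= max_line_count):
                cues.append(_to_cue(lines))
                lines = []
            lines.append([word])
            line_len = len(text)
        last_start = word["start"]
    if lines:
        cues.append(_to_cue(lines))
    return cues
```

In this sketch, a 4.5-second gap before a word closes the block, and a narrow max_line_width moves each word onto its own line until max_line_count lines are filled.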
Word Highlighting Mode
When highlight_words=True (word-level mode only):
- For each word in a subtitle block, generates a separate cue.
- In each cue, the currently spoken word is wrapped in <u>...</u> tags. All other words appear without formatting.
- Each cue spans the highlighted word's own start and end times. If there is a gap between one word's end and the next word's start, an extra cue containing the unhighlighted block text fills it.
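The per-word highlighting can be sketched as follows. The timing rule (each cue runs from a word's start to its end, with an unhighlighted filler cue covering any gap) is an assumption based on a reading of the upstream source. The function name highlight_cues is hypothetical, and it works on raw float times and words joined by single spaces.

```python
from typing import List, Tuple


def highlight_cues(words: List[dict]) -> List[Tuple[float, float, str]]:
    """Sketch: one cue per word in a block, underlining the current word."""
    if not words:
        return []
    tokens = [w["word"].strip() for w in words]
    plain = " ".join(tokens)
    cues = []
    last = words[0]["start"]
    for i, word in enumerate(words):
        if word["start"] > last:
            # Fill the silence before this word with the unhighlighted block text.
            cues.append((last, word["start"], plain))
        text = " ".join(f"<u>{t}</u>" if j == i else t for j, t in enumerate(tokens))
        cues.append((word["start"], word["end"], text))
        last = word["end"]
    return cues
```

Two words timed 0.0-0.5 and 0.7-1.0 yield three cues: "<u>Hello</u> world", a filler "Hello world" from 0.5 to 0.7, and "Hello <u>world</u>".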
Segment-Level Fallback
When segments do not contain word-level data:
- Falls back to segment-level timing.
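- Each segment becomes a single subtitle entry with the segment's start and end times and its stripped text. Any "-->" in the text is replaced with "->" so that it cannot be mistaken for a cue timing separator.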
- Line breaking options have no effect in this mode.
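A minimal sketch of the mode check and the segment-level fallback follows. It assumes, based on the upstream source, that only the first segment is inspected for a "words" key. It keeps raw float times so that no formatter is needed. Both function names are hypothetical.

```python
from typing import Iterator, Tuple


def has_word_timings(result: dict) -> bool:
    # Assumption: word-level mode is chosen by inspecting the first segment only.
    segments = result["segments"]
    return len(segments) > 0 and "words" in segments[0]


def segment_cues(result: dict) -> Iterator[Tuple[float, float, str]]:
    # One cue per segment; line-breaking options are not consulted here.
    for segment in result["segments"]:
        # "-->" inside cue text would read as a timing separator, so it is shortened.
        text = segment["text"].strip().replace("-->", "->")
        yield segment["start"], segment["end"], text
```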
Writer Subclasses
| Class | Format | File Extension | Time Format |
|---|---|---|---|
| WriteVTT | WebVTT | .vtt | [HH:]MM:SS.mmm (hours omitted when zero) |
| WriteSRT | SubRip | .srt | HH:MM:SS,mmm |
Both subclasses use iterate_result() to generate cue content. WriteVTT writes a WEBVTT header before the cues, and WriteSRT numbers each cue sequentially starting at 1.
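The sketch below shows how the yielded (start, end, text) tuples can be turned into file bodies for each format. It mirrors the output shape described above, with a header for VTT and 1-based numbering for SRT. It is not the writers' actual code, and render_vtt and render_srt are hypothetical names.

```python
from typing import Iterable, Tuple

Cue = Tuple[str, str, str]  # (start_timestamp, end_timestamp, subtitle_text)


def render_vtt(cues: Iterable[Cue]) -> str:
    # WebVTT: a single header line, then blank-line-separated cues.
    parts = ["WEBVTT\n\n"]
    for start, end, text in cues:
        parts.append(f"{start} --> {end}\n{text}\n\n")
    return "".join(parts)


def render_srt(cues: Iterable[Cue]) -> str:
    # SRT: no header; each cue is prefixed with a 1-based sequence number.
    parts = []
    for index, (start, end, text) in enumerate(cues, start=1):
        parts.append(f"{index}\n{start} --> {end}\n{text}\n\n")
    return "".join(parts)
```

For example, `render_srt([("00:00:00,000", "00:00:02,400", "Hello world")])` produces a single cue numbered 1.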
Example Usage
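```python
import os

import whisper
from whisper.utils import get_writer

audio_path = "audio.mp3"  # placeholder input file
os.makedirs("./output", exist_ok=True)
model = whisper.load_model("base")
# word_timestamps=True adds the per-word "words" arrays used for highlighting
result = model.transcribe(audio_path, word_timestamps=True)

# VTT with word highlighting
writer = get_writer("vtt", "./output")
writer(result, audio_path, options={"highlight_words": True, "max_line_width": 42})
# Writes ./output/audio.vtt with cues like (timings illustrative):
# WEBVTT
#
# 00:00.000 --> 00:00.500
# <u>Hello</u> world how are you
#
# 00:00.500 --> 00:00.900
# Hello <u>world</u> how are you
```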
SRT Output Example
```python
writer = get_writer("srt", "./output")
writer(result, audio_path, options={"max_line_width": 42, "max_line_count": 2})
# Generates SRT like:
# 1
# 00:00:00,000 --> 00:00:02,400
# Hello world how are you
# today is a good day
#
# 2
# 00:00:02,400 --> 00:00:04,800
# and I hope you are
# doing well
```
Links
Principle:Openai_Whisper_Word_Level_Subtitle_Output