Principle:Openai Whisper Punctuation Merging

Overview

Punctuation Merging is a post-processing technique that merges standalone punctuation tokens with their adjacent words to produce cleaner word-level timestamps. After word boundary detection, punctuation marks may appear as separate "words" with their own timestamps. For a better user experience, leading punctuation should be merged with the following word, and trailing punctuation should be merged with the preceding word.

Domain

Natural Language Processing
Text Processing

The Problem

After subword-to-word grouping, punctuation marks often end up as isolated single-character "words":

Before merging:
  [0.00-0.50] "       (opening quote)
  [0.50-1.20] Hello
  [1.20-1.30] ,       (comma)
  [1.30-1.80] world
  [1.80-1.90] .       (period)
  [1.90-2.00] "       (closing quote)

These standalone punctuation entries are undesirable because:

They clutter word-level output with entries that carry no spoken content.
Their timestamps are often unreliable since punctuation has no acoustic realization.
Subtitle and display systems expect punctuation to be attached to words.

Merging Strategy

The merging follows two rules based on punctuation type:

Prepended (Leading) Punctuation

Punctuation that logically precedes a word should be merged forward with the following word. Common examples:

Character	Name	Example
"	Double quote (opening)	"Hello
'	Single quote (opening)	'world
«	Left guillemet	«bonjour
¿	Inverted question mark	¿Cómo
(	Left parenthesis	(note)
[	Left bracket	[ref]
{	Left brace	{text}
-	Hyphen/dash	-interrupted

Appended (Trailing) Punctuation

Punctuation that logically follows a word should be merged backward with the preceding word. Common examples:

Character	Name	Example
"	Double quote (closing)	world"
'	Single quote (closing)	world'
.	Period	world.
,	Comma	world,
!	Exclamation mark	world!
?	Question mark	world?
:	Colon	word:
)	Right parenthesis	(note)
]	Right bracket	[ref]
}	Right brace	{text}
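
The two tables can be captured as two character sets plus a small classifier. The sketch below is illustrative only: the set contents mirror the tables on this page, and the names PREPENDED, APPENDED, and classify are hypothetical, not taken from any library.

```python
# Hypothetical character sets mirroring the tables above.
# A real system may use a different or larger set.
PREPENDED = "\"'«¿([{-"
APPENDED = "\"'.,!?:)]}"


def classify(token: str) -> str:
    """Classify a word-level entry by the merge direction it allows.

    Returns "leading", "trailing", "ambiguous" (the characters appear in
    both sets, e.g. straight quotes), or "word" for anything else.
    """
    text = token.strip()
    if not text:
        return "word"  # nothing to classify; treat as ordinary content
    leading = all(ch in PREPENDED for ch in text)
    trailing = all(ch in APPENDED for ch in text)
    if leading and trailing:
        return "ambiguous"
    if leading:
        return "leading"
    if trailing:
        return "trailing"
    return "word"
```

Straight quotes appear in both sets, so the character alone does not determine the merge direction. In a sketch like this, position decides; a production implementation would likely need an extra signal, such as leading whitespace on the entry, to tell an opening quote from a closing one.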

Two-Pass Algorithm

The merging is performed in two passes:

Reverse pass (leading punctuation): Iterate through the word list in reverse. If a word consists entirely of prepended punctuation characters, merge it into the following word by concatenating the text and token lists, and mark the punctuation entry as empty.
Forward pass (trailing punctuation): Iterate through the word list forward. If a word consists entirely of appended punctuation characters, merge it into the preceding word by concatenating the text and token lists, and mark the punctuation entry as empty.

After both passes, empty entries are filtered out.
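
The two passes and the final filter can be sketched as follows. This is a minimal illustration under stated assumptions, not the reference implementation: the Word dataclass, the merge_punctuation name, and the character sets are hypothetical, and extending start and end times to cover the merged punctuation follows the description on this page (an implementation may instead keep the anchor word's own timestamps).

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical character sets taken from the tables on this page.
PREPENDED = "\"'«¿([{-"
APPENDED = "\"'.,!?:)]}"


@dataclass
class Word:
    text: str
    start: float
    end: float
    tokens: List[int] = field(default_factory=list)


def _all_in(text: str, charset: str) -> bool:
    """True if text is non-empty and made only of characters in charset."""
    return bool(text) and all(ch in charset for ch in text)


def merge_punctuation(words: List[Word]) -> List[Word]:
    # Reverse pass: fold leading punctuation into the entry that follows it.
    j = len(words) - 1  # nearest non-merged entry to the right of i
    for i in range(len(words) - 2, -1, -1):
        cur, nxt = words[i], words[j]
        if _all_in(cur.text, PREPENDED):
            nxt.text = cur.text + nxt.text
            nxt.tokens = cur.tokens + nxt.tokens
            nxt.start = min(cur.start, nxt.start)  # assumption: extend timing
            cur.text, cur.tokens = "", []          # mark as empty
        else:
            j = i

    # Forward pass: fold trailing punctuation into the entry that precedes it.
    i = 0  # nearest non-merged entry to the left of j
    for j in range(1, len(words)):
        prev, cur = words[i], words[j]
        if prev.text and _all_in(cur.text, APPENDED):
            prev.text = prev.text + cur.text
            prev.tokens = prev.tokens + cur.tokens
            prev.end = max(prev.end, cur.end)      # assumption: extend timing
            cur.text, cur.tokens = "", []          # mark as empty
        else:
            i = j

    # Drop the entries that were emptied by either pass.
    return [w for w in words if w.text]
```

Running the reverse pass first means a chain such as an opening parenthesis followed by an opening quote accumulates onto the same following word. Because straight quotes sit in both sets, this sketch treats a quote as opening whenever another entry follows it; only a quote with no word to absorb it is left for the forward pass.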

Result

After merging:
  [0.00-1.30] "Hello,
  [1.30-2.00] world."

Each merged word carries the concatenated token lists of its parts and spans from the earliest start time to the latest end time among them.

Duration Anomaly Handling

In addition to punctuation merging, the word-level timestamp pipeline applies duration heuristics to handle anomalies at sentence boundaries. Words with durations exceeding a threshold (typically 2x the median word duration) are flagged and their end times are adjusted to prevent unreasonably long word durations that can occur at segment boundaries.
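
A minimal sketch of the duration heuristic described above. The factor of 2 comes from the text; the clamp_long_words name, the tuple layout, and the choice to clamp every over-long word (rather than only words next to a sentence boundary) are simplifying assumptions made for illustration.

```python
from statistics import median
from typing import List, Tuple

Timing = Tuple[str, float, float]  # (text, start, end), hypothetical layout


def clamp_long_words(words: List[Timing], factor: float = 2.0) -> List[Timing]:
    """Shorten any word whose duration exceeds factor * median duration."""
    # Ignore zero-length entries so they do not drag the median down.
    durations = [end - start for _, start, end in words if end > start]
    if not durations:
        return list(words)

    max_duration = factor * median(durations)
    clamped = []
    for text, start, end in words:
        if end - start > max_duration:
            end = start + max_duration  # pull the end time in
        clamped.append((text, start, end))
    return clamped
```

Using the median rather than the mean keeps a single very long word from inflating the threshold it is being tested against.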

Implementation

Implementation:Openai_Whisper_Merge_Punctuations
Heuristic:Openai_Whisper_Median_Word_Duration_Clamping

Metadata

2025-06-25 00:00 GMT
