Principle:Openai Whisper English Text Normalization

Knowledge Sources	Robust Speech Recognition via Large-Scale Weak Supervision; UK US Spelling List
Domains	NLP, Text_Normalization, Evaluation
Last Updated	2026-02-13 22:00 GMT

Overview

A text normalization technique that canonicalizes English transcripts by expanding contractions, standardizing number formats, and unifying British and American spelling variants, so that Word Error Rate (WER) evaluation is fair.

Description

English Text Normalization addresses the problem that raw speech transcripts contain many surface-level variations that are semantically equivalent but textually different. Without normalization, Word Error Rate (WER) metrics would penalize a model for producing "color" when the reference says "colour", or "21" when the reference says "twenty one".

The normalization pipeline applies a series of rule-based transformations:

Lowercasing: Converts all text to lowercase for case-insensitive comparison.
Bracket/Parenthesis Removal: Strips text enclosed in brackets or parentheses, which typically holds non-verbal annotations such as stage directions.
Filler Word Removal: Removes speech disfluencies (hmm, uh, um, mm).
Contraction Expansion: Expands contractions to their full forms (won't → will not, can't → can not).
Title Normalization: Expands abbreviated titles (mr → mister, dr → doctor).
Number Standardization: Converts spelled-out numbers to Arabic numerals using a finite-state parser that handles ordinals, currency, decimals, and compound numbers.
Spelling Normalization: Maps British English spellings to American English equivalents using a lookup dictionary.
Symbol Cleanup: Removes stray currency symbols and percentage signs, and collapses extra whitespace.
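
As a rough illustration of the simpler steps above, the following Python sketch chains lowercasing, bracket removal, filler removal, contraction and title expansion, and whitespace cleanup. The function name normalize_basic and the tiny rule tables are assumptions made for this example, and dropping all leftover punctuation is a simplification. Number and spelling normalization are omitted here.

```python
import re

# Illustrative subsets only (assumed for this sketch); real rule tables are larger.
FILLERS = r"\b(hmm|mm|uh|um)\b"
CONTRACTIONS = {r"\bwon't\b": "will not", r"\bcan't\b": "can not"}
TITLES = {r"\bmr\b": "mister", r"\bdr\b": "doctor"}


def normalize_basic(text: str) -> str:
    """Apply a reduced version of the rule pipeline, in order."""
    text = text.lower()                             # lowercasing
    text = re.sub(r"[<\[][^>\]]*[>\]]", "", text)   # drop [bracketed] or <bracketed> spans
    text = re.sub(r"\([^)]*\)", "", text)           # drop (parenthesized) spans
    text = re.sub(FILLERS, "", text)                # drop filler words
    for pattern, replacement in {**CONTRACTIONS, **TITLES}.items():
        text = re.sub(pattern, replacement, text)   # expand contractions and titles
    text = re.sub(r"[^\w\s']", " ", text)           # simplification: drop other punctuation
    return re.sub(r"\s+", " ", text).strip()        # collapse whitespace


print(normalize_basic("Um, Dr. Smith won't [laughs] come (sighs) today"))
```

The order matters in this sketch: lowercasing comes first so that the lowercase patterns for fillers, contractions, and titles can match.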

Usage

Use this principle when computing WER or other text-similarity metrics for English speech recognition evaluation. Apply the same normalizer to both the reference transcript and the model hypothesis to ensure that only genuine recognition errors are counted.
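
The effect of normalizing both sides can be sketched with a small, self-contained WER function. Everything named here (edit_distance, wer, toy_normalize) is hypothetical and written for this example. toy_normalize is a stand-in that only lowercases and maps one spelling, not the full normalizer.

```python
from typing import Callable, List


def edit_distance(ref: List[str], hyp: List[str]) -> int:
    """Word-level Levenshtein distance, keeping one DP row at a time."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str, normalize: Callable[[str], str]) -> float:
    """WER after applying the same normalizer to reference and hypothesis."""
    ref = normalize(reference).split()
    hyp = normalize(hypothesis).split()
    return edit_distance(ref, hyp) / max(len(ref), 1)


def toy_normalize(text: str) -> str:
    # Hypothetical stand-in: lowercase plus a single spelling mapping.
    return " ".join({"colour": "color"}.get(w, w) for w in text.lower().split())


raw = wer("The colour red", "the color red", normalize=str)            # no normalization
clean = wer("The colour red", "the color red", normalize=toy_normalize)
print(raw, clean)
```

Without normalization the casing and spelling differences count as two substitutions out of three words. With the same normalizer on both sides the transcripts match exactly.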

Theoretical Basis

The normalization follows a deterministic rule-based approach rather than a learned model. The key insight is that ASR evaluation should measure whether the right words were recognized, not penalize stylistic choices in how they are written.

Number Parsing Algorithm:

The number normalizer uses a streaming finite-state parser that processes words left-to-right, accumulating numeric values:

# Abstract algorithm (not actual implementation)
for word in words:
    if word is a digit name (one, two, ...):
        accumulate into current value
    elif word is a multiplier (hundred, thousand, ...):
        multiply accumulated value by multiplier
    elif word is a prefix (minus, dollar, ...):
        set prefix for next number
    elif word is a suffix (percent, ...):
        append suffix symbol to current number
    else:
        yield accumulated number (if any), then yield the word unchanged
        start new accumulation
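
A runnable approximation of the outline above, under heavy simplifying assumptions: it handles only cardinals up to the millions, one prefix (minus), and one suffix (percent), and it ignores ordinals, currency, decimals, and "and". The names words_to_numbers and fits are invented for this sketch and do not come from the actual implementation.

```python
ONES = {w: i for i, w in enumerate(
    "zero one two three four five six seven eight nine ten eleven twelve "
    "thirteen fourteen fifteen sixteen seventeen eighteen nineteen".split())}
TENS = {w: 10 * i for i, w in enumerate(
    "twenty thirty forty fifty sixty seventy eighty ninety".split(), start=2)}
MULTIPLIERS = {"thousand": 1_000, "million": 1_000_000}
PREFIXES = {"minus": "-"}
SUFFIXES = {"percent": "%"}


def fits(value: int, current: int) -> bool:
    """Can `value` extend the group in progress (e.g. 1 after 20, 30 after 200)?"""
    if value < 10:
        return current % 10 == 0 and current % 100 != 10
    return current % 100 == 0


def words_to_numbers(text: str) -> str:
    out = []
    total, current, active = 0, 0, False  # finished groups, group in progress, in a number?
    pending = None                        # prefix word waiting for a number

    def flush():
        nonlocal total, current, active, pending
        if active:
            sign = PREFIXES[pending] if pending else ""
            out.append(f"{sign}{total + current}")
        elif pending:
            out.append(pending)           # prefix was not followed by a number
        total, current, active, pending = 0, 0, False, None

    for word in text.split():
        value = ONES.get(word, TENS.get(word))
        if value is not None:
            if active and not fits(value, current):
                flush()                   # e.g. "one two" is two separate numbers
            current += value
            active = True
        elif word == "hundred" and active:
            current *= 100
        elif word in MULTIPLIERS and active:
            total += current * MULTIPLIERS[word]
            current = 0
        elif word in PREFIXES:
            flush()
            pending = word
        elif word in SUFFIXES and active:
            flush()
            out[-1] += SUFFIXES[word]     # attach the symbol to the number just emitted
        else:
            flush()
            out.append(word)              # ordinary word passes through unchanged
    flush()
    return " ".join(out)


print(words_to_numbers("i have twenty one apples"))
```

The sketch keeps two accumulators so that "one thousand two hundred thirty four" can close the thousands group before starting the hundreds group.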

Spelling Normalization:

Uses a simple word-by-word dictionary lookup with approximately 1741 British-to-American mappings derived from systematic spelling differences (-ise/-ize, -our/-or, -re/-er, etc.). Words that are not in the dictionary pass through unchanged.
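
A minimal sketch of such a lookup, assuming whitespace tokenization and already-lowercased input. The three-entry UK_TO_US table and the function name americanize are placeholders for illustration, not the real mapping.

```python
# Tiny illustrative subset (assumed); the real table is far larger.
UK_TO_US = {"colour": "color", "normalise": "normalize", "centre": "center"}


def americanize(text: str) -> str:
    """Replace each word found in the table; leave all other words as they are."""
    return " ".join(UK_TO_US.get(word, word) for word in text.split())


print(americanize("the colour of the centre"))
```

Because the lookup matches whole tokens only, it should run after lowercasing and punctuation cleanup. Otherwise a token such as "Colour," would miss its entry.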

Related Pages

Implementation:Openai_Whisper_EnglishTextNormalizer
