Implementation:Openai Whisper English Spelling Mappings
| Knowledge Sources | |
|---|---|
| Domains | NLP, Text_Normalization |
| Last Updated | 2026-02-13 22:00 GMT |
Overview
Data file containing British-to-American English spelling mappings used by EnglishSpellingNormalizer for transcript text standardization.
Description
The english.json file is a JSON dictionary with approximately 1741 entries mapping British English spellings to their American English equivalents. Mappings cover systematic spelling differences including "-ise"/"-ize" (e.g., "organise" → "organize"), "-our"/"-or" (e.g., "colour" → "color"), "-re"/"-er" (e.g., "centre" → "center"), "-ogue"/"-og" (e.g., "catalogue" → "catalog"), and doubled/single consonant variants.
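This data file is loaded by the EnglishSpellingNormalizer class to standardize spelling variants before Word Error Rate (WER) is computed. When the same normalizer is applied to both the reference and the hypothesis transcripts, a British spelling on one side and the American spelling on the other are both mapped to the American form, so the difference is no longer counted as a word error.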
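The effect on WER can be illustrated with a small, self-contained sketch. It assumes a hypothetical three-entry subset of the mapping (`SAMPLE_MAPPING`) and a hand-written word-level edit-distance WER (`word_error_rate`); neither is part of Whisper, and a real evaluation would use the full english.json (via EnglishTextNormalizer) together with a WER implementation of your choice.

```python
# Minimal sketch: why spelling normalization matters for WER.
# SAMPLE_MAPPING is a hypothetical subset of english.json, not the real file.
from typing import Dict, List

SAMPLE_MAPPING: Dict[str, str] = {
    "colour": "color",
    "centre": "center",
    "analysed": "analyzed",
}


def standardize(words: List[str], mapping: Dict[str, str]) -> List[str]:
    # Replace each word that is a British key; leave all other words unchanged.
    return [mapping.get(w, w) for w in words]


def word_error_rate(reference: List[str], hypothesis: List[str]) -> float:
    # Word-level Levenshtein distance divided by the reference length.
    prev = list(range(len(hypothesis) + 1))
    for i, ref_word in enumerate(reference, start=1):
        curr = [i] + [0] * len(hypothesis)
        for j, hyp_word in enumerate(hypothesis, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of ref_word
                curr[j - 1] + 1,     # insertion of hyp_word
                prev[j - 1] + cost,  # substitution (or match if cost == 0)
            )
        prev = curr
    return prev[-1] / len(reference)


reference = "the color of the center was analyzed".split()
hypothesis = "the colour of the centre was analysed".split()

# Without normalization, the three spelling variants count as substitutions.
raw = word_error_rate(reference, hypothesis)
# With the mapping applied to both sides, the transcripts match exactly.
normalized = word_error_rate(
    standardize(reference, SAMPLE_MAPPING),
    standardize(hypothesis, SAMPLE_MAPPING),
)
print(f"raw WER: {raw:.3f}, normalized WER: {normalized:.3f}")
# raw WER: 0.429, normalized WER: 0.000
```

Applying the mapping to both sides matters: because it only rewrites British forms, a reference that is already American is left untouched while the British hypothesis is brought into line with it.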
Usage
EnglishSpellingNormalizer reads this file automatically when it is initialized, so users typically do not need to load it directly. Instead, use EnglishSpellingNormalizer, or EnglishTextNormalizer, which creates an EnglishSpellingNormalizer internally.
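The sketch below shows, under stated assumptions, how such a mapping is applied to a string: the text is split on whitespace and each token is replaced only if it exactly matches a key. `SpellingNormalizerSketch` is a hypothetical stand-in written for illustration, not Whisper code. Because the lookup is exact and case-sensitive, tokens such as "Colour," pass through unchanged, which is why, as we understand it, EnglishTextNormalizer lowercases the text and strips most punctuation before standardizing spellings.

```python
# Sketch of word-by-word application of a British -> American mapping.
# SpellingNormalizerSketch is hypothetical and only illustrates the idea.
from typing import Dict


class SpellingNormalizerSketch:
    """Hypothetical stand-in for EnglishSpellingNormalizer."""

    def __init__(self, mapping: Dict[str, str]) -> None:
        self.mapping = mapping

    def __call__(self, s: str) -> str:
        # Split on whitespace, replace exact key matches, rejoin with spaces.
        return " ".join(self.mapping.get(word, word) for word in s.split())


normalizer = SpellingNormalizerSketch({"colour": "color", "centre": "center"})
print(normalizer("the colour of the centre"))  # the color of the center
# Capitalized or punctuated tokens are not exact keys, so they are unchanged.
print(normalizer("The Colour, please"))        # The Colour, please
```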
Code Reference
Source Location
- Repository: Openai_Whisper
- File: whisper/normalizers/english.json
- Lines: 1-1741
Schema
```json
{
  "<british_spelling>": "<american_spelling>",
  "accessorise": "accessorize",
  "colour": "color",
  "centre": "center"
}
```
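As a sketch of how the schema above could be checked, the hypothetical helper `validate_mapping` (not part of Whisper) confirms that the parsed JSON is a flat object whose values are all strings. It is demonstrated on an inline sample rather than the real english.json.

```python
# Minimal sketch: checking that parsed JSON matches the flat string-to-string
# schema. validate_mapping is a hypothetical helper, not a Whisper function.
import json
from typing import Dict


def validate_mapping(raw: object) -> Dict[str, str]:
    # The top level must be a JSON object (parsed by json as a Python dict).
    if not isinstance(raw, dict):
        raise TypeError("expected a JSON object at the top level")
    # JSON object keys are always strings, so only the values need checking.
    for key, value in raw.items():
        if not isinstance(value, str):
            raise TypeError(f"value for {key!r} is not a string")
    return dict(raw)


sample = '{"accessorise": "accessorize", "colour": "color", "centre": "center"}'
mapping = validate_mapping(json.loads(sample))
print(len(mapping))  # 3
```

Against the real file, the same check would be `validate_mapping(json.load(f))` on the opened english.json.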
Import
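```python
import json
import os

# __file__ is a module in whisper/normalizers/, so the path resolves next to it.
mapping_path = os.path.join(os.path.dirname(__file__), "english.json")
with open(mapping_path, encoding="utf-8") as f:
    mapping = json.load(f)
```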
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| file_path | str | Yes | Path to english.json (typically resolved relative to the normalizers package) |
Outputs
| Name | Type | Description |
|---|---|---|
| mapping | Dict[str, str] | Dictionary mapping British spellings (keys) to American spellings (values) |
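The I/O contract can be expressed as a single typed function. The name `load_spelling_mapping` is hypothetical (Whisper does not expose such a function); the sketch simply takes `file_path` as a `str` and returns a `Dict[str, str]`, and the usage below writes a temporary stand-in file instead of relying on a Whisper checkout.

```python
# Sketch of a loader matching the I/O contract: file_path (str) in,
# Dict[str, str] out. load_spelling_mapping is a hypothetical name.
import json
import os
import tempfile
from typing import Dict


def load_spelling_mapping(file_path: str) -> Dict[str, str]:
    # A context manager closes the file handle as soon as parsing finishes.
    with open(file_path, encoding="utf-8") as f:
        return json.load(f)


# Usage with a temporary stand-in for english.json:
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "english.json")
    with open(path, "w", encoding="utf-8") as f:
        json.dump({"colour": "color", "centre": "center"}, f)
    mapping = load_spelling_mapping(path)

print(mapping["centre"])  # center
```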
Usage Examples
Direct Loading
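```python
import json

# The path is relative to the root of a Whisper source checkout.
with open("whisper/normalizers/english.json", encoding="utf-8") as f:
    mapping = json.load(f)
# Look up a British spelling; unmapped words fall back to themselves
american = mapping.get("colour", "colour")
print(american)  # color
```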
Via EnglishTextNormalizer (which applies EnglishSpellingNormalizer internally)
```python
from whisper.normalizers import EnglishTextNormalizer

normalizer = EnglishTextNormalizer()
# British spellings are automatically converted
text = normalizer("The colour of the centre was analysed")
# Output: "the color of the center was analyzed"
```