Implementation:NVIDIA NeMo Curator WER Metric Stage
| Knowledge Sources | |
|---|---|
| Domains | Audio Processing, Speech Metrics, Data Curation |
| Last Updated | 2026-02-14 00:00 GMT |
Overview
Provides utility functions and the GetPairwiseWerStage processing stage for computing word error rate (WER) and related speech recognition quality metrics.
Description
This module is the core metrics component for audio curation, enabling quality assessment of ASR transcriptions. It contains:
- get_wer(text, pred_text) -- Computes word error rate as a percentage. Splits both strings into words, computes edit distance using the editdistance library, and returns round(word_dist / num_words * 100.0, 2).
- get_cer(text, pred_text) -- Computes character error rate as a percentage. Computes character-level edit distance and returns round(char_dist / num_chars * 100.0, 2).
- get_charrate(text, duration) -- Computes characters per second as round(num_chars / duration, 2).
- get_wordrate(text, duration) -- Computes words per second as round(num_words / duration, 2).
- GetPairwiseWerStage -- A dataclass-based stage extending LegacySpeechStage. Its process_dataset_entry() method computes WER between the ground truth transcript (text_key) and the ASR prediction (pred_text_key), storing the result in wer_key.
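The formulas listed above can be tried out without installing the editdistance package. The following is a minimal sketch, not the module's source: levenshtein is a hypothetical pure-Python stand-in for the library call, the sketch_ prefixed names are hypothetical, and counting characters with len(text) (spaces included) is an assumption.

```python
from typing import Sequence


def levenshtein(ref: Sequence, hyp: Sequence) -> int:
    """Hypothetical stand-in for the editdistance library (dynamic programming)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion from the reference
                curr[j - 1] + 1,     # insertion into the reference
                prev[j - 1] + cost,  # substitution, or match when cost is 0
            ))
        prev = curr
    return prev[-1]


def sketch_wer(text: str, pred_text: str) -> float:
    words = text.split()
    word_dist = levenshtein(words, pred_text.split())
    # Divides by the reference word count, so an empty reference raises ZeroDivisionError.
    return round(word_dist / len(words) * 100.0, 2)


def sketch_cer(text: str, pred_text: str) -> float:
    char_dist = levenshtein(text, pred_text)
    # Assumption: num_chars is len(text), spaces included.
    return round(char_dist / len(text) * 100.0, 2)


def sketch_charrate(text: str, duration: float) -> float:
    return round(len(text) / duration, 2)


def sketch_wordrate(text: str, duration: float) -> float:
    return round(len(text.split()) / duration, 2)
```

Because every formula in this sketch divides by a reference length or a duration, callers may want to filter out entries with empty transcripts or zero duration before computing metrics.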
Usage
Use GetPairwiseWerStage in an audio pipeline after an ASR inference stage to compute WER for each entry. The standalone functions (get_wer, get_cer, get_charrate, get_wordrate) can also be used directly outside of a pipeline context.
Code Reference
Source Location
- Repository: NeMo-Curator
- File: nemo_curator/stages/audio/metrics/get_wer.py
- Lines: 1-74
Signature
def get_wer(text: str, pred_text: str) -> float: ...
def get_cer(text: str, pred_text: str) -> float: ...
def get_charrate(text: str, duration: float) -> float: ...
def get_wordrate(text: str, duration: float) -> float: ...

@dataclass
class GetPairwiseWerStage(LegacySpeechStage):
    name = "GetPairwiseWerStage"
    text_key: str = "text"
    pred_text_key: str = "pred_text"
    wer_key: str = "wer"

    def process_dataset_entry(self, data_entry: dict) -> list[AudioBatch]: ...
Import
from nemo_curator.stages.audio.metrics.get_wer import (
    GetPairwiseWerStage,
    get_wer,
    get_cer,
    get_charrate,
    get_wordrate,
)
I/O Contract
GetPairwiseWerStage Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| text_key | str | No | Key for ground truth transcript in data entries (default: "text") |
| pred_text_key | str | No | Key for ASR predicted text in data entries (default: "pred_text") |
| wer_key | str | No | Key under which the computed WER will be stored (default: "wer") |
GetPairwiseWerStage Outputs
| Name | Type | Description |
|---|---|---|
| result | list[AudioBatch] | The input entry augmented with the WER value stored under wer_key |
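The contract above amounts to reading two keys from an entry and writing a third. The sketch below illustrates that key mapping on plain dictionaries under several assumptions: annotate_entry and toy_metric are hypothetical names, the AudioBatch wrapping done by the real stage is omitted, the entry is updated in place, and the metric is passed in as a callable so the example does not depend on nemo_curator or editdistance.

```python
from typing import Callable


def annotate_entry(
    data_entry: dict,
    metric: Callable[[str, str], float],
    text_key: str = "text",
    pred_text_key: str = "pred_text",
    wer_key: str = "wer",
) -> dict:
    """Hypothetical helper: store metric(reference, prediction) under wer_key."""
    data_entry[wer_key] = metric(data_entry[text_key], data_entry[pred_text_key])
    return data_entry


def toy_metric(text: str, pred_text: str) -> float:
    """Placeholder only: 0.0 on exact match, 100.0 otherwise (not a real WER)."""
    return 0.0 if text == pred_text else 100.0


# Default keys: other fields in the entry (here, duration) are left untouched.
entry = {"text": "hello world", "pred_text": "hello world", "duration": 1.2}
annotated = annotate_entry(entry, toy_metric)

# Custom keys, mirroring the three configurable parameters of the stage.
custom = annotate_entry(
    {"reference": "a b", "hypothesis": "a c"},
    toy_metric,
    text_key="reference",
    pred_text_key="hypothesis",
    wer_key="score",
)
```

In real use, get_wer would take the place of toy_metric.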
Standalone Function Signatures
| Function | Inputs | Output | Description |
|---|---|---|---|
| get_wer | text: str, pred_text: str | float | Word error rate as a percentage, rounded to 2 decimals; can exceed 100 when the edit distance is larger than the reference word count |
| get_cer | text: str, pred_text: str | float | Character error rate as percentage, rounded to 2 decimals |
| get_charrate | text: str, duration: float | float | Characters per second, rounded to 2 decimals |
| get_wordrate | text: str, duration: float | float | Words per second, rounded to 2 decimals |
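A short worked calculation may help with the value ranges in the table. It applies the documented formulas to hand-counted inputs rather than calling the library. The edit distance is counted manually, and treating spaces as characters in the character rate is an assumption.

```python
# Reference "hello" has 1 word; the prediction "hello there world" adds 2 words,
# so the word-level edit distance is 2 (two insertions).
word_dist, num_words = 2, 1
wer_over_100 = round(word_dist / num_words * 100.0, 2)  # exceeds 100 percent

# Rates for "hello world" (11 characters, 2 words) spoken over 2.0 seconds.
text, duration = "hello world", 2.0
charrate = round(len(text) / duration, 2)          # spaces counted (assumption)
wordrate = round(len(text.split()) / duration, 2)

print(wer_over_100, charrate, wordrate)
```

The first case shows why WER is not bounded by 100: insertions add to the numerator while the denominator stays at the reference length.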
Usage Examples
Using the Stage in a Pipeline
from nemo_curator.stages.audio.metrics.get_wer import GetPairwiseWerStage

wer_stage = GetPairwiseWerStage(
    text_key="text",
    pred_text_key="pred_text",
    wer_key="wer",
)
Using Standalone Functions
from nemo_curator.stages.audio.metrics.get_wer import get_wer, get_cer
wer = get_wer("hello world", "hello word")
# wer = 50.0 (1 word error out of 2 words)
cer = get_cer("hello world", "hello word")
# cer = 9.09 (1 character edit out of 11 characters, assuming spaces are counted)