Implementation:NVIDIA NeMo Curator WER Metric Stage

Knowledge Sources	NVIDIA NeMo Curator
Domains	Audio Processing, Speech Metrics, Data Curation
Last Updated	2026-02-14 00:00 GMT

Overview

Provides utility functions and the GetPairwiseWerStage processing stage for computing word error rate (WER) and related speech recognition quality metrics.

Description

This module is the core metrics component for audio curation, enabling quality assessment of ASR transcriptions. It contains:

get_wer(text, pred_text) -- Computes word error rate as a percentage. Splits both strings into words, computes the word-level edit distance using the editdistance library, and returns round(word_dist / num_words * 100.0, 2), where num_words is the number of words in the reference text.

get_cer(text, pred_text) -- Computes character error rate as a percentage. Computes the character-level edit distance and returns round(char_dist / num_chars * 100.0, 2), where num_chars is the number of characters in the reference text.

get_charrate(text, duration) -- Computes characters per second as round(num_chars / duration, 2).

get_wordrate(text, duration) -- Computes words per second as round(num_words / duration, 2).

GetPairwiseWerStage -- A dataclass-based stage extending LegacySpeechStage. Its process_dataset_entry() method computes WER between the ground truth transcript (text_key) and the ASR prediction (pred_text_key), storing the result in wer_key.
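
The formulas listed above can be sketched in plain Python as follows. This is a minimal illustration, not the module's source: the real functions use the editdistance library, whereas this sketch substitutes a hand-written Levenshtein routine so that it runs without dependencies. The names edit_distance, sketch_wer, sketch_cer, sketch_charrate, and sketch_wordrate are hypothetical, and the sketch assumes that words are split on whitespace and that spaces count as characters.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or word lists)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # delete an item from ref
                curr[j - 1] + 1,     # insert an item from hyp
                prev[j - 1] + cost,  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def sketch_wer(text, pred_text):
    # Assumption: words are obtained by whitespace splitting.
    words = text.split()
    word_dist = edit_distance(words, pred_text.split())
    return round(word_dist / len(words) * 100.0, 2)


def sketch_cer(text, pred_text):
    # Assumption: every character of the reference counts, spaces included.
    char_dist = edit_distance(text, pred_text)
    return round(char_dist / len(text) * 100.0, 2)


def sketch_charrate(text, duration):
    return round(len(text) / duration, 2)


def sketch_wordrate(text, duration):
    return round(len(text.split()) / duration, 2)


print(sketch_wer("hello world", "hello word"))  # 50.0
print(sketch_cer("hello world", "hello word"))  # 9.09
print(sketch_charrate("hello world", 2.0))      # 5.5
print(sketch_wordrate("hello world", 2.0))      # 1.0
```

In each error-rate formula the denominator comes from the reference text only, so the prediction's length affects the result solely through the edit distance.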

Usage

Use GetPairwiseWerStage in an audio pipeline after an ASR inference stage to compute WER for each entry. The standalone functions (get_wer, get_cer, get_charrate, get_wordrate) can also be used directly outside of a pipeline context.

Code Reference

Source Location

Repository: NeMo-Curator
File: nemo_curator/stages/audio/metrics/get_wer.py
Lines: 1-74

Signature

def get_wer(text: str, pred_text: str) -> float: ...
def get_cer(text: str, pred_text: str) -> float: ...
def get_charrate(text: str, duration: float) -> float: ...
def get_wordrate(text: str, duration: float) -> float: ...


@dataclass
class GetPairwiseWerStage(LegacySpeechStage):
    name = "GetPairwiseWerStage"
    text_key: str = "text"
    pred_text_key: str = "pred_text"
    wer_key: str = "wer"

    def process_dataset_entry(self, data_entry: dict) -> list[AudioBatch]: ...

Import

from nemo_curator.stages.audio.metrics.get_wer import (
    GetPairwiseWerStage,
    get_wer,
    get_cer,
    get_charrate,
    get_wordrate,
)

I/O Contract

GetPairwiseWerStage Inputs

Name	Type	Required	Description
text_key	str	No	Key for ground truth transcript in data entries (default: "text")
pred_text_key	str	No	Key for ASR predicted text in data entries (default: "pred_text")
wer_key	str	No	Key under which the computed WER will be stored (default: "wer")

GetPairwiseWerStage Outputs

Name	Type	Description
result	list[AudioBatch]	The input entry augmented with the WER value stored under wer_key

Standalone Function Signatures

Function	Inputs	Output	Description
get_wer	text: str, pred_text: str	float	Word error rate as percentage, rounded to 2 decimals; can exceed 100 when the prediction contains many inserted words
get_cer	text: str, pred_text: str	float	Character error rate as percentage, rounded to 2 decimals
get_charrate	text: str, duration: float	float	Characters per second, rounded to 2 decimals
get_wordrate	text: str, duration: float	float	Words per second, rounded to 2 decimals
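
Because each of these formulas divides by a quantity taken from the input (reference word count, reference character count, or duration), two boundary cases are worth keeping in mind: the error rate is not capped at 100, and a literal evaluation of the formula with a zero denominator would raise ZeroDivisionError. This page does not say how the module handles empty references or zero durations, so the helper below is a hypothetical caller-side guard (percentage_or_none is not part of NeMo Curator), shown only to illustrate the arithmetic.

```python
from typing import Optional


def percentage_or_none(errors: int, total: int) -> Optional[float]:
    """Return round(errors / total * 100.0, 2), or None when total is 0.

    Hypothetical helper: it mirrors the rounding used by the error-rate
    formulas but adds an explicit guard for an empty reference.
    """
    if total == 0:
        return None
    return round(errors / total * 100.0, 2)


# One reference word and three extra words in the prediction: distance 3.
print(percentage_or_none(3, 1))  # 300.0
# One substituted word out of two reference words.
print(percentage_or_none(1, 2))  # 50.0
# Empty reference: no meaningful rate can be computed.
print(percentage_or_none(2, 0))  # None
```

Returning None rather than 0.0 keeps "no reference available" distinguishable from "perfect transcription"; a pipeline might equally choose to drop such entries before the metric stage.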

Usage Examples

Using the Stage in a Pipeline

from nemo_curator.stages.audio.metrics.get_wer import GetPairwiseWerStage

wer_stage = GetPairwiseWerStage(
    text_key="text",
    pred_text_key="pred_text",
    wer_key="wer",
)

Using Standalone Functions

from nemo_curator.stages.audio.metrics.get_wer import get_wer, get_cer

wer = get_wer("hello world", "hello word")
# wer = 50.0 (1 word error out of 2 words)

cer = get_cer("hello world", "hello word")
# cer = 9.09 (1 character edit out of 11 reference characters, space included)

Related Pages

Environment:NVIDIA_NeMo_Curator_Python_Linux_Base
