Implementation:Speechbrain Speechbrain Compute WER Tool

Knowledge Sources	SpeechBrain
Domains	ASR, Evaluation
Last Updated	2026-02-09 00:00 GMT

Overview

Concrete tool, provided by the SpeechBrain library, for computing Word Error Rate (WER) and Levenshtein alignments between reference and hypothesis transcriptions.

Description

This command-line script computes Word Error Rate and related metrics from a reference text file and a hypothesis text file. It closely matches the behavior of Kaldi's compute_wer binary. The script reads Kaldi-style text files, in which each line starts with an utterance ID followed by space-separated tokens. It supports three modes for handling missing hypotheses: "strict" (raises an error), "all" (treats a missing hypothesis as empty), and "present" (scores only the utterances whose hypothesis was found). Additional features include printing human-readable edit distance alignments between references and hypotheses, identifying the utterances with the highest WER, and (when provided with a utt2spk mapping) identifying the speakers with the highest WER. The implementation delegates to speechbrain.utils.edit_distance for the Levenshtein computation and to speechbrain.dataio.wer for formatted output.
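The file format and the three missing-hypothesis modes described above can be illustrated with a minimal sketch. This is not the code in tools/compute_wer.py: the function names read_kaldi_text and select_hypotheses are hypothetical, and the exact exception raised by the real tool in "strict" mode is an assumption here.

```python
# Illustrative sketch only; the function names are hypothetical and do not
# appear in tools/compute_wer.py.

def read_kaldi_text(lines):
    """Map utterance ID -> token list for Kaldi-style lines."""
    result = {}
    for line in lines:
        parts = line.strip().split()
        if not parts:
            continue  # skip blank lines
        # The first column is the utterance ID, the rest are tokens.
        result[parts[0]] = parts[1:]
    return result


def select_hypotheses(refs, hyps, mode="strict"):
    """Pair each reference with a hypothesis according to the chosen mode."""
    pairs = {}
    for key, ref_tokens in refs.items():
        if key in hyps:
            pairs[key] = (ref_tokens, hyps[key])
        elif mode == "strict":
            # Assumption: a KeyError stands in for "raises an error".
            raise KeyError(f"No hypothesis for utterance {key!r}")
        elif mode == "all":
            pairs[key] = (ref_tokens, [])  # missing hypothesis scored as empty
        elif mode == "present":
            continue  # leave the utterance out of scoring
        else:
            raise ValueError(f"Unknown mode: {mode!r}")
    return pairs


refs = read_kaldi_text(["utt1 the cat sat", "utt2 hello world"])
hyps = read_kaldi_text(["utt1 the cat sat down"])

print(select_hypotheses(refs, hyps, mode="all"))
print(select_hypotheses(refs, hyps, mode="present"))
```

In this toy input, utt2 has no hypothesis, so "all" scores it against an empty token list, "present" drops it, and "strict" would raise.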

Usage

Run as a standalone CLI tool after ASR decoding to evaluate transcription quality. The reference and hypothesis files use the same format as Kaldi's text files.

Code Reference

Source Location

Repository: SpeechBrain
File: tools/compute_wer.py

Signature

def _plain_text_reader(path):
    """Yields (key, token_list) from a Kaldi-style text file."""
    ...

def _plain_text_keydict(path):
    """Returns dict mapping utterance ID to token list."""
    ...

def _utt2spk_keydict(path):
    """Returns dict mapping utterance ID to speaker ID."""
    ...

class SmartFormatter(argparse.HelpFormatter):
    """Custom formatter for argparse help text."""
    ...

# Main: parses args, computes wer_details_by_utterance, wer_summary,
# optionally prints top WER utterances, speaker-level WER, and alignments

Import

python compute_wer.py ref.txt hyp.txt [options]

I/O Contract

Inputs

Name	Type	Required	Description
ref	str	Yes	Path to reference text file (utterance ID in the first column)
hyp	str	Yes	Path to hypothesis text file (utterance ID in the first column)
--mode	str	No	How to treat missing hypotheses: "present", "all", or "strict" (default: "strict")
--print-top-wer	flag	No	Print a list of utterances with the highest WER
--print-alignments	flag	No	Print Levenshtein alignments between all refs and hyps
--align-separator	str	No	Token separator in alignment output (default: " ; ")
--align-empty	str	No	Symbol for empty alignment slots (default: "<eps>")
--utt2spk	str	No	Path to utterance-to-speaker mapping file
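As a rough illustration of what the --utt2spk mapping enables, the sketch below pools per-utterance error counts by speaker. It assumes each utt2spk line has the form "utterance-ID speaker-ID" (the Kaldi convention); the helper names read_utt2spk and wer_by_speaker and the sample IDs are hypothetical, and the real tool's aggregation may differ in detail.

```python
from collections import defaultdict

# Illustrative sketch only; helper names and sample IDs are hypothetical.

def read_utt2spk(lines):
    """Map utterance ID -> speaker ID from 'utt-id spk-id' lines."""
    mapping = {}
    for line in lines:
        parts = line.split()
        if len(parts) >= 2:
            mapping[parts[0]] = parts[1]
    return mapping


def wer_by_speaker(utt_stats, utt2spk):
    """Pool (num_edits, num_ref_tokens) per speaker and return WER in percent."""
    totals = defaultdict(lambda: [0, 0])
    for utt, (edits, ref_len) in utt_stats.items():
        spk = utt2spk[utt]
        totals[spk][0] += edits
        totals[spk][1] += ref_len
    # Speakers with no reference tokens are skipped to avoid dividing by zero.
    return {spk: 100.0 * e / n for spk, (e, n) in totals.items() if n > 0}


utt2spk = read_utt2spk(["utt1 spkA", "utt2 spkA", "utt3 spkB"])
utt_stats = {"utt1": (1, 4), "utt2": (1, 4), "utt3": (0, 5)}
speaker_wer = wer_by_speaker(utt_stats, utt2spk)

# Sort speakers from highest to lowest WER.
for spk, wer in sorted(speaker_wer.items(), key=lambda kv: kv[1], reverse=True):
    print(spk, wer)
```

Pooling edits and reference lengths before dividing weights each speaker's WER by the number of reference words, rather than averaging per-utterance rates.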

Outputs

Name	Type	Description
WER summary	stdout	Overall WER, number of insertions, deletions, substitutions, and total words
Top WER utterances	stdout	Utterances with highest error rates (with --print-top-wer)
Top WER speakers	stdout	Speakers with highest error rates (with --utt2spk)
Alignments	stdout	Human-readable edit distance alignments (with --print-alignments)
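The quantities in the WER summary can be related to each other with a small worked sketch. It uses a plain dynamic-programming Levenshtein alignment written from scratch, not speechbrain.utils.edit_distance, and the names edit_counts and corpus_wer are hypothetical. When several minimal alignments exist, the split between insertions, deletions, and substitutions may differ from the tool's, although the total number of edits is the same.

```python
# Illustrative sketch only; not the SpeechBrain implementation.

def edit_counts(ref, hyp):
    """Return (insertions, deletions, substitutions) for one minimal alignment."""
    rows, cols = len(ref) + 1, len(hyp) + 1
    dist = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        dist[i][0] = i  # delete all remaining reference tokens
    for j in range(cols):
        dist[0][j] = j  # insert all remaining hypothesis tokens
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j - 1] + cost,  # match or substitution
                dist[i - 1][j] + 1,         # deletion
                dist[i][j - 1] + 1,         # insertion
            )
    # Walk back from the bottom-right corner to count each edit type.
    i, j = len(ref), len(hyp)
    ins = dels = subs = 0
    while i > 0 or j > 0:
        cost = 1 if (i > 0 and j > 0 and ref[i - 1] != hyp[j - 1]) else 0
        if i > 0 and j > 0 and dist[i][j] == dist[i - 1][j - 1] + cost:
            subs += cost
            i, j = i - 1, j - 1
        elif i > 0 and dist[i][j] == dist[i - 1][j] + 1:
            dels += 1
            i -= 1
        else:
            ins += 1
            j -= 1
    return ins, dels, subs


def corpus_wer(pairs):
    """WER in percent: total edits divided by total reference words."""
    edits = sum(sum(edit_counts(ref, hyp)) for ref, hyp in pairs)
    words = sum(len(ref) for ref, _ in pairs)
    return 100.0 * edits / words


pairs = [
    ("a b c".split(), "a c".split()),    # one deletion
    ("a b".split(), "a x b".split()),    # one insertion
]
print(corpus_wer(pairs))  # 2 edits over 5 reference words
```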

Usage Examples

# Basic WER computation
python compute_wer.py ref.txt hyp.txt

# Show top error utterances and alignments
python compute_wer.py ref.txt hyp.txt --print-top-wer --print-alignments

# With speaker-level analysis
python compute_wer.py ref.txt hyp.txt --print-top-wer --utt2spk utt2spk.txt

# Only score hypotheses that are present (skip missing)
python compute_wer.py ref.txt hyp.txt --mode present

Related Pages

Principle:Speechbrain_Speechbrain_ASR_Evaluation_With_WER
