

Implementation:Speechbrain Speechbrain Brain Evaluate With ErrorRateStats

From Leeroopedia
Revision as of 16:43, 16 February 2026 by Admin (Auto-imported from implementations/Speechbrain_Speechbrain_Brain_Evaluate_With_ErrorRateStats.md)


Field | Value
Implementation Name | Brain_Evaluate_With_ErrorRateStats
API Signature | Brain.evaluate(self, test_set, max_key=None, min_key=None, progressbar=None, test_loader_kwargs={}) and ErrorRateStats.__init__(self, merge_tokens=False, split_tokens=False, space_token="_", keep_values=True, extract_concepts_values=False, tag_in="", tag_out="", equality_comparator=_str_equals)
Source File | speechbrain/core.py:L1695-1754 (evaluate), speechbrain/utils/metric_stats.py:L206-378 (ErrorRateStats)
Import | from speechbrain.core import Brain; from speechbrain.utils.metric_stats import ErrorRateStats
Type | API Doc
Related Principle | Principle:Speechbrain_Speechbrain_ASR_Evaluation_With_WER

Description

Brain.evaluate() evaluates a model on a test set: it loads the best checkpoint (selected by a metric key), runs inference with gradients disabled, and returns the average test loss, while task-specific metrics are computed in the stage hooks it calls. ErrorRateStats is the metric accumulator used in those hooks. It tracks per-utterance edit distances between hypotheses and references and reports an error rate that is a Word Error Rate (WER) or a Character Error Rate (CER), depending on the token units it is given.
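As a rough illustration of what the error rate measures, the pure-Python sketch below counts substitutions (S), deletions (D) and insertions (I) with a standard Levenshtein alignment and reports 100 * (S + D + I) / N, where N is the number of reference tokens. It does not call SpeechBrain: `edit_ops` and `error_rate` are hypothetical names, and SpeechBrain's own code in speechbrain.utils.edit_distance may differ in tie-breaking and bookkeeping.

```python
from typing import List, Tuple


def edit_ops(ref: List[str], hyp: List[str]) -> Tuple[int, int, int]:
    """Return (substitutions, deletions, insertions) of a minimum-cost alignment.

    Hypothetical helper: plain Levenshtein dynamic programming, every edit costs 1.
    """
    n, m = len(ref), len(hyp)
    # cost[i][j] = (total_edits, subs, dels, ins) for ref[:i] vs hyp[:j]
    cost = [[(0, 0, 0, 0)] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = (i, 0, i, 0)  # all reference tokens deleted
    for j in range(1, m + 1):
        cost[0][j] = (j, 0, 0, j)  # all hypothesis tokens inserted
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if ref[i - 1] == hyp[j - 1]:
                # a match never costs more than the alternatives
                cost[i][j] = cost[i - 1][j - 1]
                continue
            t, s, d, ins = cost[i - 1][j - 1]
            sub = (t + 1, s + 1, d, ins)
            t, s, d, ins = cost[i - 1][j]
            dele = (t + 1, s, d + 1, ins)
            t, s, d, ins = cost[i][j - 1]
            insert = (t + 1, s, d, ins + 1)
            cost[i][j] = min(sub, dele, insert)  # fewest total edits wins
    _, subs, dels, ins = cost[n][m]
    return subs, dels, ins


def error_rate(ref: List[str], hyp: List[str]) -> float:
    """Error rate in percent: 100 * (S + D + I) / len(ref)."""
    if not ref:
        raise ValueError("reference must contain at least one token")
    s, d, i = edit_ops(ref, hyp)
    return 100.0 * (s + d + i) / len(ref)
```

Feeding words gives a WER; feeding characters (for example `list("the_cat_sat")`) gives a CER with the same function, which is why a single accumulator class can serve both metrics.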

Brain.evaluate()

Inputs

Parameter | Type | Default | Description
test_set | Dataset or DataLoader | (required) | Test data to evaluate on. If it is not already a DataLoader (e.g. a DynamicItemDataset), a DataLoader is created automatically.
max_key | str | None | Metric key to maximize when selecting the best checkpoint. Mutually exclusive with min_key.
min_key | str | None | Metric key to minimize when selecting the best checkpoint. For ASR, typically "WER".
progressbar | bool | None | Whether to display a progress bar. If None, the bar is shown unless the noprogressbar run option is set.
test_loader_kwargs | dict | {} | Keyword arguments for DataLoader creation. ckpt_prefix is automatically set to None so the test DataLoader is not added to the checkpointer.

Outputs

Returns the average test loss (float). In the ASR recipe, the on_stage_end() hook that evaluate() calls at the end of the test stage also:

  • summarizes the accumulated WER/CER statistics
  • writes detailed alignment statistics to the test WER file
  • logs the test statistics via the train logger

Execution Flow

evaluate(test_set, min_key="WER")
  |
  +-- Create DataLoader from test_set if needed
  +-- on_evaluate_start(min_key="WER")
  |     +-- Load best checkpoint (lowest WER)
  +-- on_stage_start(TEST, epoch=None)
  |     +-- Initialize ErrorRateStats for WER and CER
  +-- modules.eval()
  +-- torch.no_grad():
  |     +-- for each batch in test_set:
  |           +-- evaluate_batch(batch, TEST)
  |                 +-- compute_forward(batch, TEST)
  |                 |     -> p_ctc, wav_lens, p_tokens (beam search)
  |                 +-- compute_objectives(preds, batch, TEST)
  |                       -> CTC loss + WER/CER accumulation
  +-- on_stage_end(TEST, avg_test_loss, None)
        +-- Summarize WER/CER statistics
        +-- Log test statistics
        +-- Write detailed alignment file
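The flow above can be sketched with a toy class that only records which hook runs when. `ToyBrain` is hypothetical: it has no checkpoints, no torch, no torch.no_grad() and no real model, and its per-batch "loss" is a made-up number. The running average uses the incremental mean form (subtract avg/step, add loss/step), which keeps a single float instead of a list of losses.

```python
from typing import Iterable, List, Optional


class ToyBrain:
    """Hypothetical stand-in that mimics the hook order of Brain.evaluate()."""

    def __init__(self) -> None:
        self.calls: List[str] = []
        self.step = 0

    def on_evaluate_start(self, max_key: Optional[str] = None, min_key: Optional[str] = None) -> None:
        self.calls.append(f"on_evaluate_start(min_key={min_key})")  # real Brain loads a checkpoint here

    def on_stage_start(self, stage: str, epoch: Optional[int] = None) -> None:
        self.calls.append(f"on_stage_start({stage})")  # recipe creates fresh metrics here

    def evaluate_batch(self, batch: List[float], stage: str) -> float:
        self.calls.append("evaluate_batch")
        return sum(batch) / len(batch)  # pretend loss for this batch

    def on_stage_end(self, stage: str, stage_loss: float, epoch: Optional[int] = None) -> None:
        self.calls.append(f"on_stage_end({stage}, {stage_loss:.2f})")  # recipe summarizes/writes stats

    def evaluate(self, test_set: Iterable[List[float]], min_key: Optional[str] = None) -> float:
        self.on_evaluate_start(min_key=min_key)
        self.on_stage_start("TEST", epoch=None)
        avg_loss = 0.0
        self.step = 0
        for batch in test_set:
            self.step += 1
            loss = self.evaluate_batch(batch, "TEST")
            # incremental running mean over batches
            avg_loss -= avg_loss / self.step
            avg_loss += loss / self.step
        self.on_stage_end("TEST", avg_loss, None)
        return avg_loss


brain = ToyBrain()
test_loss = brain.evaluate([[1.0, 3.0], [4.0]], min_key="WER")  # batch losses 2.0 and 4.0
```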

ErrorRateStats

Constructor Parameters

Parameter | Type | Default | Description
merge_tokens | bool | False | Merge successive tokens into words (e.g., character-level to word-level).
split_tokens | bool | False | Split tokens into characters (e.g., word-level to character-level). Used for CER computation.
space_token | str | "_" | Token used as the word boundary: with merge_tokens it is where the merged string is split, with split_tokens it is used to join words before splitting.
keep_values | bool | True | Whether to keep concept values in structured output evaluation.
extract_concepts_values | bool | False | Process predictions/targets to extract concepts and values.
tag_in | str | "" | Start tag for concept extraction.
tag_out | str | "" | End tag for concept extraction.
equality_comparator | Callable | _str_equals | Function used to compare two tokens for equality.
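To make merge_tokens and split_tokens concrete, here is a minimal sketch of the two transformations. The helpers are written independently and are only assumed to behave like speechbrain.dataio.dataio.merge_char and split_word; they are not those functions.

```python
from typing import List


def merge_tokens(sequences: List[List[str]], space: str = "_") -> List[List[str]]:
    """Hypothetical helper: join character tokens, then split on the space token into words."""
    return ["".join(seq).split(space) for seq in sequences]


def split_tokens(sequences: List[List[str]], space: str = "_") -> List[List[str]]:
    """Hypothetical helper: join words with the space token, then break into single characters."""
    return [list(space.join(seq)) for seq in sequences]


words = merge_tokens([["h", "i", "_", "y", "o", "u"]])  # character output -> words for WER
chars = split_tokens([["the", "cat"]])                  # word output -> characters for CER
```

One consequence of the split direction: the space token itself becomes a character, so word-boundary errors count toward the CER.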

Key Methods

append(ids, predict, target, predict_len=None, target_len=None, ind2lab=None)

Adds per-utterance error statistics for a batch.

Parameter | Type | Description
ids | list | List of utterance IDs for the batch
predict | list or torch.Tensor | Predicted word/token sequences
target | list or torch.Tensor | Reference word/token sequences
predict_len | torch.Tensor | Relative lengths for undoing prediction padding (optional)
target_len | torch.Tensor | Relative lengths for undoing target padding (optional)
ind2lab | callable | Maps from indices to labels for alignment writing (optional)

The method:

  1. Undoes padding if length tensors are provided
  2. Applies index-to-label mapping if ind2lab is given
  3. Optionally merges or splits tokens
  4. Computes per-utterance WER details including alignments
  5. Stores scores for later summarization
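Steps 1 and 2 can be sketched in pure Python. "Relative length" is taken to mean the fraction of the padded length that is real data, as the parameter table describes; `undo_padding_lists` and `prepare` are hypothetical names operating on plain lists rather than tensors.

```python
from typing import Callable, List, Optional, Sequence


def undo_padding_lists(batch: Sequence[Sequence[int]], rel_lens: Sequence[float]) -> List[List[int]]:
    """Hypothetical helper: trim each padded row to round(rel_len * padded_length) items."""
    max_len = len(batch[0])
    return [list(row[: int(round(rel * max_len))]) for row, rel in zip(batch, rel_lens)]


def prepare(
    predict: Sequence[Sequence[int]],
    predict_len: Optional[Sequence[float]] = None,
    ind2lab: Optional[Callable[[List[List[int]]], List[List[str]]]] = None,
):
    """Hypothetical helper for steps 1-2: strip padding, then map indices to labels."""
    if predict_len is not None:
        predict = undo_padding_lists(predict, predict_len)
    if ind2lab is not None:
        predict = ind2lab(predict)  # operates on the whole batch
    return predict


vocab = {1: "the", 2: "cat", 3: "sat"}
batch_ind2lab = lambda b: [[vocab[i] for i in seq] for seq in b]
labels = prepare([[1, 2, 3, 0], [2, 3, 0, 0]], [0.75, 0.5], batch_ind2lab)
```

Trimming before the label lookup matters: the padding index 0 has no entry in the toy vocabulary, so mapping the untrimmed rows would fail.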

summarize(field=None)

Aggregates all per-utterance scores into corpus-level statistics.

Returns (when field=None): a dict that includes the following keys, alongside further counts (for example "SER" and "num_scored_tokens"):

Key | Type | Description
"WER" | float | Overall error rate as a percentage (a WER or a CER, depending on the token units)
"error_rate" | float | Same value as "WER" (generic alias)
"insertions" | int | Total insertion errors across all utterances
"deletions" | int | Total deletion errors across all utterances
"substitutions" | int | Total substitution errors across all utterances

When field is specified (e.g., "error_rate"), returns only that specific value.
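Corpus-level aggregation pools the edit counts over all utterances and divides once by the total number of reference tokens; it is not the mean of per-utterance error rates. The sketch below shows that pooling. `summarize_scores` is hypothetical, and its per-utterance dict keys (including "num_ref_tokens") are assumptions for illustration.

```python
from typing import Dict, List, Optional, Union


def summarize_scores(per_utt: List[Dict[str, int]], field: Optional[str] = None) -> Union[float, Dict[str, float]]:
    """Hypothetical helper: pool counts across utterances, then compute one corpus-level rate."""
    ins = sum(u["insertions"] for u in per_utt)
    dels = sum(u["deletions"] for u in per_utt)
    subs = sum(u["substitutions"] for u in per_utt)
    n_ref = sum(u["num_ref_tokens"] for u in per_utt)
    wer = 100.0 * (ins + dels + subs) / n_ref
    summary = {
        "WER": wer,
        "error_rate": wer,  # generic alias with the same value
        "insertions": ins,
        "deletions": dels,
        "substitutions": subs,
    }
    return summary[field] if field is not None else summary


scores = [
    {"insertions": 0, "deletions": 0, "substitutions": 1, "num_ref_tokens": 3},
    {"insertions": 0, "deletions": 0, "substitutions": 0, "num_ref_tokens": 2},
]
summary = summarize_scores(scores)
```

Here one substitution over five reference words gives 20%, whereas averaging the per-utterance rates (33.3% and 0%) would give about 16.7%, so long utterances weigh more in the pooled figure.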

write_stats(filestream)

Writes detailed statistics and per-utterance alignment information to a file stream.

# Inside the recipe's on_stage_end(), for the TEST stage
with open(self.hparams.test_wer_file, "w", encoding="utf-8") as w:
    self.wer_metric.write_stats(w)

Output includes a summary header followed by per-utterance alignments showing substitutions, insertions, and deletions.
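Because write_stats() takes a file stream, any text stream should do, for example an in-memory one when inspecting results. The sketch below only illustrates the "summary header, then per-utterance alignments" layout; `write_stats_sketch`, its alignment symbols and its text format are invented here and do not reproduce SpeechBrain's actual output.

```python
import io
from typing import Dict, List, TextIO


def write_stats_sketch(stream: TextIO, summary: Dict[str, float], alignments: Dict[str, List[str]]) -> None:
    """Hypothetical helper: write a summary header, then one line per utterance.

    Alignment symbols (assumed): "=" match, "S" substitution, "D" deletion, "I" insertion.
    """
    stream.write(
        f"WER: {summary['WER']:.2f}% (sub={summary['substitutions']}, "
        f"del={summary['deletions']}, ins={summary['insertions']})\n"
    )
    for utt_id, ops in alignments.items():
        stream.write(f"{utt_id}: {' '.join(ops)}\n")


buf = io.StringIO()  # in-memory stream instead of the test WER file
write_stats_sketch(
    buf,
    {"WER": 20.0, "substitutions": 1, "deletions": 0, "insertions": 0},
    {"utt1": ["=", "=", "S"], "utt2": ["=", "="]},
)
```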

YAML Configuration

The WER and CER metric computers are configured in YAML:

# WER computer (word-level)
error_rate_computer: !name:speechbrain.utils.metric_stats.ErrorRateStats

# CER computer (character-level, using split_tokens)
cer_computer: !name:speechbrain.utils.metric_stats.ErrorRateStats
    split_tokens: True

The !name: tag creates a callable (constructor) rather than an instance. The actual ErrorRateStats instances are created in on_stage_start() at the beginning of each validation/test stage:

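import speechbrain as sb

class ASR(sb.Brain):
    def on_stage_start(self, stage, epoch):
        """Creates fresh metric accumulators at the start of each validation/test stage."""
        if stage != sb.Stage.TRAIN:
            self.cer_metric = self.hparams.cer_computer()  # new ErrorRateStats (character level)
            self.wer_metric = self.hparams.error_rate_computer()  # new ErrorRateStats (word level)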
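The effect of the !name: tag can be modelled with functools.partial: the YAML arguments are pre-bound and every call builds a new object. Treating !name: as equivalent to functools.partial is an assumption for illustration, and `ToyErrorRateStats` is a hypothetical stand-in for ErrorRateStats.

```python
import functools
from typing import List


class ToyErrorRateStats:
    """Hypothetical stand-in for ErrorRateStats that only stores options and scores."""

    def __init__(self, split_tokens: bool = False) -> None:
        self.split_tokens = split_tokens
        self.scores: List[str] = []  # per-utterance results would accumulate here


# Roughly what `!name:...ErrorRateStats` with `split_tokens: True` provides:
# a callable with the YAML arguments bound, not an instance.
cer_computer = functools.partial(ToyErrorRateStats, split_tokens=True)

valid_metric = cer_computer()  # fresh accumulator for a validation stage
test_metric = cer_computer()   # another fresh accumulator for the test stage
valid_metric.scores.append("utt1")
```

This is why the recipe calls the computer in on_stage_start(): each stage starts from an empty accumulator, so scores from validation never leak into the test-set figures.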

Usage Example

Complete Evaluation Flow

import speechbrain as sb

# After training is complete, evaluate on the test set.
# ASR is the recipe's sb.Brain subclass; hparams, run_opts and test_data
# are assumed to have been prepared earlier in the recipe script.
asr_brain = ASR(
    modules=hparams["modules"],
    hparams=hparams,
    run_opts=run_opts,
    checkpointer=hparams["checkpointer"],
)

# Load the checkpoint with the lowest validation WER and evaluate;
# the return value is the average test loss
test_loss = asr_brain.evaluate(
    test_data,
    min_key="WER",
    test_loader_kwargs=hparams["test_dataloader_options"],
)

# The on_stage_end callback writes detailed WER stats:
# on_stage_end(TEST, avg_loss, None):
#     with open(self.hparams.test_wer_file, "w", encoding="utf-8") as w:
#         self.wer_metric.write_stats(w)

Standalone ErrorRateStats Usage

from speechbrain.utils.metric_stats import ErrorRateStats

# Word Error Rate
wer_stats = ErrorRateStats()
wer_stats.append(
    ids=["utt1", "utt2"],
    predict=[["the", "cat", "set"], ["hello", "world"]],
    target=[["the", "cat", "sat"], ["hello", "world"]],
)
summary = wer_stats.summarize()
print(f"WER: {summary['WER']:.2f}%")
print(f"Substitutions: {summary['substitutions']}")
print(f"Deletions: {summary['deletions']}")
print(f"Insertions: {summary['insertions']}")

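# Character Error Rate: split_tokens=True turns each word sequence into characters
cer_stats = ErrorRateStats(split_tokens=True)
cer_stats.append(
    ids=["utt1"],
    predict=[["the", "cat", "set"]],
    target=[["the", "cat", "sat"]],
)
cer_summary = cer_stats.summarize()
# The summary key is still "WER" (alias "error_rate"); here its value is a CER
print(f"CER: {cer_summary['WER']:.2f}%")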

Checkpoint Selection

The min_key="WER" parameter in evaluate() instructs the checkpointer to load the checkpoint with the lowest WER value. This checkpoint was saved during training by:

# In the recipe's on_stage_end(), at the end of each VALID stage
self.checkpointer.save_and_keep_only(
    meta={"WER": stage_stats["WER"]},
    min_keys=["WER"],
)

The checkpointer stores WER values in checkpoint metadata and can select the optimal checkpoint at evaluation time. This ensures that the model evaluated on the test set is the one that performed best on the validation set, not simply the most recent.
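The selection rule amounts to an argmin (or argmax) over the metadata stored with each checkpoint. The sketch below uses plain dicts in place of real checkpoints; `pick_checkpoint` and the dict layout are hypothetical, and requiring exactly one key is a simplification of this sketch, not a statement about how SpeechBrain's Checkpointer handles the case where neither key is given.

```python
from typing import Dict, List, Optional


def pick_checkpoint(ckpts: List[Dict], min_key: Optional[str] = None, max_key: Optional[str] = None) -> Dict:
    """Hypothetical helper: return the checkpoint with the lowest (min_key) or highest (max_key) metric."""
    if (min_key is None) == (max_key is None):
        raise ValueError("this sketch needs exactly one of min_key / max_key")
    key = min_key if min_key is not None else max_key
    scored = [c for c in ckpts if key in c["meta"]]  # skip checkpoints lacking that metric
    if min_key is not None:
        return min(scored, key=lambda c: c["meta"][key])
    return max(scored, key=lambda c: c["meta"][key])


ckpts = [
    {"name": "epoch1", "meta": {"WER": 25.0}},
    {"name": "epoch2", "meta": {"WER": 18.5}},
    {"name": "epoch3", "meta": {"WER": 19.2}},  # most recent scored checkpoint, but not the best
    {"name": "latest", "meta": {}},
]
best = pick_checkpoint(ckpts, min_key="WER")
```

With min_key="WER" the sketch returns epoch2 even though epoch3 is newer, which mirrors the point above: the test set sees the best-validating model, not the most recent one.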

Dependencies

  • speechbrain.utils.edit_distance.wer_details_for_batch -- computes per-utterance edit distance and alignment details
  • speechbrain.utils.edit_distance.wer_summary -- aggregates per-utterance scores into corpus-level WER
  • speechbrain.utils.edit_distance.print_wer_summary -- formats WER summary for output
  • speechbrain.utils.edit_distance.print_alignments -- formats per-utterance alignments for output
  • speechbrain.dataio.dataio.merge_char / split_word -- for character-level processing
