Implementation:Speechbrain Speechbrain Train CommonVoice Seq2Seq

From Leeroopedia


Knowledge Sources
Domains ASR, Training
Last Updated 2026-02-09 00:00 GMT

Overview

A concrete tool, provided by the SpeechBrain library, for training a sequence-to-sequence ASR system on the CommonVoice dataset.

Description

This training script implements a sequence-to-sequence ASR system with an encoder-decoder architecture and an attention mechanism. The default configuration uses a CRDNN encoder, a GRU-based decoder, and beam search decoding (without an external language model). The network is trained with a joint CTC and negative log-likelihood loss on BPE (Byte Pair Encoding) sub-word units. The script supports waveform and feature augmentation, and it can be reconfigured to use different encoders, decoders, and token types for any CommonVoice language.
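The joint objective mentioned above can be sketched in plain Python. This is a minimal illustration, not the script's own code: it assumes the two losses are combined as a weighted sum controlled by a hypothetical `ctc_weight` in [0, 1], and it uses a placeholder number for the CTC term rather than computing it.

```python
import math


def nll_loss(log_probs, targets):
    """Mean negative log-likelihood of the target token at each decoder step."""
    picked = [step[target] for step, target in zip(log_probs, targets)]
    return -sum(picked) / len(picked)


def joint_loss(loss_ctc, loss_seq, ctc_weight):
    """Weighted sum of the CTC and seq2seq losses (assumed combination rule)."""
    return ctc_weight * loss_ctc + (1.0 - ctc_weight) * loss_seq


# Toy decoder output: 2 steps over a 3-token vocabulary, as log-probabilities.
log_probs = [
    [math.log(0.7), math.log(0.2), math.log(0.1)],
    [math.log(0.1), math.log(0.8), math.log(0.1)],
]
targets = [0, 1]

loss_seq = nll_loss(log_probs, targets)
loss_ctc = 1.2  # placeholder value; a real CTC loss comes from the CTC head
total = joint_loss(loss_ctc, loss_seq, ctc_weight=0.3)
print(f"seq2seq NLL = {loss_seq:.4f}, joint loss = {total:.4f}")
```

With `ctc_weight=0.0` the objective reduces to the seq2seq loss alone, and with `ctc_weight=1.0` to the CTC loss alone.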

Usage

Use this script to train a seq2seq ASR model on any CommonVoice language. Run it with a YAML hyperparameter file: python train.py hparams/train.yaml.

Code Reference

Source Location

Signature

class ASR(sb.core.Brain):
    def compute_forward(self, batch, stage):
        """Forward computations from the waveform batches to the output probabilities."""
        ...

    def compute_objectives(self, predictions, batch, stage):
        ...

Import

import speechbrain as sb
from speechbrain.core import Brain

I/O Contract

Inputs

Name | Type | Required | Description
hparams_file | str | Yes | Path to the YAML hyperparameter configuration file (e.g., hparams/train.yaml)
batch.sig | tuple | Yes | Waveform tensor and lengths from the dataloader
batch.tokens_bos | tuple | Yes | BPE tokens with beginning-of-sequence marker
batch.tokens_eos | tuple | Yes | BPE tokens with end-of-sequence marker
batch.tokens | tuple | Yes | BPE tokens without special markers (for CTC)
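To make the relationship between the three token inputs concrete, here is a small sketch for a single utterance. The helper `make_token_views` and the index values are hypothetical, and the sketch ignores batching, padding, and the lengths element that each tuple carries.

```python
def make_token_views(token_ids, bos_index, eos_index):
    """Build the three token sequences for one utterance from its BPE ids."""
    tokens = list(token_ids)
    return {
        "tokens_bos": [bos_index] + tokens,  # decoder input
        "tokens_eos": tokens + [eos_index],  # seq2seq (NLL) target
        "tokens": tokens,                    # CTC target, no special markers
    }


# Hypothetical BPE ids and special-token indices, for illustration only.
views = make_token_views([17, 4, 92], bos_index=1, eos_index=2)
print(views["tokens_bos"])  # [1, 17, 4, 92]
print(views["tokens_eos"])  # [17, 4, 92, 2]
print(views["tokens"])      # [17, 4, 92]
```

The decoder input and its target are the same sequence shifted by one position, which is why the BOS and EOS variants have equal length.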

Outputs

Name | Type | Description
p_ctc | tensor | CTC log-probabilities over the token vocabulary
p_seq | tensor | Seq2seq output probabilities from the decoder
wav_lens | tensor | Relative lengths of the input waveforms
model checkpoint | file | Saved model parameters at best and latest epochs
WER/CER metrics | float | Word error rate and character error rate on dev/test sets
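For readers unfamiliar with the reported metrics, the following sketch shows how word and character error rates are commonly defined: the edit distance between reference and hypothesis, divided by the reference length. It is an illustrative implementation with hypothetical helper names, not the metric code used by the script, and it expresses the rate as a percentage.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (insert, delete, substitute)."""
    prev = list(range(len(hyp) + 1))
    for i, ref_unit in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_unit in enumerate(hyp, start=1):
            cost = 0 if ref_unit == hyp_unit else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def error_rate(ref_units, hyp_units):
    """Edit distance over the number of reference units, as a percentage."""
    if not ref_units:
        raise ValueError("reference must not be empty")
    return 100.0 * edit_distance(ref_units, hyp_units) / len(ref_units)


def wer(reference, hypothesis):
    """Word error rate: units are whitespace-separated words."""
    return error_rate(reference.split(), hypothesis.split())


def cer(reference, hypothesis):
    """Character error rate: units are individual characters."""
    return error_rate(list(reference), list(hypothesis))


print(wer("a b c d", "a b"))  # 50.0
```

Because insertions count as errors, both rates can exceed 100% when the hypothesis is much longer than the reference.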

Usage Examples

# Command-line usage
# python train.py hparams/train.yaml

# Programmatic usage
import sys

import speechbrain as sb
from hyperpyyaml import load_hyperpyyaml

# Parse the hyperparameter file path, runtime options, and overrides.
hparams_file, run_opts, overrides = sb.parse_arguments(sys.argv[1:])
with open(hparams_file) as fin:
    hparams = load_hyperpyyaml(fin, overrides)

# ASR is the sb.core.Brain subclass shown under "Signature".
asr_brain = ASR(
    modules=hparams["modules"],
    hparams=hparams,
    run_opts=run_opts,
    opt_class=hparams["opt_class"],
    checkpointer=hparams["checkpointer"],
)

# train_data and valid_data are the datasets built by the recipe's
# data-preparation step, which is not shown on this page.
asr_brain.fit(
    hparams["epoch_counter"],
    train_data,
    valid_data,
    train_loader_kwargs=hparams["dataloader_options"],
    valid_loader_kwargs=hparams["test_dataloader_options"],
)
