Implementation:Speechbrain Speechbrain Train CommonVoice Seq2Seq

Knowledge Sources	SpeechBrain
Domains	ASR, Training
Last Updated	2026-02-09 00:00 GMT

Overview

Concrete tool, provided by the SpeechBrain library, for training a sequence-to-sequence ASR system on the CommonVoice dataset.

Description

This training script implements a sequence-to-sequence ASR system with an encoder-decoder architecture and an attention mechanism. The default configuration uses a CRDNN encoder, a GRU-based decoder, and beam search decoding without an external language model. The network is trained with a joint CTC and negative log-likelihood loss on BPE (Byte Pair Encoding) sub-word units. The script supports waveform and feature augmentation, and the encoder, decoder, token type, and training language can all be changed, so the same recipe covers any CommonVoice language.
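
The joint CTC and negative log-likelihood objective described above can be pictured as a weighted sum of the two loss terms. The snippet below is a minimal, framework-free sketch of that interpolation; the function name `joint_loss`, the parameter name `ctc_weight`, and the example values are assumptions for illustration and are not taken from the recipe's YAML file.

```python
def joint_loss(loss_ctc: float, loss_seq: float, ctc_weight: float) -> float:
    """Interpolate a CTC loss and a seq2seq NLL loss with a single weight.

    ctc_weight = 0.0 keeps only the seq2seq term; 1.0 keeps only the CTC term.
    """
    if not 0.0 <= ctc_weight <= 1.0:
        raise ValueError("ctc_weight must lie in [0, 1]")
    return ctc_weight * loss_ctc + (1.0 - ctc_weight) * loss_seq


# Hypothetical per-batch loss values, for illustration only.
example = joint_loss(loss_ctc=2.0, loss_seq=1.0, ctc_weight=0.3)
```

A single weight keeps the two terms on a common scale, so shifting emphasis from the attention decoder to the CTC branch only requires changing one hyperparameter.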

Usage

Use this script to train a seq2seq ASR model on any CommonVoice language. Run it with a YAML hyperparameter file: python train.py hparams/train.yaml.

Code Reference

Source Location

Repository: SpeechBrain
File: recipes/CommonVoice/ASR/seq2seq/train.py

Signature

class ASR(sb.core.Brain):
    def compute_forward(self, batch, stage):
        """Forward computations from the waveform batches to the output probabilities."""
        ...

    def compute_objectives(self, predictions, batch, stage):
        ...

Import

import speechbrain as sb
from speechbrain.core import Brain

I/O Contract

Inputs

Name	Type	Required	Description
hparams_file	str	Yes	Path to the YAML hyperparameter configuration file (e.g., hparams/train.yaml)
batch.sig	tuple	Yes	Waveform tensor and lengths from the dataloader
batch.tokens_bos	tuple	Yes	BPE tokens with beginning-of-sequence marker
batch.tokens_eos	tuple	Yes	BPE tokens with end-of-sequence marker
batch.tokens	tuple	Yes	BPE tokens without special markers (for CTC)
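
The three token fields in the table differ only in which special marker is attached to the same sub-word sequence. The sketch below shows that relationship for a single utterance using plain Python lists, before any padding or batching; the helper name `make_token_targets` and the index values are hypothetical, and real ids would come from the trained BPE tokenizer.

```python
from typing import Dict, List


def make_token_targets(
    token_ids: List[int], bos_index: int, eos_index: int
) -> Dict[str, List[int]]:
    """Build the three target variants from one list of sub-word ids."""
    return {
        "tokens_bos": [bos_index] + token_ids,  # fed to the decoder as input
        "tokens_eos": token_ids + [eos_index],  # target for the seq2seq loss
        "tokens": list(token_ids),              # target for the CTC loss
    }


# Hypothetical ids and marker indices, for illustration only.
targets = make_token_targets([17, 42, 8], bos_index=1, eos_index=2)
```

Shifting the sequence by one position in this way lets the decoder predict token t from tokens before t, while the CTC branch sees the unmarked sequence.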

Outputs

Name	Type	Description
p_ctc	tensor	CTC log-probabilities over the token vocabulary
p_seq	tensor	Seq2seq output log-probabilities from the decoder
wav_lens	tensor	Relative lengths of the input waveforms
model checkpoint	file	Saved model parameters at best and latest epochs
WER/CER metrics	float	Word error rate and character error rate on dev/test sets
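
The "relative lengths" in the table express each waveform's length as a fraction of the longest waveform in the padded batch. A minimal sketch of that convention follows, assuming positive lengths measured in samples; the helper name `relative_lengths` and the example batch are hypothetical.

```python
from typing import List


def relative_lengths(num_samples: List[int]) -> List[float]:
    """Express each utterance length as a fraction of the longest in the batch."""
    if not num_samples:
        return []
    longest = max(num_samples)
    if longest <= 0:
        raise ValueError("lengths must be positive")
    return [n / longest for n in num_samples]


# Hypothetical batch of three waveforms (lengths in samples).
wav_lens = relative_lengths([16000, 8000, 12000])
```

Because the values are fractions rather than sample counts, they stay valid after the encoder downsamples the time axis.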

Usage Examples

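# Command-line usage
# python train.py hparams/train.yaml

# Programmatic usage (ASR is the Brain subclass defined in train.py)
import sys

import speechbrain as sb
from hyperpyyaml import load_hyperpyyaml

# Parse the YAML path, runtime options, and command-line overrides.
hparams_file, run_opts, overrides = sb.parse_arguments(sys.argv[1:])
with open(hparams_file) as fin:
    hparams = load_hyperpyyaml(fin, overrides)

# Instantiate the trainer with the modules, optimizer class, and
# checkpointer declared in the YAML file.
asr_brain = ASR(
    modules=hparams["modules"],
    hparams=hparams,
    run_opts=run_opts,
    opt_class=hparams["opt_class"],
    checkpointer=hparams["checkpointer"],
)

# train_data and valid_data are dataset objects that must be prepared
# beforehand; they are not defined in this snippet.
asr_brain.fit(
    hparams["epoch_counter"],
    train_data,
    valid_data,
    train_loader_kwargs=hparams["dataloader_options"],
    valid_loader_kwargs=hparams["test_dataloader_options"],
)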

Related Pages

Principle:Speechbrain_Speechbrain_Seq2Seq_ASR_Training
