Implementation:Speechbrain Speechbrain Train CommonVoice Seq2Seq
| Knowledge Sources | |
|---|---|
| Domains | ASR, Training |
| Last Updated | 2026-02-09 00:00 GMT |
Overview
Concrete tool, provided by the SpeechBrain library, for training a sequence-to-sequence ASR system on the CommonVoice dataset.
Description
This training script implements a sequence-to-sequence ASR system with an encoder-decoder architecture and an attention mechanism. The default configuration uses a CRDNN encoder, a GRU-based decoder, and beam search decoding (without an external language model). The network is trained with a joint CTC and negative log-likelihood loss on BPE (Byte Pair Encoding) sub-word units. The script supports waveform and feature augmentation, and the encoder, decoder, and token type can be changed through the hyperparameter file. The same recipe can be trained on any CommonVoice language.
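The joint objective described above can be read as a weighted sum of the two losses. The following is a minimal sketch in plain Python, not the recipe's code: the function name `joint_loss` and the parameter name `ctc_weight` are assumptions made for illustration, and the actual weighting is set in the YAML hyperparameter file.

```python
def joint_loss(loss_ctc: float, loss_seq: float, ctc_weight: float) -> float:
    """Interpolate a CTC loss and a seq2seq NLL loss.

    ctc_weight is a hypothetical name for the interpolation weight;
    0.0 keeps only the seq2seq loss, 1.0 keeps only the CTC loss.
    """
    if not 0.0 <= ctc_weight <= 1.0:
        raise ValueError("ctc_weight must lie in [0, 1]")
    return ctc_weight * loss_ctc + (1.0 - ctc_weight) * loss_seq


# Illustrative values only; real losses come from the model's outputs.
combined = joint_loss(loss_ctc=2.0, loss_seq=1.0, ctc_weight=0.25)
print(combined)
```

A single weight keeps the two terms on a common scale, so lowering it shifts the training signal toward the attention decoder.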
Usage
Use this script to train a seq2seq ASR model on any CommonVoice language. Run it with a YAML hyperparameter file: python train.py hparams/train.yaml.
Code Reference
Source Location
- Repository: SpeechBrain
- File: recipes/CommonVoice/ASR/seq2seq/train.py
Signature
class ASR(sb.core.Brain):
    def compute_forward(self, batch, stage):
        """Forward computations from the waveform batches to the output probabilities."""
        ...

    def compute_objectives(self, predictions, batch, stage):
        ...
Import
import speechbrain as sb
from speechbrain.core import Brain
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| hparams_file | str | Yes | Path to the YAML hyperparameter configuration file (e.g., hparams/train.yaml) |
| batch.sig | tuple | Yes | Waveform tensor and lengths from the dataloader |
| batch.tokens_bos | tuple | Yes | BPE tokens with beginning-of-sequence marker |
| batch.tokens_eos | tuple | Yes | BPE tokens with end-of-sequence marker |
| batch.tokens | tuple | Yes | BPE tokens without special markers (for CTC) |
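The three token fields in the table differ only in the special markers attached to the same BPE sequence. A minimal sketch of that relationship follows, assuming hypothetical index values for the beginning-of-sequence and end-of-sequence markers (in the recipe these would come from the hyperparameter file); the helper name `make_token_variants` is also an illustration, not part of SpeechBrain.

```python
def make_token_variants(tokens, bos_index, eos_index):
    """Build the three target variants from one encoded sentence.

    tokens_bos feeds the decoder input, tokens_eos is the decoder target,
    and the unmarked tokens serve as the CTC target.
    """
    tokens = list(tokens)
    return {
        "tokens_bos": [bos_index] + tokens,
        "tokens_eos": tokens + [eos_index],
        "tokens": tokens,
    }


# Hypothetical token ids and marker indices, for illustration only.
variants = make_token_variants([17, 42, 8], bos_index=1, eos_index=2)
for name, ids in variants.items():
    print(name, ids)
```

In a real batch each field is a padded tensor paired with its lengths, which is why the table lists the type as a tuple.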
Outputs
| Name | Type | Description |
|---|---|---|
| p_ctc | tensor | CTC log-probabilities over the token vocabulary |
| p_seq | tensor | Seq2seq log-probabilities from the decoder |
| wav_lens | tensor | Relative lengths of the input waveforms |
| model checkpoint | file | Saved model parameters at best and latest epochs |
| WER/CER metrics | float | Word error rate and character error rate on dev/test sets |
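The WER and CER values in the table are both edit-distance ratios, computed over words and characters respectively. The sketch below shows the underlying calculation in plain Python; it does not use SpeechBrain's own metric classes, and the function names `edit_distance` and `error_rate` are assumptions made for this example.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (row-by-row DP)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def error_rate(ref, hyp):
    """Edit distance as a percentage of the reference length."""
    if not ref:
        raise ValueError("reference must not be empty")
    return 100.0 * edit_distance(ref, hyp) / len(ref)


# Illustrative sentence pair, not taken from CommonVoice.
reference = "the cat sat"
hypothesis = "the cat sit"
wer = error_rate(reference.split(), hypothesis.split())
cer = error_rate(list(reference), list(hypothesis))
print(f"WER: {wer:.2f}%  CER: {cer:.2f}%")
```

Because the denominator is the reference length, an error rate can exceed 100% when the hypothesis contains many insertions.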
Usage Examples
# Command-line usage
# python train.py hparams/train.yaml

# Programmatic usage
import sys

import speechbrain as sb
from hyperpyyaml import load_hyperpyyaml

# Parse the YAML path, runtime options, and command-line overrides
hparams_file, run_opts, overrides = sb.parse_arguments(sys.argv[1:])
with open(hparams_file) as fin:
    hparams = load_hyperpyyaml(fin, overrides)

# ASR is the Brain subclass shown under Signature
asr_brain = ASR(
    modules=hparams["modules"],
    hparams=hparams,
    run_opts=run_opts,
    opt_class=hparams["opt_class"],
    checkpointer=hparams["checkpointer"],
)

# train_data and valid_data are the datasets built by the recipe's
# data preparation step; they are not defined in this snippet
asr_brain.fit(
    hparams["epoch_counter"],
    train_data,
    valid_data,
    train_loader_kwargs=hparams["dataloader_options"],
    valid_loader_kwargs=hparams["test_dataloader_options"],
)