Implementation:Speechbrain Speechbrain Train CommonVoice Transducer
| Knowledge Sources | |
|---|---|
| Domains | ASR, Training |
| Last Updated | 2026-02-09 00:00 GMT |
Overview
Concrete tool, provided by the SpeechBrain library, for training a Transducer-based ASR system on the CommonVoice dataset.
Description
This training script implements a Transducer (RNN-T) ASR system with an encoder, a prediction network (decoder), and a joint network. It follows the Dynamic Chunk Training approach for streaming speech recognition, which lets the model process audio in chunks for real-time applications. The model is trained on BPE sub-word units with the Transducer loss plus an auxiliary CTC loss. The script supports Conformer-based encoders and feature augmentation with a configurable warmup period, and it can be adapted to different architectures, token types, and CommonVoice languages.
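The chunk-based processing mentioned above can be pictured as an attention mask: each frame may attend only to frames in its own chunk and in a limited number of earlier chunks. The following is a minimal plain-Python sketch of that idea, not SpeechBrain's implementation; the function name `chunk_attention_mask` and its parameters are hypothetical and chosen for illustration only.

```python
def chunk_attention_mask(num_frames, chunk_size, left_context_chunks=None):
    """Return mask[i][j] == True when frame i may attend to frame j.

    Hypothetical helper for illustration; not part of SpeechBrain.
    left_context_chunks=None means unlimited left context.
    """
    mask = []
    for i in range(num_frames):
        chunk_i = i // chunk_size
        # Index of the oldest chunk that frame i is allowed to see.
        if left_context_chunks is None:
            first_chunk = 0
        else:
            first_chunk = max(0, chunk_i - left_context_chunks)
        row = []
        for j in range(num_frames):
            chunk_j = j // chunk_size
            # Visible: chunks from first_chunk up to the current one.
            # Frames in future chunks are always hidden.
            row.append(first_chunk <= chunk_j <= chunk_i)
        mask.append(row)
    return mask


# Six frames, chunks of two frames, one chunk of left context.
mask = chunk_attention_mask(num_frames=6, chunk_size=2, left_context_chunks=1)
for row in mask:
    print("".join("x" if visible else "." for visible in row))
```

In dynamic chunk training the chunk size is typically varied during training rather than fixed, so that a single model can be used at several latency settings at inference time; the sketch above shows the mask for one such setting.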
Usage
Use this script to train a Transducer ASR model on any CommonVoice language, particularly when streaming/real-time recognition is needed. Run it with: python train.py hparams/conformer_transducer_large.yaml.
Code Reference
Source Location
- Repository: SpeechBrain
- File: recipes/CommonVoice/ASR/transducer/train.py
Signature
class ASR(sb.Brain):
    def compute_forward(self, batch, stage):
        """Forward computations from the waveform batches to the output probabilities."""
        ...

    def compute_objectives(self, predictions, batch, stage):
        ...
Import
import speechbrain as sb
from speechbrain.core import Brain
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| hparams_file | str | Yes | Path to the YAML hyperparameter configuration file (e.g., hparams/conformer_transducer_large.yaml) |
| batch.sig | tuple | Yes | Waveform tensor and lengths from the dataloader |
| batch.tokens_bos | tuple | Yes | BPE tokens with beginning-of-sequence marker for prediction network |
| batch.tokens_eos | tuple | Yes | BPE tokens with end-of-sequence marker |
| batch.tokens | tuple | Yes | BPE tokens without special markers (for CTC auxiliary loss) |
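The three token fields in the table differ only in their special markers. As a rough sketch of how they relate, assuming the BPE ids of one utterance are available as a plain list, the variants can be derived as follows. The helper name `add_bos_eos` and the index values are hypothetical; the real indices come from the hyperparameter file.

```python
def add_bos_eos(tokens, bos_index, eos_index):
    """Build the BOS-prefixed and EOS-suffixed variants of a token list.

    Hypothetical helper for illustration; not part of SpeechBrain.
    """
    tokens = list(tokens)
    tokens_bos = [bos_index] + tokens  # input to the prediction network
    tokens_eos = tokens + [eos_index]  # target with end-of-sequence marker
    return tokens_bos, tokens_eos


# Example BPE ids for one utterance (made-up values).
tokens = [17, 4, 92]
tokens_bos, tokens_eos = add_bos_eos(tokens, bos_index=1, eos_index=2)
print(tokens_bos)
print(tokens_eos)
```

In the batch, each of these fields is a tuple of a padded tensor and its lengths, so the lists above correspond to a single unpadded row.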
Outputs
| Name | Type | Description |
|---|---|---|
| p_transducer | tensor | Transducer joint network output log-probabilities |
| p_ctc | tensor | CTC log-probabilities (auxiliary loss) |
| wav_lens | tensor | Relative lengths of the input waveforms |
| model checkpoint | file | Saved model parameters at best and latest epochs |
| WER/CER metrics | float | Word error rate and character error rate on dev/test sets |
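WER and CER are both edit-distance metrics: the number of substitutions, deletions, and insertions needed to turn the hypothesis into the reference, divided by the reference length. The sketch below computes both in plain Python to make the definition concrete. It is not SpeechBrain's metric code, and the function names and example sentences are made up for illustration.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (row-by-row DP)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(
                min(
                    prev[j] + 1,         # deletion from the reference
                    curr[j - 1] + 1,     # insertion into the reference
                    prev[j - 1] + cost,  # match or substitution
                )
            )
        prev = curr
    return prev[-1]


def error_rate(ref_units, hyp_units):
    """Error rate in percent; assumes a non-empty reference."""
    if not ref_units:
        raise ValueError("reference must not be empty")
    return 100.0 * edit_distance(ref_units, hyp_units) / len(ref_units)


ref = "the cat sat on the mat"
hyp = "the cat sit on mat"
wer = error_rate(ref.split(), hyp.split())  # units are words
cer = error_rate(list(ref), list(hyp))      # units are characters
print(f"WER: {wer:.2f}%  CER: {cer:.2f}%")
```

The only difference between the two metrics is the unit of comparison: words for WER, characters for CER.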
Usage Examples
# Command-line usage
# python train.py hparams/conformer_transducer_large.yaml

# Programmatic usage
import sys

import speechbrain as sb
from hyperpyyaml import load_hyperpyyaml

# Parse the hparams path, runtime options, and command-line overrides
hparams_file, run_opts, overrides = sb.parse_arguments(sys.argv[1:])
with open(hparams_file) as fin:
    hparams = load_hyperpyyaml(fin, overrides)

# ASR is the sb.Brain subclass shown under Signature
asr_brain = ASR(
    modules=hparams["modules"],
    hparams=hparams,
    run_opts=run_opts,
    opt_class=hparams["opt_class"],
    checkpointer=hparams["checkpointer"],
)

# train_data and valid_data are dataset objects that must be built
# beforehand; they are not defined in this snippet
asr_brain.fit(
    hparams["epoch_counter"],
    train_data,
    valid_data,
    train_loader_kwargs=hparams["dataloader_options"],
    valid_loader_kwargs=hparams["test_dataloader_options"],
)