Implementation:Speechbrain Speechbrain Train CommonVoice Transducer

Knowledge Sources	SpeechBrain
Domains	ASR, Training
Last Updated	2026-02-09 00:00 GMT

Overview

Concrete tool, provided by the SpeechBrain library, for training a Transducer-based ASR system on the CommonVoice dataset.

Description

This training script implements a Transducer (RNN-T) ASR system made up of an encoder, a prediction network (decoder), and a joint network. It follows the Dynamic Chunk Training approach for streaming speech recognition, which lets the model process audio chunk by chunk for real-time applications. The system is trained on BPE sub-word units with a Transducer loss and an auxiliary CTC loss. The script supports Conformer-based encoders and feature augmentation with a configurable warmup period, and it can be adapted to other architectures, token types, and CommonVoice languages.
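
The chunk-wise processing described above can be illustrated with an attention mask that confines each frame to its own chunk and a limited number of earlier chunks. This is a minimal pure-Python sketch, not the SpeechBrain implementation: the function name, its parameters, and the convention that True means "may attend" are all assumptions made for illustration. In Dynamic Chunk Training the chunk size would typically be varied during training; here it is fixed for readability.

```python
def chunk_attention_mask(num_frames, chunk_size, left_context_chunks=None):
    """Build a num_frames x num_frames boolean matrix (hypothetical helper).

    mask[i][j] is True when frame i may attend to frame j, i.e. when
    frame j lies in the same chunk as frame i or in one of the allowed
    earlier chunks. left_context_chunks=None means unlimited left context.
    """
    mask = []
    for i in range(num_frames):
        chunk_i = i // chunk_size  # index of the chunk containing frame i
        row = []
        for j in range(num_frames):
            chunk_j = j // chunk_size
            allowed = chunk_j <= chunk_i  # never look at future chunks
            if left_context_chunks is not None:
                # Restrict how far back in time frame i may look.
                allowed = allowed and chunk_j >= chunk_i - left_context_chunks
            row.append(allowed)
        mask.append(row)
    return mask


# Six frames, chunks of two frames, one chunk of left context.
for row in chunk_attention_mask(6, 2, left_context_chunks=1):
    print("".join("x" if ok else "." for ok in row))
# xx....
# xx....
# xxxx..
# xxxx..
# ..xxxx
# ..xxxx
```

Because no frame sees beyond the end of its own chunk, the encoder output for a chunk can be computed as soon as that chunk of audio has arrived, which is what makes streaming inference possible.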

Usage

Use this script to train a Transducer ASR model on any CommonVoice language, particularly when streaming/real-time recognition is needed. Run it with: python train.py hparams/conformer_transducer_large.yaml.

Code Reference

Source Location

Repository: SpeechBrain
File: recipes/CommonVoice/ASR/transducer/train.py

Signature

class ASR(sb.Brain):
    def compute_forward(self, batch, stage):
        """Forward computations from the waveform batches to the output probabilities."""
        ...

    def compute_objectives(self, predictions, batch, stage):
        ...

Import

import speechbrain as sb
from speechbrain.core import Brain

I/O Contract

Inputs

Name	Type	Required	Description
hparams_file	str	Yes	Path to the YAML hyperparameter configuration file (e.g., hparams/conformer_transducer_large.yaml)
batch.sig	tuple	Yes	Waveform tensor and lengths from the dataloader
batch.tokens_bos	tuple	Yes	BPE tokens with beginning-of-sequence marker for prediction network
batch.tokens_eos	tuple	Yes	BPE tokens with end-of-sequence marker
batch.tokens	tuple	Yes	BPE tokens without special markers (for CTC auxiliary loss)
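
Each batch field above is a pair of padded data and relative lengths. The sketch below shows, in plain Python lists rather than tensors, how the three token variants relate to one another and how relative lengths (each sequence length divided by the longest length in the batch) arise from padding. The helper names and the `bos_index`/`eos_index` values are hypothetical; the actual indices come from the hyperparameter file.

```python
def add_bos_eos(tokens, bos_index, eos_index):
    """Derive the BOS-prefixed and EOS-suffixed variants of a token list."""
    tokens_bos = [bos_index] + list(tokens)  # input to the prediction network
    tokens_eos = list(tokens) + [eos_index]  # same tokens with an end marker
    return tokens_bos, tokens_eos


def pad_with_relative_lengths(sequences, pad_value=0):
    """Pad sequences to a common length and return (padded, relative_lengths)."""
    max_len = max(len(seq) for seq in sequences)
    padded = [list(seq) + [pad_value] * (max_len - len(seq)) for seq in sequences]
    rel_lens = [len(seq) / max_len for seq in sequences]
    return padded, rel_lens


# Hypothetical BPE ids for one utterance; 1 and 2 stand in for BOS and EOS.
tokens_bos, tokens_eos = add_bos_eos([5, 6, 7], bos_index=1, eos_index=2)
print(tokens_bos)  # [1, 5, 6, 7]
print(tokens_eos)  # [5, 6, 7, 2]

padded, rel_lens = pad_with_relative_lengths([[5, 6, 7, 8], [9, 10]])
print(padded)      # [[5, 6, 7, 8], [9, 10, 0, 0]]
print(rel_lens)    # [1.0, 0.5]
```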

Outputs

Name	Type	Description
p_transducer	tensor	Transducer joint network output log-probabilities
p_ctc	tensor	CTC log-probabilities (auxiliary loss)
wav_lens	tensor	Relative lengths of the input waveforms
model checkpoint	file	Saved model parameters at best and latest epochs
WER/CER metrics	float	Word error rate and character error rate on dev/test sets
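
The two probability outputs feed two loss terms during training. A common way to combine a main loss with an auxiliary one is a weighted sum, sketched below together with a step-based gate for the augmentation warmup mentioned in the description. The names `ctc_weight`, `warmup_steps`, and both helper functions are assumptions for illustration; the exact weighting and warmup rule used by the recipe should be checked in train.py and the YAML file.

```python
def combine_losses(transducer_loss, ctc_loss, ctc_weight):
    """Weighted sum of the main Transducer loss and the auxiliary CTC loss.

    ctc_weight is assumed to lie in [0, 1]; 0 disables the CTC term.
    """
    if not 0.0 <= ctc_weight <= 1.0:
        raise ValueError("ctc_weight must be between 0 and 1")
    return ctc_weight * ctc_loss + (1.0 - ctc_weight) * transducer_loss


def augmentation_active(optimizer_step, warmup_steps):
    """Return True once training has moved past the warmup period."""
    return optimizer_step > warmup_steps


# Hypothetical loss values for a single training step.
print(combine_losses(transducer_loss=10.0, ctc_loss=2.0, ctc_weight=0.25))  # 8.0
print(augmentation_active(optimizer_step=5, warmup_steps=10))   # False
print(augmentation_active(optimizer_step=11, warmup_steps=10))  # True
```

Delaying augmentation until after a warmup lets the model first fit clean features before it is asked to be robust to perturbed ones.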

Usage Examples

# Command-line usage
# python train.py hparams/conformer_transducer_large.yaml

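# Programmatic usage
import sys

import speechbrain as sb
from hyperpyyaml import load_hyperpyyaml

# Parse the YAML path, runtime options, and hyperparameter overrides from the CLI.
hparams_file, run_opts, overrides = sb.parse_arguments(sys.argv[1:])
with open(hparams_file) as fin:
    hparams = load_hyperpyyaml(fin, overrides)

# ASR is the sb.Brain subclass defined in train.py (see Signature above).
asr_brain = ASR(
    modules=hparams["modules"],
    hparams=hparams,
    run_opts=run_opts,
    opt_class=hparams["opt_class"],
    checkpointer=hparams["checkpointer"],
)

# train_data and valid_data are the datasets built by the recipe's
# data-preparation step, which is not shown here.
asr_brain.fit(
    hparams["epoch_counter"],
    train_data,
    valid_data,
    train_loader_kwargs=hparams["dataloader_options"],
    valid_loader_kwargs=hparams["test_dataloader_options"],
)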

Related Pages

Principle:Speechbrain_Speechbrain_Transducer_ASR_Training
