Implementation:Speechbrain Speechbrain Train CommonVoice Transducer

From Leeroopedia


Knowledge Sources
Domains ASR, Training
Last Updated 2026-02-09 00:00 GMT

Overview

Concrete tool, provided by the SpeechBrain library, for training a Transducer-based ASR system on the CommonVoice dataset.

Description

This training script implements a Transducer (RNN-T) ASR system composed of an encoder, a prediction network (decoder), and a joint network. It follows the Dynamic Chunk Training approach for streaming speech recognition, which lets the model process audio in chunks for real-time applications. The system is trained with both CTC and Transducer losses on BPE sub-word units. The script supports Conformer-based encoders and feature augmentation with a configurable warmup period, and it can be adapted to different architectures, token types, and CommonVoice languages.
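The chunked processing described above can be illustrated with a minimal sketch in plain Python. It is not SpeechBrain's implementation: the function names, the `full_context_prob` parameter, and the `left_chunks` parameter are all hypothetical, chosen only to show how a per-batch chunk size could be sampled and turned into an attention mask that hides future chunks.

```python
import random
from typing import List, Optional


def sample_chunk_size(
    rng: random.Random, chunk_min: int, chunk_max: int, full_context_prob: float
) -> Optional[int]:
    """Pick a chunk size in frames, or None to use the full utterance.

    All parameter names are illustrative assumptions, not recipe hyperparameters.
    """
    if rng.random() < full_context_prob:
        return None  # this batch is trained without chunking
    return rng.randint(chunk_min, chunk_max)


def chunk_attention_mask(
    num_frames: int, chunk_size: Optional[int], left_chunks: int
) -> List[List[bool]]:
    """Return mask[i][j], True when frame i is allowed to attend to frame j.

    A frame sees its own chunk plus `left_chunks` preceding chunks,
    and never sees frames from later chunks.
    """
    if chunk_size is None:
        return [[True] * num_frames for _ in range(num_frames)]
    mask = []
    for i in range(num_frames):
        chunk_idx = i // chunk_size
        first = max(0, (chunk_idx - left_chunks) * chunk_size)
        last = min(num_frames, (chunk_idx + 1) * chunk_size)  # exclusive bound
        mask.append([first <= j < last for j in range(num_frames)])
    return mask
```

With `chunk_size=2` and `left_chunks=1`, for example, frame 2 can attend to frames 0 through 3 but not to frames 4 and 5, which belong to a later chunk. Sampling a different chunk size for each batch is what would expose the model to several latency settings during training.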

Usage

Use this script to train a Transducer ASR model on any CommonVoice language, particularly when streaming or real-time recognition is needed. Run it with the command: python train.py hparams/conformer_transducer_large.yaml

Code Reference

Source Location

Signature

class ASR(sb.Brain):
    def compute_forward(self, batch, stage):
        """Forward computations from the waveform batches to the output probabilities."""
        ...

    def compute_objectives(self, predictions, batch, stage):
        """Computes the training loss from the predictions and the batch targets."""
        ...
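The Description mentions joint training with CTC and Transducer losses and an augmentation warmup period. The sketch below shows one common way such pieces could be combined inside `compute_objectives` and `compute_forward`. The interpolation formula, the `ctc_weight` and `warmup_steps` names, and both helper functions are assumptions made for illustration; the actual recipe may weight its losses or gate augmentation differently.

```python
def combine_losses(loss_transducer: float, loss_ctc: float, ctc_weight: float) -> float:
    """Interpolate the Transducer loss with an auxiliary CTC loss.

    `ctc_weight` is a hypothetical hyperparameter in [0, 1];
    0 means Transducer loss only, 1 means CTC loss only.
    """
    if not 0.0 <= ctc_weight <= 1.0:
        raise ValueError("ctc_weight must lie in [0, 1]")
    return ctc_weight * loss_ctc + (1.0 - ctc_weight) * loss_transducer


def augmentation_enabled(optimizer_step: int, warmup_steps: int, is_training: bool) -> bool:
    """Apply feature augmentation only in training and only after the warmup.

    `warmup_steps` is an assumed name for the configurable warmup period.
    """
    return is_training and optimizer_step > warmup_steps
```

Under these assumptions, `combine_losses(8.0, 4.0, 0.25)` weights the Transducer loss at 0.75 and the CTC loss at 0.25, and `augmentation_enabled` stays `False` for validation and test batches regardless of the step count.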

Import

import speechbrain as sb
from speechbrain.core import Brain

I/O Contract

Inputs

| Name | Type | Required | Description |
|---|---|---|---|
| hparams_file | str | Yes | Path to the YAML hyperparameter configuration file (e.g., hparams/conformer_transducer_large.yaml) |
| batch.sig | tuple | Yes | Waveform tensor and lengths from the dataloader |
| batch.tokens_bos | tuple | Yes | BPE tokens with beginning-of-sequence marker for prediction network |
| batch.tokens_eos | tuple | Yes | BPE tokens with end-of-sequence marker |
| batch.tokens | tuple | Yes | BPE tokens without special markers (for CTC auxiliary loss) |
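The three token inputs listed above differ only in which special marker is attached. As a rough sketch, assuming integer BPE ids and hypothetical `bos_index` and `eos_index` values (in practice these would come from the hyperparameter file), they could be derived from one encoded transcript as follows. The helper name is illustrative and is not part of the recipe.

```python
from typing import Dict, List


def build_token_targets(
    token_ids: List[int], bos_index: int, eos_index: int
) -> Dict[str, List[int]]:
    """Derive the three target sequences from one encoded transcript."""
    return {
        # Input to the prediction network: starts with the BOS marker.
        "tokens_bos": [bos_index] + token_ids,
        # Sequence terminated by the EOS marker.
        "tokens_eos": token_ids + [eos_index],
        # Plain sequence without markers, used for the auxiliary CTC loss.
        "tokens": list(token_ids),
    }
```

For the ids `[5, 9, 2]` with `bos_index=0` and `eos_index=1`, this yields `[0, 5, 9, 2]`, `[5, 9, 2, 1]`, and `[5, 9, 2]` respectively. In a batch, each of these is padded and paired with its lengths, which is why the table lists them as tuples.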

Outputs

| Name | Type | Description |
|---|---|---|
| p_transducer | tensor | Transducer joint network output log-probabilities |
| p_ctc | tensor | CTC log-probabilities (auxiliary loss) |
| wav_lens | tensor | Relative lengths of the input waveforms |
| model checkpoint | file | Saved model parameters at best and latest epochs |
| WER/CER metrics | float | Word error rate and character error rate on dev/test sets |
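To make the WER and CER outputs concrete, the following is a minimal sketch of how such error rates are conventionally computed from an edit distance. It is an independent illustration written in plain Python, not the metric code used by the recipe, and it reports percentages as an assumption about the output scale.

```python
from typing import List, Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def error_rate(refs: List[Sequence], hyps: List[Sequence]) -> float:
    """Total edits divided by total reference length, as a percentage."""
    total = sum(len(r) for r in refs)
    if total == 0:
        raise ValueError("references must contain at least one unit")
    errors = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    return 100.0 * errors / total


def wer(ref_texts: List[str], hyp_texts: List[str]) -> float:
    """Word error rate: units are whitespace-separated words."""
    return error_rate([t.split() for t in ref_texts], [t.split() for t in hyp_texts])


def cer(ref_texts: List[str], hyp_texts: List[str]) -> float:
    """Character error rate: units are individual characters."""
    return error_rate([list(t) for t in ref_texts], [list(t) for t in hyp_texts])
```

Comparing the reference "the cat sat" with the hypothesis "the cat sit" gives one substituted word out of three and one substituted character out of eleven, so the WER is much higher than the CER for the same mistake.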

Usage Examples

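# Command-line usage
# python train.py hparams/conformer_transducer_large.yaml

# Programmatic usage
# ASR is the sb.Brain subclass shown in the Signature section.
import sys

import speechbrain as sb
from hyperpyyaml import load_hyperpyyaml

# Parse the YAML path, the runtime options, and any command-line overrides
hparams_file, run_opts, overrides = sb.parse_arguments(sys.argv[1:])
with open(hparams_file) as fin:
    hparams = load_hyperpyyaml(fin, overrides)

# Build the trainer from the modules and optimizer declared in the YAML file
asr_brain = ASR(
    modules=hparams["modules"],
    hparams=hparams,
    run_opts=run_opts,
    opt_class=hparams["opt_class"],
    checkpointer=hparams["checkpointer"],
)

# NOTE: train_data and valid_data must be created by the recipe's
# data-preparation step before this call; that step is not shown here.
asr_brain.fit(
    hparams["epoch_counter"],
    train_data,
    valid_data,
    train_loader_kwargs=hparams["dataloader_options"],
    valid_loader_kwargs=hparams["test_dataloader_options"],
)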
