Implementation:Speechbrain Speechbrain Train CommonVoice Transformer

From Leeroopedia


Knowledge Sources
Domains: ASR, Training
Last Updated: 2026-02-09 00:00 GMT

Overview

A concrete recipe, provided by the SpeechBrain library, for training a Transformer ASR model on the CommonVoice dataset.

Description

This recipe defines the ASR class (a subclass of sb.core.Brain) for Transformer-based speech recognition on CommonVoice data. The architecture uses a CNN frontend followed by a Transformer (or Conformer) encoder-decoder with joint CTC/attention decoding. Waveform and feature augmentation are both supported, with a configurable warmup period before they take effect. Beam search decoding uses separate search configurations for validation and test. Subword units are produced by BPE tokenization via SentencePiece.
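The augmentation warmup described above can be pictured as a simple gate. The sketch below is illustrative only: the function name `should_augment`, the choice to count warmup in optimizer steps, and the strict `>` comparison are assumptions, not details taken from the recipe.

```python
def should_augment(is_training: bool, optimizer_step: int, warmup_steps: int) -> bool:
    """Return True only during training and only after the warmup period.

    Assumption: warmup is measured in optimizer steps and augmentation
    starts strictly after `warmup_steps` updates.
    """
    return is_training and optimizer_step > warmup_steps


# Hypothetical values: augmentation stays off for the first 8000 updates.
print(should_augment(True, optimizer_step=100, warmup_steps=8000))
print(should_augment(True, optimizer_step=8001, warmup_steps=8000))
print(should_augment(False, optimizer_step=8001, warmup_steps=8000))
```

Keeping augmentation off early lets the model see clean inputs while its parameters are still far from a useful solution. Validation and test data are never augmented in this sketch.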

Usage

Use this recipe to train a Transformer or Conformer ASR model with joint CTC/attention decoding on any CommonVoice language. It requires the corresponding hyperparameter YAML file and the data preparation script.

Code Reference

Source Location

Signature

import speechbrain as sb


class ASR(sb.core.Brain):
    def compute_forward(self, batch, stage):
        """Forward computations from the waveform batches to the output probabilities."""
        ...
    def compute_objectives(self, predictions, batch, stage):
        """Computes the loss (CTC+NLL) given predictions and targets."""
        ...
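The "CTC+NLL" loss named in `compute_objectives` is typically a weighted sum of the two terms. The sketch below assumes a single interpolation weight, here called `ctc_weight`; the page does not state how the two losses are combined, so treat the formula and the names as an illustration rather than the recipe's exact code.

```python
def joint_loss(loss_ctc: float, loss_seq: float, ctc_weight: float) -> float:
    """Interpolate between the CTC loss and the seq2seq NLL loss.

    Assumption: ctc_weight lies in [0, 1]; 0 means attention-only
    training and 1 means CTC-only training.
    """
    if not 0.0 <= ctc_weight <= 1.0:
        raise ValueError("ctc_weight must be between 0 and 1")
    return ctc_weight * loss_ctc + (1.0 - ctc_weight) * loss_seq


# Hypothetical loss values for one batch.
print(joint_loss(loss_ctc=2.0, loss_seq=4.0, ctc_weight=0.25))
```

In a real training step the two inputs would be tensors computed from `p_ctc` and `p_seq`. Plain floats are used here so the arithmetic is easy to follow.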

Import

# Run as a recipe script from the repository root
python recipes/CommonVoice/ASR/transformer/train.py recipes/CommonVoice/ASR/transformer/hparams/conformer_large.yaml --data_folder /path/to/commonvoice

I/O Contract

Inputs

| Name | Type | Required | Description |
| --- | --- | --- | --- |
| batch.sig | torch.Tensor | Yes | Input waveform signal |
| batch.tokens_bos | torch.Tensor | Yes | Target token sequence with BOS prefix |
| batch.tokens_eos | torch.Tensor | Yes | Target token sequence with EOS suffix |
| batch.tokens | torch.Tensor | Yes | Target token sequence (for CTC) |
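The three token inputs are variants of the same encoded transcript. The sketch below uses plain Python lists instead of tensors; the index values `BOS_INDEX` and `EOS_INDEX` and the helper `make_targets` are hypothetical, since the real indices come from the hyperparameter file.

```python
from typing import Dict, List

BOS_INDEX = 1  # hypothetical value, normally set in the YAML hyperparameters
EOS_INDEX = 2  # hypothetical value, normally set in the YAML hyperparameters


def make_targets(tokens: List[int]) -> Dict[str, List[int]]:
    """Build the three target variants from one encoded transcript."""
    tokens = list(tokens)
    return {
        "tokens_bos": [BOS_INDEX] + tokens,  # decoder input
        "tokens_eos": tokens + [EOS_INDEX],  # decoder target for the NLL loss
        "tokens": tokens,                    # target for the CTC loss
    }


targets = make_targets([5, 6, 7])
print(targets["tokens_bos"])
print(targets["tokens_eos"])
print(targets["tokens"])
```

The BOS-prefixed sequence feeds the decoder, and the EOS-suffixed sequence is what the decoder is trained to predict at each step, so the two are shifted by one position.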

Outputs

| Name | Type | Description |
| --- | --- | --- |
| p_ctc | torch.Tensor | CTC log-probabilities from encoder |
| p_seq | torch.Tensor | Seq2seq log-probabilities from Transformer decoder |
| wav_lens | torch.Tensor | Relative waveform lengths |
| hyps | list | Beam search hypotheses (at validation/test) |
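"Relative" lengths mean each utterance length is expressed as a fraction of the longest utterance in the padded batch. A minimal sketch of that convention in plain Python follows; the helper name `relative_lengths` is made up for illustration.

```python
from typing import List


def relative_lengths(num_samples: List[int]) -> List[float]:
    """Convert absolute lengths (in samples) to fractions of the batch maximum."""
    longest = max(num_samples)
    return [n / longest for n in num_samples]


# Hypothetical batch of three utterances at 16 kHz: 1.0 s, 0.5 s and 0.25 s.
print(relative_lengths([16000, 8000, 4000]))
```

The longest utterance always maps to 1.0. Shorter ones carry the fraction of the padded tensor that holds real signal, which lets downstream modules mask out the padding.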

Usage Examples

python train.py hparams/conformer_large.yaml --data_folder /path/to/CommonVoice
