Implementation:Speechbrain Speechbrain Train GigaSpeech Transducer
| Knowledge Sources | |
|---|---|
| Domains | ASR, Training |
| Last Updated | 2026-02-09 00:00 GMT |
Overview
Concrete tool, provided by the SpeechBrain library, for training a Transducer ASR model on the GigaSpeech dataset.
Description
This recipe defines the ASR class (a subclass of sb.Brain) for Conformer-Transducer speech recognition on GigaSpeech. The architecture combines a Conformer encoder, an LSTM-based transducer decoder (the prediction network), and a joint network. Dynamic Chunk Training can optionally be enabled to support streaming. The model is trained with the transducer loss, optionally combined with auxiliary CTC and cross-entropy losses. Beam search decoding, which can be coupled with an RNN language model, is supported. Subword units are produced by BPE tokenization via SentencePiece.
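The joint training described above can be pictured as a weighted sum of the individual losses. The following is a minimal sketch in plain Python, not the recipe's actual code: the function name `combine_losses`, the parameters `ctc_weight` and `ce_weight`, and their default values are assumptions made for illustration, and the real weighting scheme is defined in the recipe and its hyperparameter file.

```python
def combine_losses(transducer_loss, ctc_loss=None, ce_loss=None,
                   ctc_weight=0.3, ce_weight=0.0):
    """Interpolate the transducer loss with optional auxiliary losses.

    Hypothetical helper: names and default weights are illustrative only.
    """
    # An auxiliary loss that is absent contributes neither value nor weight.
    w_ctc = ctc_weight if ctc_loss is not None else 0.0
    w_ce = ce_weight if ce_loss is not None else 0.0
    if w_ctc + w_ce > 1.0:
        raise ValueError("auxiliary weights must sum to at most 1.0")

    # The transducer loss receives whatever weight is left over.
    total = (1.0 - w_ctc - w_ce) * transducer_loss
    if ctc_loss is not None:
        total += w_ctc * ctc_loss
    if ce_loss is not None:
        total += w_ce * ce_loss
    return total


# With no auxiliary losses, the transducer loss is returned unchanged.
only_transducer = combine_losses(10.0)
# With CTC enabled at weight 0.5, the two losses are averaged.
with_ctc = combine_losses(10.0, ctc_loss=20.0, ctc_weight=0.5)
```

Giving the transducer loss the remaining weight keeps the total on a comparable scale whether or not the auxiliary losses are active, which is one plausible reason to prefer interpolation over a plain sum.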
Usage
Use this recipe to train a Conformer-Transducer ASR model on the GigaSpeech dataset (the XS through XL subsets are supported). It requires the corresponding hyperparameter YAML file and the data preparation script.
Code Reference
Source Location
- Repository: SpeechBrain
- File: recipes/GigaSpeech/ASR/transducer/train.py
Signature
class ASR(sb.Brain):
    def compute_forward(self, batch, stage):
        """Forward computations from the waveform batches to the output probabilities."""
        ...

    def compute_objectives(self, predictions, batch, stage):
        """Computes the loss given predictions and targets."""
        ...
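To show how the two methods fit together, here is a toy sketch in plain Python. `ToyBrain`, `ToyASR`, and `Stage` are hypothetical stand-ins written for this example and are not SpeechBrain classes; the only behavior borrowed from sb.Brain is that `compute_forward` is called first and its result is passed to `compute_objectives`. The "model" simply doubles its input so the example stays self-contained.

```python
from enum import Enum, auto


class Stage(Enum):
    """Hypothetical stand-in for the training stage passed to each method."""
    TRAIN = auto()
    VALID = auto()
    TEST = auto()


class ToyBrain:
    """Hypothetical base class that mimics the forward/objectives call order."""

    def compute_forward(self, batch, stage):
        raise NotImplementedError

    def compute_objectives(self, predictions, batch, stage):
        raise NotImplementedError

    def evaluate_batch(self, batch, stage):
        # Forward pass first, then the loss on its outputs.
        predictions = self.compute_forward(batch, stage)
        return self.compute_objectives(predictions, batch, stage)


class ToyASR(ToyBrain):
    def compute_forward(self, batch, stage):
        # Pretend model: doubles every input value.
        outputs = [2.0 * x for x in batch["sig"]]
        # Hypotheses are only produced outside of training.
        hyps = None if stage == Stage.TRAIN else [round(o) for o in outputs]
        return outputs, hyps

    def compute_objectives(self, predictions, batch, stage):
        outputs, _ = predictions
        errors = [(o - t) ** 2 for o, t in zip(outputs, batch["targets"])]
        return sum(errors) / len(errors)  # mean squared error


batch = {"sig": [1.0, 2.0], "targets": [2.0, 5.0]}
loss = ToyASR().evaluate_batch(batch, Stage.VALID)
```

Returning the hypotheses only for non-training stages mirrors the I/O contract below, where `hyps` is listed as a validation/test output.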
Import
# Not imported as a module; run as a recipe script from the repository root
python recipes/GigaSpeech/ASR/transducer/train.py recipes/GigaSpeech/ASR/transducer/hparams/conformer_transducer.yaml --data_folder /path/to/GigaSpeech
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| batch.sig | torch.Tensor | Yes | Padded input waveform signals, accompanied by their relative lengths |
| batch.tokens_bos | torch.Tensor | Yes | Target token sequence with BOS prefix |
| batch.tokens | torch.Tensor | Yes | Target token sequence |
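The relationship between `tokens` and `tokens_bos` can be illustrated with a short sketch. It assumes plain Python lists rather than tensors, and both the helper name `add_bos` and the BOS index `0` are hypothetical; the actual index comes from the recipe's hyperparameters.

```python
def add_bos(tokens, bos_index):
    """Return a copy of the token sequence with the BOS index prepended."""
    return [bos_index] + list(tokens)


# Hypothetical subword IDs for one utterance.
tokens = [5, 7, 9]
tokens_bos = add_bos(tokens, bos_index=0)
# tokens_bos is one element longer than tokens and starts with BOS.
```

In a transducer, the BOS-prefixed sequence is typically what the prediction network consumes, while the unprefixed sequence serves as the loss target.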
Outputs
| Name | Type | Description |
|---|---|---|
| p_transducer | torch.Tensor | Transducer joint network log-probabilities |
| p_ctc | torch.Tensor | CTC log-probabilities from encoder (optional) |
| p_ce | torch.Tensor | Cross-entropy predictions from decoder (optional) |
| wav_lens | torch.Tensor | Relative waveform lengths |
| hyps | list | Beam search hypotheses (at validation/test) |
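Relative lengths express each utterance's length as a fraction of the longest utterance in the padded batch. The sketch below shows the idea with plain Python lists; the helper name `pad_with_relative_lengths` is hypothetical, and in the recipe itself the padding and lengths are produced by SpeechBrain's batching utilities rather than by code like this.

```python
def pad_with_relative_lengths(sequences, pad_value=0.0):
    """Right-pad sequences to a common length and return relative lengths."""
    max_len = max(len(seq) for seq in sequences)
    padded = [list(seq) + [pad_value] * (max_len - len(seq)) for seq in sequences]
    # Each length is divided by the longest length, so values lie in (0, 1].
    rel_lens = [len(seq) / max_len for seq in sequences]
    return padded, rel_lens


wavs, wav_lens = pad_with_relative_lengths([[1.0, 2.0, 3.0, 4.0], [5.0, 6.0]])
# The second utterance is half as long as the first, so its relative length is 0.5.
```

Storing lengths as fractions lets the same values be reused after the encoder changes the time resolution, since multiplying by the new maximum length recovers the number of valid frames.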
Usage Examples
python train.py hparams/conformer_transducer.yaml --data_folder /path/to/GigaSpeech