Implementation:Speechbrain Speechbrain Train PeoplesSpeech
| Knowledge Sources | |
|---|---|
| Domains | ASR, Training |
| Last Updated | 2026-02-09 00:00 GMT |
Overview
Concrete SpeechBrain recipe for training a Conformer ASR model on the People's Speech dataset.
Description
This recipe defines the ASR class, a subclass of sb.core.Brain, for Conformer-based speech recognition on the People's Speech dataset (28,000 hours). The model combines a CNN frontend, a Conformer encoder, and a Transformer decoder, and is trained with a joint CTC/attention objective with label smoothing. Feature augmentation can be applied during training. Beam search decoding runs at the validation and test stages. Subword units come from BPE tokenization via SentencePiece.
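The joint objective described above can be sketched in plain Python as a weighted sum of a CTC loss and a label-smoothed negative log-likelihood. This is a minimal illustration and not the recipe's code: the function names `label_smoothed_nll` and `joint_loss`, the weight `ctc_weight=0.3`, and the smoothing value `0.1` are assumptions chosen for the example. The values actually used are set in the hyperparameter YAML.

```python
import math


def label_smoothed_nll(log_probs, target, smoothing=0.1):
    """Label-smoothed negative log-likelihood for one decoder step.

    log_probs: list of log-probabilities over the vocabulary.
    target: index of the reference token.
    smoothing: probability mass moved away from the reference token
        (assumed value, for illustration only).
    """
    vocab_size = len(log_probs)
    # Smoothed target: 1 - smoothing on the reference token, the rest
    # spread uniformly over the remaining vocabulary entries.
    off_weight = smoothing / (vocab_size - 1)
    loss = 0.0
    for index, log_prob in enumerate(log_probs):
        weight = (1.0 - smoothing) if index == target else off_weight
        loss -= weight * log_prob
    return loss


def joint_loss(loss_ctc, loss_seq, ctc_weight=0.3):
    """Interpolate the CTC loss and the attention (seq2seq) loss."""
    return ctc_weight * loss_ctc + (1.0 - ctc_weight) * loss_seq
```

With `ctc_weight=0` the objective reduces to the attention loss alone, and with `ctc_weight=1` to pure CTC, so the weight controls how strongly the encoder is pushed toward monotonic alignments.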
Usage
Use this recipe to train a Conformer ASR model with CTC/attention joint decoding on the People's Speech dataset. It requires the corresponding hyperparameter YAML file and the data preparation script.
Code Reference
Source Location
- Repository: SpeechBrain
- File: recipes/PeoplesSpeech/ASR/transformer/train.py
Signature
class ASR(sb.core.Brain):
    def compute_forward(self, batch, stage):
        """Forward computations from the waveform batches to the output probabilities."""
        ...

    def compute_objectives(self, predictions, batch, stage):
        """Computes the loss (CTC+NLL) given predictions and targets."""
        ...
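The stage argument in both methods determines whether decoding runs. The toy sketch below shows that control flow without depending on SpeechBrain. The local `Stage` enum mirrors the members of `sb.Stage`, while `forward_outputs` and its `search` callable are hypothetical stand-ins and not part of the recipe.

```python
from enum import Enum, auto


class Stage(Enum):
    """Local stand-in with the same members as sb.Stage."""
    TRAIN = auto()
    VALID = auto()
    TEST = auto()


def forward_outputs(stage, p_ctc, p_seq, wav_lens, search=None):
    """Return the forward tuple, decoding only outside of training.

    search: hypothetical callable standing in for a beam searcher.
    """
    hyps = None
    if stage is not Stage.TRAIN and search is not None:
        # Beam search is comparatively slow, so it is skipped in training.
        hyps = search(p_seq)
    return p_ctc, p_seq, wav_lens, hyps
```

Keeping `hyps` as `None` during training means the loss computation can consume the same tuple at every stage and only compute recognition metrics when hypotheses are present.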
Import
# Run as a recipe script from the repository root
python recipes/PeoplesSpeech/ASR/transformer/train.py recipes/PeoplesSpeech/ASR/transformer/hparams/conformer_large.yaml --data_folder /path/to/PeoplesSpeech
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| batch.sig | torch.Tensor | Yes | Input waveform signal |
| batch.tokens_bos | torch.Tensor | Yes | Target token sequence with BOS prefix |
| batch.tokens_eos | torch.Tensor | Yes | Target token sequence with EOS suffix |
| batch.tokens | torch.Tensor | Yes | Target token sequence (for CTC) |
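The three target fields in the table are views of the same token sequence. A minimal sketch of how they relate, using plain lists instead of tensors: the helper name `make_targets` and the index values `bos_index=1` and `eos_index=2` are assumptions for illustration, and the token IDs are made up.

```python
def make_targets(token_ids, bos_index, eos_index):
    """Build the three target variants from one list of subword IDs."""
    return {
        # Decoder input: shifted right by prepending BOS.
        "tokens_bos": [bos_index] + list(token_ids),
        # Decoder target for the seq2seq loss: EOS appended.
        "tokens_eos": list(token_ids) + [eos_index],
        # Plain sequence used as the CTC target.
        "tokens": list(token_ids),
    }


# Hypothetical subword IDs for one utterance.
targets = make_targets([17, 4, 92], bos_index=1, eos_index=2)
```

The BOS and EOS variants have the same length, so the decoder input at step t lines up with the token it should predict at step t.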
Outputs
| Name | Type | Description |
|---|---|---|
| p_ctc | torch.Tensor | CTC log-probabilities from encoder |
| p_seq | torch.Tensor | Seq2seq log-probabilities from Transformer decoder |
| wav_lens | torch.Tensor | Relative waveform lengths |
| hyps | list | Beam search hypotheses (at validation/test) |
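Relative lengths such as `wav_lens` express each utterance's length as a fraction of the longest item in the padded batch. The sketch below shows the conversion in both directions. The helper names `relative_lengths` and `absolute_lengths` are hypothetical, and the sample counts are invented.

```python
def relative_lengths(num_samples):
    """Map per-utterance sample counts to fractions of the longest one."""
    longest = max(num_samples)
    return [n / longest for n in num_samples]


def absolute_lengths(rel_lens, padded_len):
    """Recover sample counts from relative lengths and the padded length."""
    return [round(r * padded_len) for r in rel_lens]


# Three utterances padded to the length of the first one.
rel = relative_lengths([16000, 8000, 12000])
```

Because the values are ratios, they stay valid after the frontend downsamples the time axis, which is why they can be passed along with both the waveform and the encoder output.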
Usage Examples
python train.py hparams/conformer_large.yaml --data_folder /path/to/PeoplesSpeech