Implementation:Speechbrain Speechbrain Train KsponSpeech
| Knowledge Sources | |
|---|---|
| Domains | ASR, Training |
| Last Updated | 2026-02-09 00:00 GMT |
Overview
Concrete tool for training a Transformer or Conformer ASR model on the KsponSpeech dataset, provided as a recipe in the SpeechBrain library.
Description
This recipe defines the ASR class (a subclass of sb.core.Brain) for Transformer- and Conformer-based speech recognition on the KsponSpeech Korean dataset (965.2 hours). The model consists of a CNN frontend followed by a Transformer or Conformer encoder-decoder. It is trained with a joint CTC/attention objective and label smoothing, and feature augmentation can be enabled during training. At evaluation, decoding uses beam search combined with a Transformer language model, and the evaluated model's weights are averaged over the last 5 checkpoints.
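Checkpoint averaging replaces each parameter with its element-wise mean across several saved checkpoints. The sketch below illustrates only that arithmetic in plain Python. It assumes, for illustration, that each checkpoint is a dict mapping a parameter name to a flat list of floats; it is not the recipe's own code, which works on saved tensor state.

```python
from typing import Dict, List

# Assumption for illustration: a checkpoint is a dict of parameter name ->
# flat list of floats. Real checkpoints store tensors.
Checkpoint = Dict[str, List[float]]


def average_checkpoints(checkpoints: List[Checkpoint]) -> Checkpoint:
    """Element-wise mean of every parameter across the given checkpoints."""
    if not checkpoints:
        raise ValueError("need at least one checkpoint")
    averaged: Checkpoint = {}
    for name in checkpoints[0]:
        # Collect the same parameter from every checkpoint.
        values = [ckpt[name] for ckpt in checkpoints]
        if any(len(v) != len(values[0]) for v in values):
            raise ValueError(f"shape mismatch for parameter {name!r}")
        # zip(*values) walks the parameter position by position.
        averaged[name] = [sum(col) / len(values) for col in zip(*values)]
    return averaged


# Usage: seven toy checkpoints, average the last five.
history = [{"w": [float(i), 2.0 * i]} for i in range(1, 8)]
last_five = history[-5:]
print(average_checkpoints(last_five))  # {'w': [5.0, 10.0]}
```

Averaging weights, rather than picking a single checkpoint, smooths out the noise of individual optimizer steps at no extra inference cost.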
Usage
Use this recipe to train a Transformer or Conformer ASR model with joint CTC/attention decoding on the KsponSpeech Korean dataset. The recipe requires the corresponding hyperparameter YAML file and the data preparation script.
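Joint CTC/attention decoding ranks hypotheses by interpolating the attention decoder's log-probability with the CTC log-probability, then adding a weighted language-model score. The following is a simplified sketch that rescores only finished hypotheses. The weights 0.4 and 0.2 and all scores are illustrative assumptions, not values from the recipe's YAML, and the recipe's beam search applies its scoring step by step during the search.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Hypothesis:
    """A finished hypothesis with toy per-model log-probabilities."""
    text: str
    att_logp: float  # log p_att(y | x) from the attention decoder
    ctc_logp: float  # log p_ctc(y | x) from the CTC branch
    lm_logp: float   # log p_lm(y) from the language model


def joint_score(h: Hypothesis, ctc_weight: float, lm_weight: float) -> float:
    """Interpolate attention and CTC scores, then add a weighted LM score."""
    return ((1.0 - ctc_weight) * h.att_logp
            + ctc_weight * h.ctc_logp
            + lm_weight * h.lm_logp)


def rank(hyps: List[Hypothesis], ctc_weight: float = 0.4,
         lm_weight: float = 0.2) -> List[Hypothesis]:
    """Sort hypotheses from best to worst joint score."""
    return sorted(hyps, key=lambda h: joint_score(h, ctc_weight, lm_weight),
                  reverse=True)


a = Hypothesis("hyp A", att_logp=-1.0, ctc_logp=-3.0, lm_logp=-2.0)
b = Hypothesis("hyp B", att_logp=-1.5, ctc_logp=-1.6, lm_logp=-2.5)
print(rank([a, b])[0].text)  # hyp B (joint score -2.04 vs -2.2)
print(rank([a, b], ctc_weight=0.0, lm_weight=0.0)[0].text)  # hyp A
```

The example shows why the CTC term matters: the attention decoder alone prefers hypothesis A, but A's poor CTC alignment score pushes it below B once the scores are combined.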
Code Reference
Source Location
- Repository: SpeechBrain
- File: recipes/KsponSpeech/ASR/transformer/train.py
Signature
import speechbrain as sb

class ASR(sb.core.Brain):
    def compute_forward(self, batch, stage):
        """Forward computations from the waveform batches to the output probabilities."""
        ...

    def compute_objectives(self, predictions, batch, stage):
        """Computes the loss (CTC+NLL) given predictions and targets."""
        ...
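The CTC+NLL objective is a weighted sum: `ctc_weight * loss_ctc + (1 - ctc_weight) * loss_seq`, where the sequence loss uses label smoothing. The sketch below shows that combination in plain Python. It uses one common label-smoothing formulation (the true class gets `1 - eps`, and `eps` is spread evenly over the other classes), which may differ in detail from the recipe's loss. The CTC loss value is taken as a given number, and all inputs are toy assumptions.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over one row of logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def smoothed_nll(log_probs: Sequence[Sequence[float]],
                 targets: Sequence[int], eps: float) -> float:
    """Mean cross-entropy against label-smoothed target distributions."""
    total = 0.0
    for row, t in zip(log_probs, targets):
        off = eps / (len(row) - 1)  # mass given to each non-target class
        total -= sum(((1.0 - eps) if k == t else off) * lp
                     for k, lp in enumerate(row))
    return total / len(targets)


def joint_loss(ctc_loss: float, seq_loss: float, ctc_weight: float) -> float:
    """Interpolate the CTC loss and the seq2seq (NLL) loss."""
    return ctc_weight * ctc_loss + (1.0 - ctc_weight) * seq_loss


# Usage with toy decoder logits for a two-token target sequence.
logits = [[2.0, 0.5, -1.0], [0.1, 1.5, 0.3]]
targets = [0, 1]
log_probs = [log_softmax(row) for row in logits]
seq = smoothed_nll(log_probs, targets, eps=0.1)
print(joint_loss(ctc_loss=12.0, seq_loss=seq, ctc_weight=0.3))
```

With `eps=0` the smoothed loss reduces to the ordinary negative log-likelihood of the target tokens.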
Import
# Run as recipe script
python recipes/KsponSpeech/ASR/transformer/train.py recipes/KsponSpeech/ASR/transformer/hparams/conformer_medium.yaml --data_folder /path/to/KsponSpeech
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| batch.sig | PaddedData | Yes | Padded input waveforms and their relative lengths |
| batch.tokens_bos | PaddedData | Yes | Target token sequences with a BOS prefix (decoder input) |
| batch.tokens_eos | PaddedData | Yes | Target token sequences with an EOS suffix (decoder target) |
| batch.tokens | PaddedData | Yes | Target token sequences without special tokens (CTC target) |
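The three token inputs are views of the same tokenized transcript. The sketch below builds them for one utterance. The special-token indices and the subword ids are hypothetical; the recipe presumably reads its own indices from the hyperparameter YAML, and batching then pads each view.

```python
from typing import Dict, List

# Hypothetical special-token indices, chosen only for this example.
BOS_INDEX = 1
EOS_INDEX = 2


def make_targets(token_ids: List[int], bos: int = BOS_INDEX,
                 eos: int = EOS_INDEX) -> Dict[str, List[int]]:
    """Build the three target views of one tokenized transcript."""
    return {
        "tokens_bos": [bos] + token_ids,  # decoder input for teacher forcing
        "tokens_eos": token_ids + [eos],  # decoder target, shifted one step
        "tokens": list(token_ids),        # CTC target, no special symbols
    }


targets = make_targets([17, 5, 42])  # toy subword ids
print(targets["tokens_bos"])  # [1, 17, 5, 42]
print(targets["tokens_eos"])  # [17, 5, 42, 2]
```

Position `i` of `tokens_bos` is fed to the decoder, and the decoder is trained to predict position `i` of `tokens_eos`, which is the next token.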
Outputs
| Name | Type | Description |
|---|---|---|
| p_ctc | torch.Tensor | CTC log-probabilities from encoder |
| p_seq | torch.Tensor | Seq2seq log-probabilities from Transformer decoder |
| wav_lens | torch.Tensor | Relative waveform lengths |
| hyps | list | Beam search hypotheses (at validation/test) |
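`wav_lens` stores relative lengths: each entry is the fraction of the padded length that holds real samples. The sketch below pads a toy batch and converts the fractions back to absolute lengths. It assumes plain Python lists instead of tensors, and the 400-frame encoder length is a hypothetical figure used to show that the same fractions also apply after downsampling.

```python
from typing import List, Sequence, Tuple


def pad_batch(seqs: Sequence[Sequence[float]],
              pad_value: float = 0.0) -> Tuple[List[List[float]], List[float]]:
    """Right-pad sequences to the longest one and return relative lengths."""
    max_len = max(len(s) for s in seqs)
    padded = [list(s) + [pad_value] * (max_len - len(s)) for s in seqs]
    rel_lens = [len(s) / max_len for s in seqs]
    return padded, rel_lens


def to_absolute(rel_lens: Sequence[float], padded_len: int) -> List[int]:
    """Recover absolute lengths for a given padded time dimension."""
    return [round(r * padded_len) for r in rel_lens]


waves = [[0.1] * 16000, [0.2] * 8000, [0.3] * 4000]  # toy waveforms
padded, rel = pad_batch(waves)
print(rel)                      # [1.0, 0.5, 0.25]
print(to_absolute(rel, 16000))  # [16000, 8000, 4000]
print(to_absolute(rel, 400))    # [400, 200, 100]
```

Because the fractions do not depend on time resolution, one `wav_lens` tensor can mask both the raw waveforms and the shorter encoder output.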
Usage Examples
cd recipes/KsponSpeech/ASR/transformer
python train.py hparams/conformer_medium.yaml --data_folder /path/to/KsponSpeech