Implementation:Speechbrain Speechbrain Train IWSLT22 ST
| Knowledge Sources | |
|---|---|
| Domains | Speech_Translation, Training |
| Last Updated | 2026-02-09 00:00 GMT |
Overview
A SpeechBrain recipe for training a speech translation model on the IWSLT22 low-resource task.
Description
This recipe defines the ST class (a subclass of sb.core.Brain), which fine-tunes a wav2vec2 model for end-to-end speech translation without source-language transcriptions. The architecture combines wav2vec2 as the speech encoder, a dimensionality reduction layer, and a decoder-only Transformer that generates target-language tokens. Target text is tokenized with SentencePiece. At decoding time, hypotheses are converted back to text with SentencePiece and detokenized with Moses. Validation and test use separate beam searches, and both are scored with BLEU.
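To make the final evaluation step concrete, here is a minimal corpus-level BLEU sketch in plain Python. It is not the recipe's scorer: it assumes the hypotheses and references are already detokenized strings, splits them on whitespace, uses a single reference per sentence, and applies no smoothing. Standard scorers such as sacrebleu add their own tokenization and smoothing, so their numbers will differ.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count all n-grams of order n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_order=4):
    """Corpus-level BLEU in [0, 100] (single reference, no smoothing).

    hypotheses, references: lists of whitespace-tokenized sentences.
    """
    matches = [0] * max_order
    totals = [0] * max_order
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_toks, ref_toks = hyp.split(), ref.split()
        hyp_len += len(hyp_toks)
        ref_len += len(ref_toks)
        for n in range(1, max_order + 1):
            # Clipped matches: a hypothesis n-gram counts at most as often
            # as it occurs in the reference (Counter & takes the minimum).
            overlap = ngram_counts(hyp_toks, n) & ngram_counts(ref_toks, n)
            matches[n - 1] += sum(overlap.values())
            totals[n - 1] += max(len(hyp_toks) - n + 1, 0)
    if min(matches) == 0:
        return 0.0  # without smoothing, any zero precision gives BLEU = 0
    # Geometric mean of the modified n-gram precisions.
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_order
    # Brevity penalty: penalize hypotheses shorter than the references.
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * bp * math.exp(log_precision)
```

Corpus BLEU pools n-gram counts over all sentences before computing precisions, which is why it is not the average of per-sentence BLEU scores.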
Usage
Use this recipe to train a direct speech translation model on the IWSLT22 low-resource dataset. It requires pre-trained wav2vec2 weights and a SentencePiece tokenizer; hyperparameters are configured in hparams/train_w2v2_st.yaml.
Code Reference
Source Location
- Repository: SpeechBrain
- File: recipes/IWSLT22_lowresource/AST/transformer/train.py
Signature
```python
class ST(sb.core.Brain):
    def compute_forward(self, batch, stage):
        ...

    def compute_objectives(self, predictions, batch, stage):
        ...
```
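The stage-dependent behavior of compute_forward can be sketched without SpeechBrain. This is an illustrative sketch, not the recipe's code: `Stage` is a hypothetical stand-in for `sb.Stage`, and `model`, `valid_search`, and `test_search` are hypothetical callables. It assumes, as is usual for SpeechBrain recipes, that no beam search runs during training, since the loss only needs the teacher-forced log-probabilities.

```python
from enum import Enum, auto


class Stage(Enum):
    """Hypothetical stand-in for sb.Stage."""
    TRAIN = auto()
    VALID = auto()
    TEST = auto()


def compute_forward_sketch(stage, wavs, wav_lens, tokens_bos, model, valid_search, test_search):
    """Illustrative control flow of a compute_forward-style method.

    model(wavs, wav_lens, tokens_bos) -> (src, log_probs): encoder output and
    teacher-forced log-softmax scores for the target tokens (hypothetical).
    valid_search / test_search(src, wav_lens) -> decoded hypotheses (hypothetical).
    """
    src, log_probs = model(wavs, wav_lens, tokens_bos)
    hyps = None  # training: no decoding, the loss uses log_probs only
    if stage == Stage.VALID:
        hyps = valid_search(src, wav_lens)
    elif stage == Stage.TEST:
        hyps = test_search(src, wav_lens)
    return log_probs, wav_lens, hyps
```

Keeping two separate searchers lets validation and test each use their own beam settings.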
Import
The recipe is not imported as a module; it is run as a script:
```shell
python recipes/IWSLT22_lowresource/AST/transformer/train.py hparams/train_w2v2_st.yaml --data_folder /path/to/data
```
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| batch | PaddedBatch | Yes | Batch containing sig (waveforms), tokens_bos (decoder input: target translation tokens prefixed with BOS), and tokens_eos (decoder targets: the same tokens followed by EOS) |
| stage | sb.Stage | Yes | TRAIN, VALID, or TEST |
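The relationship between tokens_bos and tokens_eos, and the relative lengths that a padded batch carries, can be illustrated in plain Python. This is a sketch under stated assumptions: `make_decoder_targets`, `pad_batch`, and the `bos_index`/`eos_index` values are hypothetical, and the token ids stand in for SentencePiece ids.

```python
def make_decoder_targets(token_ids, bos_index, eos_index):
    """Build teacher-forcing input and target sequences for one utterance.

    tokens_bos is fed to the decoder (BOS prepended, i.e. shifted right);
    tokens_eos is what the decoder must predict at each step (EOS appended).
    """
    tokens_bos = [bos_index] + list(token_ids)
    tokens_eos = list(token_ids) + [eos_index]
    return tokens_bos, tokens_eos


def pad_batch(sequences, pad_value=0):
    """Right-pad sequences to equal length and return relative lengths.

    A relative length is len(seq) / max_len, so the longest item gets 1.0.
    """
    max_len = max(len(s) for s in sequences)
    padded = [list(s) + [pad_value] * (max_len - len(s)) for s in sequences]
    rel_lens = [len(s) / max_len for s in sequences]
    return padded, rel_lens
```

Both sequences have the same length, so at step t the decoder sees token t of tokens_bos and is trained to predict token t of tokens_eos.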
Outputs
| Name | Type | Description |
|---|---|---|
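| predictions | tuple | Log-softmax probabilities over the target vocabulary, relative waveform lengths (wav_lens), and hypotheses decoded by the valid or test beam search |
| loss | torch.Tensor | Negative log-likelihood (NLL) of the target translation tokens under the decoder's output distribution |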
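A padding-aware NLL can be sketched in plain Python to show how relative lengths mask out padded positions. This is an assumption-laden illustration, not the recipe's loss: it averages over all valid tokens in the batch, whereas the actual loss function, reduction, and any label smoothing are defined in the recipe's code and YAML and may differ. `masked_nll` is a hypothetical name.

```python
import math


def masked_nll(log_probs, targets, rel_lens):
    """Average negative log-likelihood over non-padded target positions.

    log_probs: [batch][time][vocab] log-softmax outputs.
    targets:   [batch][time] padded target ids (e.g. tokens_eos).
    rel_lens:  relative lengths in (0, 1], as carried by a padded batch.
    """
    max_len = len(targets[0])
    total, count = 0.0, 0
    for lp_seq, tgt_seq, rel in zip(log_probs, targets, rel_lens):
        n_valid = round(rel * max_len)  # relative length -> number of real tokens
        for t in range(n_valid):
            total -= lp_seq[t][tgt_seq[t]]  # -log p(correct token at step t)
            count += 1
    return total / count


# Usage: two utterances, vocabulary of 2; the second has only 1 valid step.
log_probs = [[[math.log(0.5), math.log(0.5)], [math.log(0.25), math.log(0.75)]],
             [[math.log(0.1), math.log(0.9)], [math.log(0.5), math.log(0.5)]]]
loss = masked_nll(log_probs, [[0, 1], [1, 0]], [1.0, 0.5])
```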
Usage Examples
From the recipe directory (recipes/IWSLT22_lowresource/AST/transformer):
```shell
python train.py hparams/train_w2v2_st.yaml --data_folder /path/to/data
```