Implementation:Speechbrain Speechbrain Train CVSS S2UT
| Knowledge Sources | |
|---|---|
| Domains | Speech_Translation, Training |
| Last Updated | 2026-02-09 00:00 GMT |
Overview
Concrete tool, provided by the SpeechBrain library, for training speech-to-unit translation (S2UT) models on the CVSS dataset.
Description
This recipe defines the S2UT class (a subclass of sb.core.Brain) for training a direct speech-to-speech translation system based on discrete units. The model uses a wav2vec2 encoder to extract features from the source speech, passes them through a dimensionality-reduction layer, and then decodes them with a Transformer used in decoder-only mode to predict discrete unit tokens. The implementation follows the papers "Direct speech-to-speech translation with discrete units" and "Enhanced Direct Speech-to-Speech Translation Using Self-supervised Pre-training and Data Augmentation."
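The data flow described above (encoder features, dimensionality reduction, Transformer decoding, unit prediction) can be sketched in plain PyTorch. This is a toy illustration, not the recipe's code: the class name `TinyS2UT`, all layer sizes, and the strided `Conv1d` standing in for the pretrained wav2vec2 encoder are assumptions, and positional encodings are omitted for brevity.

```python
import torch
import torch.nn as nn


class TinyS2UT(nn.Module):
    """Toy stand-in for the S2UT data flow (hypothetical sizes throughout)."""

    def __init__(self, enc_dim=32, d_model=16, n_units=10, pad_index=0):
        super().__init__()
        # Assumption: a strided conv replaces the pretrained wav2vec2 encoder.
        self.encoder = nn.Conv1d(1, enc_dim, kernel_size=400, stride=320)
        # Dimensionality reduction from encoder width to decoder width.
        self.enc = nn.Linear(enc_dim, d_model)
        self.embed = nn.Embedding(n_units, d_model, padding_idx=pad_index)
        layer = nn.TransformerDecoderLayer(
            d_model, nhead=4, dim_feedforward=32, batch_first=True
        )
        self.decoder = nn.TransformerDecoder(layer, num_layers=1)
        # Projection to one logit per discrete unit.
        self.seq_lin = nn.Linear(d_model, n_units)

    def forward(self, wavs, tokens_bos):
        # wavs: (batch, samples) -> feats: (batch, frames, enc_dim)
        feats = self.encoder(wavs.unsqueeze(1)).transpose(1, 2)
        memory = self.enc(feats)
        tgt = self.embed(tokens_bos)
        # Causal mask: position i may only attend to positions <= i.
        length = tokens_bos.size(1)
        causal = torch.triu(
            torch.full((length, length), float("-inf")), diagonal=1
        )
        dec_out = self.decoder(tgt, memory, tgt_mask=causal)
        # Log-probabilities over the unit vocabulary at each target step.
        return torch.log_softmax(self.seq_lin(dec_out), dim=-1)


model = TinyS2UT()
wavs = torch.randn(2, 3200)               # two short fake waveforms
tokens_bos = torch.randint(1, 10, (2, 5))  # fake BOS-prefixed unit codes
p_seq = model(wavs, tokens_bos)            # shape: (2, 5, 10)
```

The decoder attends to the reduced encoder features through cross-attention, so no separate Transformer encoder stack is needed in this sketch.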
Usage
Use this recipe to train a speech-to-unit translation model on the CVSS corpus. It requires source audio (e.g., CommonVoice) and target CVSS data with pre-extracted discrete unit codes. It supports ASR-BLEU evaluation and optional vocoder-based waveform synthesis via UnitHIFIGAN.
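ASR-BLEU scores synthesized speech by transcribing it with an ASR model and computing BLEU between the transcripts and the reference translations. The sketch below shows the idea with a small, unsmoothed corpus-level BLEU in pure Python. The function names, the `transcribe` callable, and the lowercase whitespace tokenization are all assumptions made for illustration; real evaluations normally rely on an established BLEU implementation and a pretrained ASR model.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Unsmoothed corpus BLEU (0-100), one reference per hypothesis."""
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        h, r = hyp.lower().split(), ref.lower().split()
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            h_counts, r_counts = ngrams(h, n), ngrams(r, n)
            # Clipped counts: an n-gram is credited at most as often
            # as it occurs in the reference.
            matches[n - 1] += sum(min(c, r_counts[g]) for g, c in h_counts.items())
            totals[n - 1] += sum(h_counts.values())
    if hyp_len == 0 or min(matches) == 0:
        return 0.0
    log_prec = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty for hypotheses shorter than the references.
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * bp * math.exp(log_prec)


def asr_bleu(waveforms, references, transcribe):
    """Transcribe each synthesized waveform, then score against references.

    `transcribe` is a hypothetical callable mapping a waveform to text.
    """
    transcripts = [transcribe(w) for w in waveforms]
    return corpus_bleu(transcripts, references)
```

Because the score depends on the ASR system as well as the translation model, ASR-BLEU values are only comparable when the same ASR model and text normalization are used.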
Code Reference
Source Location
- Repository: SpeechBrain
- File: recipes/CVSS/S2ST/train.py
Signature
class S2UT(sb.core.Brain):
    def compute_forward(self, batch, stage):
        ...

    def compute_objectives(self, predictions, batch, stage):
        ...
Import
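The recipe is launched as a script rather than imported. From the repository root:
python recipes/CVSS/S2ST/train.py recipes/CVSS/S2ST/hparams/train_fr-en.yaml --src_data_folder=/corpus/CommonVoice/fr --tgt_data_folder=/corpus/CVSS/fr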
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| batch | PaddedBatch | Yes | Batch containing src_sig (source waveforms) and code_bos (target unit codes with BOS) |
| stage | sb.Stage | Yes | TRAIN, VALID, or TEST |
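The code_bos input follows the usual teacher-forcing setup: the decoder reads the target units shifted right by a beginning-of-sequence token and is trained to predict the same units followed by an end-of-sequence token. A minimal sketch, assuming hypothetical `bos_index` and `eos_index` values and a companion `code_eos` sequence that the table above does not list:

```python
def make_decoder_sequences(units, bos_index, eos_index):
    """Build decoder input and training target from one unit sequence.

    bos_index and eos_index are illustrative values, not the recipe's settings.
    """
    code_bos = [bos_index] + list(units)  # decoder input (shifted right)
    code_eos = list(units) + [eos_index]  # training target
    return code_bos, code_eos


units = [17, 4, 4, 52]
code_bos, code_eos = make_decoder_sequences(units, bos_index=100, eos_index=101)
```

At step t the decoder sees code_bos[: t + 1] and is scored on code_eos[t], so both sequences have the same length.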
Outputs
| Name | Type | Description |
|---|---|---|
| predictions | tuple | Log-probabilities (log-softmax output), plus optional hypotheses, synthesized waveforms, and transcripts |
| loss | torch.Tensor | Sequence-level NLL loss on predicted unit tokens |
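The loss in the table above can be illustrated with a masked negative log-likelihood over the predicted log-probabilities. This is a sketch under assumptions, not SpeechBrain's loss function: the name `masked_nll`, the `pad_index` value, and the per-token averaging are illustrative, and the recipe's exact normalization may differ.

```python
import torch


def masked_nll(log_probs, targets, pad_index=0):
    """Mean negative log-likelihood over non-padding target positions.

    log_probs: (batch, length, n_units) log-softmax outputs
    targets:   (batch, length) reference unit indices
    """
    # Log-probability the model assigned to each reference token.
    token_ll = log_probs.gather(-1, targets.unsqueeze(-1)).squeeze(-1)
    # Padding positions contribute nothing to the loss.
    mask = (targets != pad_index).float()
    return -(token_ll * mask).sum() / mask.sum()


log_probs = torch.log(torch.tensor([[[0.5, 0.25, 0.25], [0.1, 0.8, 0.1]]]))
targets = torch.tensor([[1, 0]])  # second position is padding
loss = masked_nll(log_probs, targets)
```

Only the first position counts here, so the loss equals the negative log of the probability 0.25 assigned to the reference unit.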
Usage Examples
python train.py hparams/train_fr-en.yaml --src_data_folder=/corpus/CommonVoice/fr --tgt_data_folder=/corpus/CVSS/fr