Implementation:Speechbrain Speechbrain Train CVSS S2UT

Knowledge Sources	SpeechBrain
Domains	Speech_Translation, Training
Last Updated	2026-02-09 00:00 GMT

Overview

Concrete tool, provided by the SpeechBrain library, for training a speech-to-unit translation (S2UT) model on the CVSS dataset.

Description

This recipe defines the S2UT class (a subclass of sb.core.Brain) for training a direct speech-to-speech translation system based on discrete units. A wav2vec2 encoder extracts features from the source speech, a dimensionality reduction layer projects them to the model dimension, and a Transformer decoder predicts the discrete unit tokens of the target speech. The implementation is based on the papers "Direct speech-to-speech translation with discrete units" and "Enhanced Direct Speech-to-Speech Translation Using Self-supervised Pre-training and Data Augmentation."
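
The data flow described above can be followed by tensor shapes alone. The sketch below is illustrative only: the function name `trace_shapes` and every default size (encoder width, model width, unit vocabulary) are assumptions made for this example, not values read from the recipe's hyperparameter file.

```python
def trace_shapes(batch, frames, units_len,
                 enc_dim=1024, model_dim=512, vocab=1000):
    """Return the shape expected after each stage of the pipeline.

    All default sizes are hypothetical placeholders.
    """
    return {
        # wav2vec2 turns raw waveforms into frame-level features
        "wav2vec2_features": (batch, frames, enc_dim),
        # the reduction layer projects features to the decoder width
        "reduced_features": (batch, frames, model_dim),
        # the decoder consumes BOS-prefixed unit codes
        "decoder_input": (batch, units_len),
        # one log-probability distribution over units per target step
        "log_probs": (batch, units_len, vocab),
    }


shapes = trace_shapes(batch=4, frames=250, units_len=120)
for stage_name, shape in shapes.items():
    print(f"{stage_name:>18}: {shape}")
```

Note that the time axis changes between the encoder side (frames) and the decoder side (unit positions); the decoder attends over the former while predicting the latter.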

Usage

Use this recipe to train a speech-to-unit translation model on the CVSS corpus. It requires source audio data (e.g., CommonVoice) and target CVSS data with pre-extracted discrete unit codes. It supports evaluation with the ASR-BLEU metric and optional vocoder-based waveform synthesis via UnitHIFIGAN.

Code Reference

Source Location

Repository: SpeechBrain
File: recipes/CVSS/S2ST/train.py

Signature

class S2UT(sb.core.Brain):
    def compute_forward(self, batch, stage):
        ...
    def compute_objectives(self, predictions, batch, stage):
        ...
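
The two methods form a contract: `compute_forward` maps a batch to predictions, and `compute_objectives` maps those predictions plus the batch to a loss. The sketch below mimics that contract with toy arithmetic so that it runs without SpeechBrain. It is not the recipe's code: `Stage`, `MiniBrain`, `ToyS2UT`, and `run_step` are hypothetical stand-ins defined only for this illustration.

```python
from enum import Enum, auto


class Stage(Enum):
    """Stand-in for sb.Stage (illustration only)."""
    TRAIN = auto()
    VALID = auto()
    TEST = auto()


class MiniBrain:
    """Hypothetical stand-in for sb.core.Brain."""

    def compute_forward(self, batch, stage):
        raise NotImplementedError

    def compute_objectives(self, predictions, batch, stage):
        raise NotImplementedError

    def run_step(self, batch, stage):
        # The base class calls the two hooks in this order.
        predictions = self.compute_forward(batch, stage)
        return self.compute_objectives(predictions, batch, stage)


class ToyS2UT(MiniBrain):
    def compute_forward(self, batch, stage):
        # Toy "model": one score per target position. A real model
        # would return log-probabilities over the unit vocabulary.
        scores = [0.5 for _ in batch["code_bos"]]
        # In this toy, hypotheses are only produced outside training.
        hyps = None if stage == Stage.TRAIN else list(batch["code_bos"])
        return scores, hyps

    def compute_objectives(self, predictions, batch, stage):
        scores, _hyps = predictions
        # Toy loss: mean of (1 - score); a real recipe would use NLL.
        return sum(1.0 - s for s in scores) / len(scores)


toy_batch = {"src_sig": [0.0, 0.1, -0.1], "code_bos": [100, 7, 42]}
brain = ToyS2UT()
loss = brain.run_step(toy_batch, Stage.TRAIN)
print(loss)
```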

Import

The recipe is run as a script rather than imported as a module. From the repository root:

python recipes/CVSS/S2ST/train.py hparams/train_fr-en.yaml --src_data_folder=/corpus/CommonVoice/fr --tgt_data_folder=/corpus/CVSS/fr

I/O Contract

Inputs

Name	Type	Required	Description
batch	PaddedBatch	Yes	Batch containing src_sig (source waveforms) and code_bos (target unit codes with BOS)
stage	sb.Stage	Yes	TRAIN, VALID, or TEST
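
With teacher forcing, the decoder input is the unit sequence prefixed with a BOS token, and the training target is the same sequence terminated by an end-of-sequence token. A minimal sketch, assuming hypothetical BOS and EOS indices and a hypothetical helper `make_decoder_io`; the recipe's actual indices and data pipeline may differ:

```python
BOS_INDEX = 100  # hypothetical index, chosen for this example
EOS_INDEX = 101  # hypothetical index, chosen for this example


def make_decoder_io(units, bos=BOS_INDEX, eos=EOS_INDEX):
    """Return (decoder_input, target) for one sequence of unit codes."""
    code_bos = [bos] + list(units)   # what the decoder reads
    code_eos = list(units) + [eos]   # what the decoder should predict
    return code_bos, code_eos


code_bos, code_eos = make_decoder_io([7, 7, 42, 3])
print(code_bos)  # [100, 7, 7, 42, 3]
print(code_eos)  # [7, 7, 42, 3, 101]
```

Both sequences have the same length, so position t of the decoder input lines up with the unit to be predicted at position t.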

Outputs

Name	Type	Description
predictions	tuple	Log-probabilities (log-softmax outputs) over unit tokens, plus optional hypotheses, synthesized waveforms, and transcripts
loss	torch.Tensor	Sequence-level NLL loss on predicted unit tokens
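
One common way to compute such a loss is to average the negative log-probability of each reference unit while skipping padded positions. The sketch below shows that computation in plain Python on nested lists. It illustrates the idea only and is not SpeechBrain's loss implementation; the function name `masked_nll` and the use of absolute (rather than relative) lengths are assumptions of this example.

```python
import math


def masked_nll(log_probs, targets, lengths):
    """Mean NLL over non-padded target positions.

    log_probs: [batch][time][vocab] log-probabilities
    targets:   [batch][time] unit indices (padded positions are ignored)
    lengths:   [batch] number of valid positions per sequence
    """
    total, count = 0.0, 0
    for seq_lp, seq_tgt, n_valid in zip(log_probs, targets, lengths):
        for t in range(n_valid):
            total -= seq_lp[t][seq_tgt[t]]
            count += 1
    return total / count


# Uniform distribution over a 4-unit vocabulary at every step.
uniform = [math.log(0.25)] * 4
log_probs = [[uniform, uniform, uniform], [uniform, uniform, uniform]]
targets = [[0, 1, 2], [3, 0, 0]]  # second sequence has 2 padded steps
lengths = [3, 1]

loss = masked_nll(log_probs, targets, lengths)
print(round(loss, 4))  # 1.3863
```

With a uniform distribution over four units, every valid position contributes log(4), so the mean is log(4) regardless of how many positions are padded.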

Usage Examples

From the recipe directory (recipes/CVSS/S2ST), train with the fr-en hyperparameter file:

python train.py hparams/train_fr-en.yaml --src_data_folder=/corpus/CommonVoice/fr --tgt_data_folder=/corpus/CVSS/fr

Related Pages

Principle:Speechbrain_Speechbrain_Speech_To_Unit_Translation
