Implementation:Speechbrain Speechbrain Train Voicebank MTL
| Knowledge Sources | |
|---|---|
| Domains | Multi_Task_Learning, Speech_Enhancement |
| Last Updated | 2026-02-09 00:00 GMT |
Overview
A concrete SpeechBrain recipe for multi-task learning that combines speech enhancement and ASR on the Voicebank-DEMAND dataset.
Description
This recipe defines the MTLbrain class (a subclass of sb.Brain) for multi-task learning with both seq2seq ASR and speech enhancement objectives on the Voicebank-DEMAND dataset. Training proceeds in three stages, each with its own hyperparameter file: perceptual pretraining (pretrain_perceptual.yaml), enhancement with mimic loss (enhance_mimic.yaml), and robust ASR training (robust_asr.yaml). Individual losses can be toggled on and off, and pre-trained models can be loaded for either component. Evaluation includes PESQ, eSTOI, and composite metrics (CSIG, CBAK, COVL) via a custom CompositeStats metric class.
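The idea of toggling losses on and off can be illustrated with a weighted sum in which a zero weight disables a term. This is a minimal sketch, not the recipe's code: the function `combine_losses`, the loss names (`enhance`, `mimic`, `asr`), and all weights are hypothetical, and plain floats stand in for loss tensors.

```python
def combine_losses(losses, weights):
    """Return the weighted sum of named losses.

    A loss whose weight is 0 (or missing from `weights`) is switched off.
    Both argument layouts are assumptions made for this illustration.
    """
    total = 0.0
    for name, value in losses.items():
        weight = weights.get(name, 0.0)
        if weight > 0:
            total += weight * value
    return total


# Hypothetical per-batch loss values.
losses = {"enhance": 0.5, "mimic": 2.0, "asr": 4.0}

# Only the enhancement loss is active.
enhancement_only = combine_losses(losses, {"enhance": 1.0})

# All three losses are active with illustrative weights.
joint = combine_losses(losses, {"enhance": 1.0, "mimic": 0.5, "asr": 0.25})
```

In the recipe itself, which losses are active is determined by the stage-specific hyperparameter file rather than by a dictionary like the one above.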
Usage
Use this recipe to train a multi-task ASR and enhancement system on the Voicebank-DEMAND (noisy-VCTK) dataset. The recipe follows a three-stage training procedure and requires the noisy-VCTK data folder. Run each stage with its stage-specific hyperparameter file.
Code Reference
Source Location
- Repository: SpeechBrain
- File: recipes/Voicebank/MTL/ASR_enhance/train.py
Signature
class MTLbrain(sb.Brain):
    def compute_forward(self, batch, stage):
        ...

    def compute_objectives(self, predictions, batch, stage):
        ...


class CompositeStats(sb.utils.metric_stats.MetricStats):
    def summarize(self, field=None):
        ...
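To make the role of a `summarize(field=None)` method more concrete, the sketch below averages per-utterance composite scores. It is an assumption-laden illustration in plain Python, not the recipe's CompositeStats implementation: the function name `summarize_composite` and the layout of one dict per utterance with `csig`, `cbak`, and `covl` keys are hypothetical.

```python
COMPOSITE_KEYS = ("csig", "cbak", "covl")


def summarize_composite(scores, field=None):
    """Average per-utterance composite scores.

    `scores` is assumed to be a list of dicts, one per utterance, each
    holding the keys in COMPOSITE_KEYS. With `field=None` the full summary
    dict is returned; otherwise only the requested average.
    """
    if not scores:
        raise ValueError("no scores to summarize")
    summary = {
        key: sum(utt[key] for utt in scores) / len(scores)
        for key in COMPOSITE_KEYS
    }
    return summary if field is None else summary[field]


# Two hypothetical utterances.
example_scores = [
    {"csig": 3.0, "cbak": 2.0, "covl": 2.5},
    {"csig": 4.0, "cbak": 3.0, "covl": 3.5},
]
summary = summarize_composite(example_scores)
```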
Import
cd recipes/Voicebank/MTL/ASR_enhance
python train.py hparams/robust_asr.yaml --data_folder /path/to/noisy-vctk
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| batch | PaddedBatch | Yes | Batch containing clean_sig, noisy_sig (waveforms), and ASR tokens |
| stage | sb.Stage | Yes | TRAIN, VALID, or TEST |
Outputs
| Name | Type | Description |
|---|---|---|
| predictions | dict | Enhanced waveforms, ASR log-probabilities, and feature-level predictions |
| loss | torch.Tensor | Combined enhancement and ASR loss (configurable per training stage) |
Usage Examples
# Three-stage training procedure (run from recipes/Voicebank/MTL/ASR_enhance, in this order):
python train.py hparams/pretrain_perceptual.yaml --data_folder /path/to/noisy-vctk
python train.py hparams/enhance_mimic.yaml --data_folder /path/to/noisy-vctk
python train.py hparams/robust_asr.yaml --data_folder /path/to/noisy-vctk
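The three commands above can also be driven from a short Python script that stops at the first failing stage. This is a convenience sketch rather than part of the recipe: the helper names `build_stage_commands` and `run_all_stages` are hypothetical, and the script is assumed to be launched from the recipe directory so that the relative paths resolve.

```python
import subprocess
import sys

# Stage configs in the order given by the three-stage procedure.
STAGE_CONFIGS = [
    "hparams/pretrain_perceptual.yaml",
    "hparams/enhance_mimic.yaml",
    "hparams/robust_asr.yaml",
]


def build_stage_commands(data_folder, script="train.py"):
    """Build one argv list per training stage without running anything."""
    return [
        [sys.executable, script, config, "--data_folder", data_folder]
        for config in STAGE_CONFIGS
    ]


def run_all_stages(data_folder):
    """Run the stages sequentially; check=True raises on the first failure."""
    for command in build_stage_commands(data_folder):
        subprocess.run(command, check=True)


commands = build_stage_commands("/path/to/noisy-vctk")
```

Calling `run_all_stages("/path/to/noisy-vctk")` would launch the stages one after another, so a later stage never starts if an earlier one fails.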