Implementation:Speechbrain Speechbrain Train Voicebank MTL

Knowledge Sources	SpeechBrain
Domains	Multi_Task_Learning, Speech_Enhancement
Last Updated	2026-02-09 00:00 GMT

Overview

Concrete tool in the SpeechBrain library for multi-task learning that combines speech enhancement and ASR on the Voicebank-DEMAND dataset.

Description

This recipe defines the MTLbrain class (subclass of sb.Brain) for multi-task learning using both seq2seq ASR and speech enhancement objectives on the Voicebank-DEMAND dataset. The training proceeds in three stages: perceptual pretraining (pretrain_perceptual.yaml), enhancement with mimic loss (enhance_mimic.yaml), and robust ASR training (robust_asr.yaml). Different losses can be toggled on and off, and pre-trained models can be loaded for either component. Evaluation includes PESQ, eSTOI, and composite metrics (CSIG, CBAK, COVL) via a custom CompositeStats metric class.
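The enhancement stage's mimic loss trains the enhancer so that a frozen, pretrained perceptual model responds to enhanced speech the way it responds to clean speech. The framework-free sketch below shows only that idea. It assumes the perceptual model maps a list of feature frames to a list of output frames. `mse`, `mimic_loss` and `toy_perceptual_model` are hypothetical names, not the recipe's code. The toy model is a fixed scaling that stands in for whatever network `pretrain_perceptual.yaml` produces.

```python
from typing import Callable, List, Sequence

Frames = Sequence[Sequence[float]]


def mse(pred: Frames, target: Frames) -> float:
    """Mean squared error over all frames and feature dimensions."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same number of frames")
    total, count = 0.0, 0
    for p_frame, t_frame in zip(pred, target):
        if len(p_frame) != len(t_frame):
            raise ValueError("frame dimensions differ")
        for p, t in zip(p_frame, t_frame):
            total += (p - t) ** 2
            count += 1
    if count == 0:
        raise ValueError("empty input")
    return total / count


def mimic_loss(perceptual_model: Callable[[Frames], Frames],
               enhanced: Frames, clean: Frames) -> float:
    # The perceptual model is treated as frozen: its output on clean speech is
    # the target. In a real autograd setup only the enhancement model that
    # produced `enhanced` would receive gradients.
    target = perceptual_model(clean)
    prediction = perceptual_model(enhanced)
    return mse(prediction, target)


def toy_perceptual_model(frames: Frames) -> List[List[float]]:
    """Hypothetical stand-in for a pretrained network: fixed per-frame scaling."""
    return [[2.0 * x for x in frame] for frame in frames]


clean = [[1.0, 2.0], [3.0, 4.0]]
enhanced = [[1.0, 2.0], [3.0, 5.0]]
print(mimic_loss(toy_perceptual_model, enhanced, clean))  # 1.0
```

Because the loss is measured in the perceptual model's output space and not in the waveform domain, the enhancer is pushed toward outputs that the downstream model treats like clean speech.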

Usage

Use this recipe to train a multi-task ASR and enhancement system on the Voicebank-DEMAND (noisy-VCTK) dataset. Training follows a three-stage procedure, and each stage is configured by its own hyperparameter file. The recipe requires a local copy of the noisy-VCTK data, passed with --data_folder.

Code Reference

Source Location

Repository: SpeechBrain
File: recipes/Voicebank/MTL/ASR_enhance/train.py

Signature

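import speechbrain as sb

class MTLbrain(sb.Brain):
    def compute_forward(self, batch, stage):
        # Returns a dict of predictions (enhanced waveforms, ASR outputs, ...)
        ...
    def compute_objectives(self, predictions, batch, stage):
        # Returns the combined loss for the active training stage
        ...

class CompositeStats(sb.utils.metric_stats.MetricStats):
    def summarize(self, field=None):
        # Aggregates the collected composite scores (CSIG, CBAK, COVL)
        ...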
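The composite measures are conventionally derived from PESQ, the log-likelihood ratio (LLR), the weighted spectral slope (WSS) and segmental SNR using the Hu and Loizou regression formulas, then clipped to a 1 to 5 scale. The sketch below pairs those formulas with a minimal collector whose `summarize(field=None)` mirrors the signature above. `CompositeStatsSketch` and `composite_scores` are hypothetical. This is not the recipe's `CompositeStats`, whose internals (including how LLR, WSS and segSNR are obtained) are not shown on this page, so the sketch takes those measures as precomputed inputs.

```python
from statistics import mean
from typing import Dict, List, Optional, Sequence, Union


def _clip(x: float, lo: float = 1.0, hi: float = 5.0) -> float:
    return max(lo, min(hi, x))


def composite_scores(pesq: float, llr: float, wss: float,
                     segsnr: float) -> Dict[str, float]:
    """CSIG, CBAK and COVL from precomputed objective measures, clipped to [1, 5]."""
    csig = 3.093 - 1.029 * llr + 0.603 * pesq - 0.009 * wss
    cbak = 1.634 + 0.478 * pesq - 0.007 * wss + 0.063 * segsnr
    covl = 1.594 + 0.805 * pesq - 0.512 * llr - 0.007 * wss
    return {"csig": _clip(csig), "cbak": _clip(cbak), "covl": _clip(covl)}


class CompositeStatsSketch:
    """Hypothetical stand-in for CompositeStats: collects per-utterance scores."""

    FIELDS = ("csig", "cbak", "covl")

    def __init__(self) -> None:
        self.ids: List[str] = []
        self.scores: List[Dict[str, float]] = []

    def append(self, ids: Sequence[str], pesq: Sequence[float],
               llr: Sequence[float], wss: Sequence[float],
               segsnr: Sequence[float]) -> None:
        for utt_id, p, l, w, s in zip(ids, pesq, llr, wss, segsnr):
            self.ids.append(utt_id)
            self.scores.append(composite_scores(p, l, w, s))

    def summarize(self, field: Optional[str] = None) -> Union[float, Dict[str, float]]:
        # Average each measure over all utterances; with `field`, return just that one
        summary = {name: mean(s[name] for s in self.scores) for name in self.FIELDS}
        return summary if field is None else summary[field]


stats = CompositeStatsSketch()
stats.append(["u1", "u2"], [3.0, 1.0], [0.5, 1.5], [30.0, 80.0], [10.0, -5.0])
print(stats.summarize())
```

Returning a dict when `field` is None and a single number otherwise lets the same object feed both a full log line and a single checkpointing criterion.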

Import

cd recipes/Voicebank/MTL/ASR_enhance
python train.py hparams/robust_asr.yaml --data_folder /path/to/noisy-vctk

I/O Contract

Inputs

Name	Type	Required	Description
batch	PaddedBatch	Yes	Batch containing clean_sig, noisy_sig (waveforms), and ASR tokens
stage	sb.Stage	Yes	TRAIN, VALID, or TEST
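SpeechBrain's PaddedBatch pads each variable-length field to the longest item in the batch and pairs it with relative lengths (true length divided by padded length), so downstream code can ignore the padding. The plain-Python sketch below mimics that behaviour for waveform fields such as clean_sig and noisy_sig. `pad_batch` is a hypothetical helper, not a SpeechBrain function.

```python
from typing import List, Sequence, Tuple


def pad_batch(signals: Sequence[Sequence[float]],
              pad_value: float = 0.0) -> Tuple[List[List[float]], List[float]]:
    """Right-pad waveforms to a common length and return relative lengths."""
    if not signals:
        raise ValueError("empty batch")
    max_len = max(len(s) for s in signals)
    if max_len == 0:
        raise ValueError("all signals are empty")
    padded = [list(s) + [pad_value] * (max_len - len(s)) for s in signals]
    # Relative length = true length / padded length; 1.0 marks the longest item
    rel_lens = [len(s) / max_len for s in signals]
    return padded, rel_lens


noisy = [[0.1, 0.2, 0.3, 0.4], [0.5, 0.6]]
padded, lens = pad_batch(noisy)
print(lens)  # [1.0, 0.5]
```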

Outputs

Name	Type	Description
predictions	dict	Enhanced waveforms, ASR log-probabilities, and feature-level predictions
loss	torch.Tensor	Combined enhancement and ASR loss (configurable per training stage)
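Because the combined loss is configurable per training stage, one way to picture it is a weight per loss term, where a zero weight switches that term off. The helper below is a hypothetical sketch of that idea. The loss names ("enhance", "mimic", "asr") and the weights are illustrative only, not the recipe's hyperparameters, and plain floats stand in for torch tensors.

```python
from typing import Mapping


def combine_losses(losses: Mapping[str, float],
                   weights: Mapping[str, float]) -> float:
    """Weighted sum of the enabled losses; a weight of 0 disables a term."""
    total = 0.0
    for name, weight in weights.items():
        if weight == 0:
            continue  # disabled term: it does not even need to be computed
        if name not in losses:
            raise KeyError(f"loss '{name}' is enabled but was not computed")
        total += weight * losses[name]
    return total


# Illustrative weights for an enhancement-focused stage
weights = {"enhance": 1.0, "mimic": 0.5, "asr": 0.0}
print(combine_losses({"enhance": 2.0, "mimic": 4.0, "asr": 3.0}, weights))  # 4.0
```

Failing loudly when an enabled loss is missing catches configuration mistakes early, for example a stage file that enables a loss whose model was never loaded.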

Usage Examples

# Three-stage training procedure; run the stages in this order from recipes/Voicebank/MTL/ASR_enhance:
python train.py hparams/pretrain_perceptual.yaml --data_folder /path/to/noisy-vctk
python train.py hparams/enhance_mimic.yaml --data_folder /path/to/noisy-vctk
python train.py hparams/robust_asr.yaml --data_folder /path/to/noisy-vctk
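Since the stages are meant to run in sequence, a small driver can launch them in order and stop at the first failure. `build_stage_commands` and `run_all` are hypothetical helpers. They assume the working directory is recipes/Voicebank/MTL/ASR_enhance and reuse exactly the command lines shown above.

```python
import subprocess
import sys
from typing import List

STAGES = ["pretrain_perceptual", "enhance_mimic", "robust_asr"]


def build_stage_commands(data_folder: str,
                         python: str = sys.executable) -> List[List[str]]:
    """One argv list per stage, in training order."""
    return [
        [python, "train.py", f"hparams/{stage}.yaml", "--data_folder", data_folder]
        for stage in STAGES
    ]


def run_all(data_folder: str) -> None:
    # check=True raises CalledProcessError on a non-zero exit code,
    # so a failed stage prevents the later stages from starting
    for cmd in build_stage_commands(data_folder):
        subprocess.run(cmd, check=True)
```

For example, `run_all("/path/to/noisy-vctk")` executes the three commands one after another.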

Related Pages

Principle:Speechbrain_Speechbrain_Conventional_Enhancement_Training
