Implementation:Speechbrain Speechbrain Train Voicebank MTL

From Leeroopedia


Knowledge Sources
Domains: Multi_Task_Learning, Speech_Enhancement
Last Updated: 2026-02-09 00:00 GMT

Overview

Concrete tool, provided by the SpeechBrain library, for multi-task learning that combines speech enhancement and ASR on the Voicebank dataset.

Description

This recipe defines the MTLbrain class (subclass of sb.Brain) for multi-task learning using both seq2seq ASR and speech enhancement objectives on the Voicebank-DEMAND dataset. The training proceeds in three stages: perceptual pretraining (pretrain_perceptual.yaml), enhancement with mimic loss (enhance_mimic.yaml), and robust ASR training (robust_asr.yaml). Different losses can be toggled on and off, and pre-trained models can be loaded for either component. Evaluation includes PESQ, eSTOI, and composite metrics (CSIG, CBAK, COVL) via a custom CompositeStats metric class.

Usage

Use this recipe to train a multi-task ASR and enhancement system on the Voicebank-DEMAND (noisy-VCTK) dataset. Training follows a three-stage procedure, so the script is run once per stage with the corresponding stage-specific hyperparameter file. The noisy-VCTK data folder is required.

Code Reference

Source Location

Signature

class MTLbrain(sb.Brain):
    def compute_forward(self, batch, stage):
        ...
    def compute_objectives(self, predictions, batch, stage):
        ...

class CompositeStats(sb.utils.metric_stats.MetricStats):
    def summarize(self, field=None):
        ...
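
To illustrate what a summarize override for composite metrics might look like, the sketch below averages per-utterance CSIG, CBAK, and COVL scores. It is standalone Python and deliberately does not subclass sb.utils.metric_stats.MetricStats; the class name CompositeStatsSketch, the append signature, and the storage layout are all assumptions made for illustration, not the recipe's actual code.

```python
from statistics import mean


class CompositeStatsSketch:
    """Hypothetical stand-in for a composite-metric tracker (not SpeechBrain code)."""

    FIELDS = ("csig", "cbak", "covl")

    def __init__(self):
        self.ids = []
        self.scores = []  # one dict of composite scores per utterance

    def append(self, utt_id, csig, cbak, covl):
        # Record the three composite scores for a single utterance.
        self.ids.append(utt_id)
        self.scores.append({"csig": csig, "cbak": cbak, "covl": covl})

    def summarize(self, field=None):
        # Average every composite score over all recorded utterances.
        if not self.scores:
            raise ValueError("no scores recorded")
        summary = {name: mean(s[name] for s in self.scores) for name in self.FIELDS}
        # Return the full summary, or a single averaged field if requested.
        return summary if field is None else summary[field]


stats = CompositeStatsSketch()
stats.append("utt1", csig=3.0, cbak=2.5, covl=2.0)
stats.append("utt2", csig=4.0, cbak=3.5, covl=3.0)
summary = stats.summarize()
```

Keeping the three scores together per utterance means a single pass over the test set can report all composite averages at once.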

Import

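# The recipe is a script rather than an importable module; from the repository root, both paths must include the recipe directory:
python recipes/Voicebank/MTL/ASR_enhance/train.py recipes/Voicebank/MTL/ASR_enhance/hparams/robust_asr.yaml --data_folder /path/to/noisy-vctk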

I/O Contract

Inputs

Name | Type | Required | Description
batch | PaddedBatch | Yes | Batch containing clean_sig, noisy_sig (waveforms), and ASR tokens
stage | sb.Stage | Yes | TRAIN, VALID, or TEST

Outputs

Name | Type | Description
predictions | dict | Enhanced waveforms, ASR log-probabilities, and feature-level predictions
loss | torch.Tensor | Combined enhancement and ASR loss (configurable per training stage)
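
One way a combined, per-stage configurable loss could be assembled is as a weighted sum in which a zero weight switches a term off. The sketch below is a minimal illustration under that assumption; the function name combine_losses, the term names, and the example weights are hypothetical and are not taken from the recipe or its hyperparameter files.

```python
def combine_losses(losses, weights):
    """Weighted sum of the loss terms whose weight is non-zero.

    `losses` maps a term name to its value; `weights` maps a term name to
    its weight. Both mappings and their keys are illustrative assumptions.
    """
    total = 0.0
    for name, weight in weights.items():
        if weight == 0:
            continue  # term is toggled off for this configuration
        if name not in losses:
            raise KeyError(f"weight given for missing loss term: {name}")
        total = total + weight * losses[name]
    return total


# Hypothetical per-term loss values for one batch.
losses = {"enhance": 0.8, "mimic": 0.4, "asr": 2.0}

# Only the enhancement term is active.
enhance_only = combine_losses(losses, {"enhance": 1.0, "mimic": 0.0, "asr": 0.0})

# Enhancement and ASR terms are active together.
joint = combine_losses(losses, {"enhance": 0.5, "mimic": 0.0, "asr": 1.0})
```

Because the function relies only on addition and multiplication, the same pattern should carry over to tensor-valued losses, although that is not exercised here.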

Usage Examples

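# Three-stage training procedure (run from the recipe directory):
# Stage 1: perceptual pretraining
python train.py hparams/pretrain_perceptual.yaml --data_folder /path/to/noisy-vctk
# Stage 2: enhancement with mimic loss
python train.py hparams/enhance_mimic.yaml --data_folder /path/to/noisy-vctk
# Stage 3: robust ASR training
python train.py hparams/robust_asr.yaml --data_folder /path/to/noisy-vctk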
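
The three commands can also be driven from a small Python wrapper that stops as soon as one stage fails. This is only a sketch: the helper names build_stage_commands and run_all_stages are hypothetical, and it assumes the wrapper is launched from the recipe directory so that train.py and hparams/ resolve as relative paths.

```python
import subprocess
import sys

# Hyperparameter files in the order the stages are run.
STAGE_HPARAMS = [
    "hparams/pretrain_perceptual.yaml",
    "hparams/enhance_mimic.yaml",
    "hparams/robust_asr.yaml",
]


def build_stage_commands(data_folder, python=sys.executable):
    """Return one argv list per training stage, in execution order."""
    return [
        [python, "train.py", hparams, "--data_folder", data_folder]
        for hparams in STAGE_HPARAMS
    ]


def run_all_stages(data_folder):
    """Run the stages sequentially; check=True aborts on the first failure."""
    for cmd in build_stage_commands(data_folder):
        subprocess.run(cmd, check=True)


# Build (but do not execute) the commands for inspection.
commands = build_stage_commands("/path/to/noisy-vctk", python="python")
```

Calling run_all_stages("/path/to/noisy-vctk") would then execute the stages one after another.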
