Implementation:Speechbrain Speechbrain Train TimersAndSuch Multistage

From Leeroopedia


Knowledge Sources
Domains: Spoken_Language_Understanding, Training
Last Updated: 2026-02-09 00:00 GMT

Overview

A concrete training recipe for multistage spoken language understanding (SLU) on the Timers and Such dataset, provided by the SpeechBrain library.

Description

This recipe implements a "multistage" SLU pipeline. Speech is first transcribed to text by a pretrained ASR model (trained on LibriSpeech), and the transcriptions are then fed into a sequence-to-sequence model that maps them to semantic representations. The SLU class extends sb.Brain and runs both the ASR forward pass and the NLU (natural language understanding) forward pass inside compute_forward. The word-level transcriptions produced by the ASR model are tokenized, embedded, and passed through an SLU encoder and decoder; at inference time, beam search is used to decode the output. Training minimizes a negative log-likelihood (NLL) loss on the semantic token sequences. Transcribing online during training, rather than offline beforehand, makes it possible to use augmentation and to sample multiple possible transcriptions.
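To make the training objective concrete, the following is a minimal pure-Python sketch of an NLL loss averaged over the valid (non-padded) target tokens. It is an illustration only, not the recipe's loss function: the name sequence_nll is hypothetical, nested lists stand in for tensors, and lengths are given as absolute token counts for simplicity.

```python
import math


def sequence_nll(log_probs, targets, lengths):
    """Average negative log-likelihood over non-padded target tokens.

    log_probs: [batch][time][vocab] log-probabilities (nested lists)
    targets:   [batch][time] integer token ids, padded to equal length
    lengths:   [batch] number of valid target tokens per sequence
    """
    total, count = 0.0, 0
    for seq_log_probs, seq_targets, n_valid in zip(log_probs, targets, lengths):
        for t in range(n_valid):  # positions beyond n_valid are padding
            total -= seq_log_probs[t][seq_targets[t]]
            count += 1
    return total / count


# Toy batch: a uniform distribution over a 4-token vocabulary at every step.
uniform = [math.log(0.25)] * 4
log_probs = [[uniform, uniform, uniform], [uniform, uniform, uniform]]
targets = [[1, 2, 3], [0, 1, 0]]  # last token of the second sequence is padding
lengths = [3, 2]

loss = sequence_nll(log_probs, targets, lengths)
print(loss)
```

With a uniform distribution the loss equals log(4) regardless of the targets, which is a convenient sanity check for the masking logic.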

Evaluation metrics include CER (Character Error Rate), WER (Word Error Rate), and SER (Sentence Error Rate) on semantic output sequences.
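As a rough illustration of how these three metrics relate, here is a self-contained sketch that computes them from reference and hypothesis strings with a plain Levenshtein distance. The helper names edit_distance and error_rates are hypothetical, and the example strings are made up; SpeechBrain's own metric classes may differ in details such as alignment and reporting.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (words or characters)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def error_rates(refs, hyps):
    """Return WER, CER and SER as percentages over a list of sentence pairs."""
    pairs = list(zip(refs, hyps))
    word_errors = sum(edit_distance(r.split(), h.split()) for r, h in pairs)
    char_errors = sum(edit_distance(r, h) for r, h in pairs)
    sentence_errors = sum(r != h for r, h in pairs)
    return {
        "WER": 100.0 * word_errors / sum(len(r.split()) for r in refs),
        "CER": 100.0 * char_errors / sum(len(r) for r in refs),
        "SER": 100.0 * sentence_errors / len(refs),
    }


refs = ["set timer five minutes", "add two and three"]
hyps = ["set timer nine minutes", "add two and three"]
rates = error_rates(refs, hyps)
print(rates)
```

SER counts a sentence as wrong if it differs from the reference at all, so a single wrong token in a semantic output costs the whole sentence.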

Usage

Run as a training recipe with a YAML hyperparameter file. The script handles data preparation from CSV files, model training with learning rate annealing based on SER, and checkpointing.
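The idea of annealing the learning rate on validation SER can be sketched as follows. This is a hypothetical, simplified scheduler written for illustration: the class name SimpleAnnealer, the halving factor, and the improvement threshold are all assumptions, and the scheduler actually used by the recipe is whatever the YAML hyperparameter file configures.

```python
class SimpleAnnealer:
    """Hypothetical scheduler: shrink the learning rate when the
    validation metric (lower is better) stops improving."""

    def __init__(self, initial_lr, factor=0.5, threshold=0.0025):
        self.lr = initial_lr
        self.factor = factor        # multiplier applied when progress stalls
        self.threshold = threshold  # minimum relative improvement required
        self.previous = None        # metric value from the previous epoch

    def __call__(self, metric):
        old_lr = self.lr
        if self.previous is not None and self.previous > 0:
            improvement = (self.previous - metric) / self.previous
            if improvement < self.threshold:
                self.lr *= self.factor
        self.previous = metric
        return old_lr, self.lr


annealer = SimpleAnnealer(initial_lr=0.0004)
# Validation SER after four epochs; the third epoch shows no improvement.
history = [annealer(ser) for ser in (30.0, 20.0, 20.0, 19.0)]
for old_lr, new_lr in history:
    print(old_lr, new_lr)
```

Returning both the old and the new value makes it easy to log when a change happened and to write the new rate into the optimizer only when it differs.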

Code Reference

Source Location

Signature

class SLU(sb.Brain):
    def compute_forward(self, batch, stage):
        """Forward computations from waveform batches to output probabilities."""
        ...

    def compute_objectives(self, predictions, batch, stage):
        """Computes the loss (NLL) given predictions and targets."""
        ...

    def on_stage_start(self, stage, epoch):
        """Gets called at the beginning of each epoch."""
        ...

    def on_stage_end(self, stage, stage_loss, epoch):
        """Gets called at the end of an epoch."""
        ...

def dataio_prepare(hparams):
    """Prepares the datasets to be used in the brain class."""
    ...
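The order in which a training loop typically invokes these hooks can be sketched with a toy stand-in. ToyBrain and its fit method below are hypothetical and only record calls; they are not sb.Brain, which additionally handles optimization, checkpointing, and device placement. The sketch also shows that the stage hooks fire once per stage (training, then validation) within each epoch.

```python
from enum import Enum, auto


class Stage(Enum):
    TRAIN = auto()
    VALID = auto()


class ToyBrain:
    """Records the order of hook calls; performs no real computation."""

    def __init__(self):
        self.calls = []

    def compute_forward(self, batch, stage):
        self.calls.append(("compute_forward", stage.name))
        return batch  # stand-in for model predictions

    def compute_objectives(self, predictions, batch, stage):
        self.calls.append(("compute_objectives", stage.name))
        return 0.0  # stand-in for the loss value

    def on_stage_start(self, stage, epoch):
        self.calls.append(("on_stage_start", stage.name))

    def on_stage_end(self, stage, stage_loss, epoch):
        self.calls.append(("on_stage_end", stage.name))

    def fit(self, epochs, train_set, valid_set):
        for epoch in range(1, epochs + 1):
            for stage, data in ((Stage.TRAIN, train_set), (Stage.VALID, valid_set)):
                self.on_stage_start(stage, epoch)
                losses = []
                for batch in data:
                    predictions = self.compute_forward(batch, stage)
                    losses.append(self.compute_objectives(predictions, batch, stage))
                self.on_stage_end(stage, sum(losses) / len(losses), epoch)


brain = ToyBrain()
brain.fit(epochs=1, train_set=["train_batch"], valid_set=["valid_batch"])
for call in brain.calls:
    print(call)
```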

Import

# The recipe is run as a script rather than imported as a module:
python train.py hparams/train.yaml

I/O Contract

Inputs

Name | Type | Required | Description
hparams_file | str | Yes | Path to YAML hyperparameter file
batch.sig | tuple(torch.Tensor, torch.Tensor) | Yes | Waveform tensor and lengths
batch.tokens_bos | tuple(torch.Tensor, torch.Tensor) | Yes | Target semantic tokens with BOS and their lengths
batch.tokens_eos | tuple(torch.Tensor, torch.Tensor) | Yes | Target semantic tokens with EOS and their lengths
batch.semantics | list[str] | Yes | Target semantic strings for evaluation
asr_model | Pretrained | Yes | Pretrained ASR model for transcription (e.g., LibriSpeech-trained)
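The relationship between tokens_bos and tokens_eos, and the "tokens and their lengths" pairing, can be illustrated with a small sketch. The helper names add_bos_eos and pad_batch, the BOS/EOS/padding index values, and the use of lengths relative to the longest sequence are assumptions made for this example; plain lists stand in for tensors.

```python
def add_bos_eos(token_ids, bos_index=1, eos_index=2):
    """Build decoder input (BOS-prefixed) and target (EOS-suffixed) sequences."""
    tokens_bos = [bos_index] + token_ids  # what the decoder is fed
    tokens_eos = token_ids + [eos_index]  # what the decoder should predict
    return tokens_bos, tokens_eos


def pad_batch(sequences, pad_index=0):
    """Pad to the longest sequence; return padded data and relative lengths."""
    longest = max(len(seq) for seq in sequences)
    padded = [seq + [pad_index] * (longest - len(seq)) for seq in sequences]
    rel_lengths = [len(seq) / longest for seq in sequences]
    return padded, rel_lengths


bos_a, eos_a = add_bos_eos([5, 6, 7])
bos_b, eos_b = add_bos_eos([8])

tokens_bos, bos_lengths = pad_batch([bos_a, bos_b])
tokens_eos, eos_lengths = pad_batch([eos_a, eos_b])
print(tokens_bos, bos_lengths)
print(tokens_eos, eos_lengths)
```

The two sequences are the same tokens shifted by one position, so at each step the decoder sees the BOS-prefixed history and is trained to predict the next token of the EOS-suffixed target.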

Outputs

Name | Type | Description
p_seq | torch.Tensor | Log-probabilities over semantic token sequences
p_tokens | torch.Tensor | Beam search decoded token predictions (at inference)
CER | float | Character Error Rate on semantic output
WER | float | Word Error Rate on semantic output
SER | float | Sentence Error Rate on semantic output

Usage Examples

# Train the multistage SLU model
python train.py hparams/train.yaml --data_folder /path/to/timers-and-such

# The pipeline: speech -> ASR transcription -> NLU -> semantic parse
# Example output semantic format: "action: set | object: timer | duration: 5 minutes"
