Implementation:Speechbrain Speechbrain Train TimersAndSuch Multistage

Knowledge Sources	SpeechBrain
Domains	Spoken_Language_Understanding, Training
Last Updated	2026-02-09 00:00 GMT

Overview

Concrete tool for multistage spoken language understanding (SLU) training provided by the SpeechBrain library.

Description

This recipe implements a "multistage" SLU pipeline: speech is first transcribed to text by a pretrained ASR model (trained on LibriSpeech), and the transcriptions are then fed into a sequence-to-sequence model that maps them to semantic representations. The SLU class extends sb.Brain and runs both the ASR forward pass and the NLU (Natural Language Understanding) forward pass inside compute_forward. The ASR model produces word-level transcriptions, which are tokenized, embedded, and passed through an SLU encoder and decoder; beam search is used at inference time. Training minimizes a negative log-likelihood loss on the semantic token sequences. Transcribing online (during training) rather than offline makes it possible to apply augmentation and to sample multiple possible transcriptions.
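The data flow described above can be sketched as a composition of three steps. This is a minimal, standard-library-only illustration, not the recipe's code: multistage_forward, the toy vocabulary, and all three stand-in functions are hypothetical names introduced here, and the real recipe uses neural modules in place of each callable.

```python
from typing import Callable, List


def multistage_forward(
    wav: List[float],
    transcribe: Callable[[List[float]], List[str]],
    tokenize: Callable[[List[str]], List[int]],
    nlu: Callable[[List[int]], List[str]],
) -> List[str]:
    """Compose the two stages: waveform -> words -> token ids -> semantics."""
    words = transcribe(wav)      # stage 1: ASR produces a word-level transcript
    token_ids = tokenize(words)  # words are mapped to integer ids
    return nlu(token_ids)        # stage 2: seq2seq model emits semantic tokens


# Toy stand-ins (hypothetical): they only make the control flow executable.
VOCAB = {"<unk>": 0, "set": 1, "a": 2, "timer": 3}


def toy_transcribe(wav: List[float]) -> List[str]:
    # A real ASR model would decode the audio; this returns a fixed transcript.
    return ["set", "a", "timer"]


def toy_tokenize(words: List[str]) -> List[int]:
    # Unknown words fall back to the <unk> id.
    return [VOCAB.get(w, VOCAB["<unk>"]) for w in words]


def toy_nlu(token_ids: List[int]) -> List[str]:
    # Placeholder mapping; a real model would run an encoder and a decoder.
    return ["SEM_%d" % i for i in token_ids]


semantics = multistage_forward(
    [0.0, 0.1, -0.1], toy_transcribe, toy_tokenize, toy_nlu
)
print(semantics)
```

Because the transcript is produced inside the forward pass, any change to the input waveform (for example, augmentation) propagates to the text that the second stage sees.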

Evaluation metrics include CER (Character Error Rate), WER (Word Error Rate), and SER (Sentence Error Rate) on semantic output sequences.
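As a rough illustration of how these three metrics relate, the sketch below computes them from reference and hypothesis strings with a plain edit distance. It is not SpeechBrain's metric implementation; the helper names, the whitespace word splitting, and the example strings are assumptions made for this sketch.

```python
from typing import List, Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance (insertions, deletions, substitutions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def error_rate(refs: List[Sequence], hyps: List[Sequence]) -> float:
    """Total edits divided by total reference length, as a percentage."""
    edits = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    total = sum(len(r) for r in refs)
    return 100.0 * edits / total


def sentence_error_rate(refs: List[str], hyps: List[str]) -> float:
    """Percentage of sentences that are not an exact match."""
    wrong = sum(1 for r, h in zip(refs, hyps) if r != h)
    return 100.0 * wrong / len(refs)


# Hypothetical semantic strings, used only to exercise the functions.
refs = ["set timer five minutes", "convert two cups"]
hyps = ["set timer nine minutes", "convert two cups"]

wer = error_rate([r.split() for r in refs], [h.split() for h in hyps])
cer = error_rate([list(r) for r in refs], [list(h) for h in hyps])
ser = sentence_error_rate(refs, hyps)
print(f"WER={wer:.2f}%  CER={cer:.2f}%  SER={ser:.2f}%")
```

SER is the strictest of the three: a single wrong token makes the whole sentence count as an error, which is why it is a natural target for an SLU task where the full parse must be correct.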

Usage

Run as a training recipe with a YAML hyperparameter file. The script handles data preparation from CSV files, model training with learning rate annealing based on SER, and checkpointing.
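The SER-driven learning rate annealing can be pictured with the following sketch. The page does not say which scheduler the recipe configures, so SimpleAnnealer is a hypothetical class written for illustration: it halves the learning rate whenever the relative SER improvement between epochs falls below a threshold. The factor, threshold, initial learning rate, and SER values are all assumed.

```python
class SimpleAnnealer:
    """Reduce the learning rate when a validation metric stops improving."""

    def __init__(self, initial_lr: float, factor: float = 0.5,
                 threshold: float = 0.0025):
        self.lr = initial_lr
        self.factor = factor          # multiplier applied when annealing
        self.threshold = threshold    # minimum relative improvement required
        self.prev_metric = None

    def __call__(self, metric: float):
        """Return (old_lr, new_lr) after observing the latest metric value."""
        old_lr = self.lr
        if self.prev_metric is not None:
            if self.prev_metric == 0:
                improvement = 0.0
            else:
                improvement = (self.prev_metric - metric) / self.prev_metric
            if improvement < self.threshold:
                self.lr = old_lr * self.factor
        self.prev_metric = metric
        return old_lr, self.lr


annealer = SimpleAnnealer(initial_lr=0.001)  # assumed starting value
lr_history = []
for epoch, valid_ser in enumerate([40.0, 30.0, 30.0, 25.0], start=1):
    old_lr, new_lr = annealer(valid_ser)
    lr_history.append(new_lr)
    print(f"epoch {epoch}: SER={valid_ser:.1f}  lr {old_lr} -> {new_lr}")
```

In this run the rate is halved only after the third epoch, where SER stalls at 30.0.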

Code Reference

Source Location

Repository: SpeechBrain
File: recipes/timers-and-such/multistage/train.py

Signature

class SLU(sb.Brain):
    def compute_forward(self, batch, stage):
        """Forward computations from waveform batches to output probabilities."""
        ...

    def compute_objectives(self, predictions, batch, stage):
        """Computes the loss (NLL) given predictions and targets."""
        ...

    def on_stage_start(self, stage, epoch):
        """Gets called at the beginning of each epoch."""
        ...

    def on_stage_end(self, stage, stage_loss, epoch):
        """Gets called at the end of an epoch."""
        ...

def dataio_prepare(hparams):
    """Prepares the datasets to be used in the brain class."""
    ...

Import

# The recipe is run as a script rather than imported as a module
python train.py hparams/train.yaml

I/O Contract

Inputs

Name	Type	Required	Description
hparams_file	str	Yes	Path to YAML hyperparameter file
batch.sig	tuple(torch.Tensor, torch.Tensor)	Yes	Waveform tensor and lengths
batch.tokens_bos	tuple(torch.Tensor, torch.Tensor)	Yes	Target semantic tokens with BOS and their lengths
batch.tokens_eos	tuple(torch.Tensor, torch.Tensor)	Yes	Target semantic tokens with EOS and their lengths
batch.semantics	list[str]	Yes	Target semantic strings for evaluation
asr_model	Pretrained	Yes	Pretrained ASR model for transcription (e.g., LibriSpeech-trained)
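The tokens_bos and tokens_eos inputs are two shifted views of the same target sequence: the BOS-prefixed version is fed to the decoder, and the EOS-suffixed version is what the decoder is trained to predict. The sketch below shows that relationship together with padding and relative lengths (each length expressed as a fraction of the longest sequence in the batch). The index values and helper names are hypothetical, and plain lists stand in for tensors.

```python
from typing import List, Tuple

BOS_INDEX = 1  # hypothetical index; the real value comes from the YAML file
EOS_INDEX = 2  # hypothetical index
PAD_INDEX = 0  # hypothetical padding value


def add_bos_eos(tokens: List[int]) -> Tuple[List[int], List[int]]:
    """Return (decoder input, decoder target) for one token sequence."""
    return [BOS_INDEX] + tokens, tokens + [EOS_INDEX]


def pad_batch(seqs: List[List[int]]) -> Tuple[List[List[int]], List[float]]:
    """Right-pad to the longest sequence and return relative lengths."""
    max_len = max(len(s) for s in seqs)
    padded = [s + [PAD_INDEX] * (max_len - len(s)) for s in seqs]
    rel_lens = [len(s) / max_len for s in seqs]
    return padded, rel_lens


# Two toy semantic token sequences of different lengths.
batch_tokens = [[5, 6, 7], [8, 9]]
pairs = [add_bos_eos(t) for t in batch_tokens]

tokens_bos, bos_lens = pad_batch([p[0] for p in pairs])
tokens_eos, eos_lens = pad_batch([p[1] for p in pairs])

print(tokens_bos, bos_lens)
print(tokens_eos, eos_lens)
```

At every step t, the decoder sees tokens_bos[t] and is scored against tokens_eos[t], so the two sequences always have the same length.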

Outputs

Name	Type	Description
p_seq	torch.Tensor	Log-probabilities over semantic token sequences
p_tokens	torch.Tensor	Beam search decoded token predictions (at inference)
CER	float	Character Error Rate on semantic output
WER	float	Word Error Rate on semantic output
SER	float	Sentence Error Rate on semantic output
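To make the link between p_seq and the training loss concrete, the sketch below computes a masked negative log-likelihood from per-step log-probabilities. It is a pure-Python illustration, not the library's loss function: nll_loss here is a hypothetical helper, it uses absolute lengths for simplicity, and averaging over the non-padded tokens is an assumption about the normalization.

```python
import math
from typing import List


def nll_loss(log_probs: List[List[List[float]]],
             targets: List[List[int]],
             lengths: List[int]) -> float:
    """Average negative log-likelihood over the non-padded target tokens.

    log_probs[b][t][v] is the log-probability of vocabulary item v at step t
    of sequence b; lengths[b] is the number of valid steps in sequence b.
    """
    total, count = 0.0, 0
    for seq_lp, seq_tgt, n in zip(log_probs, targets, lengths):
        for t in range(n):  # steps at or beyond n are padding and are skipped
            total -= seq_lp[t][seq_tgt[t]]
            count += 1
    return total / count


def log_row(probs: List[float]) -> List[float]:
    """Convert one probability distribution into log-probabilities."""
    return [math.log(p) for p in probs]


# Toy batch: two sequences, two steps each, vocabulary of three tokens.
# The second sequence has one valid step; its second step is padding.
p_seq = [
    [log_row([0.7, 0.2, 0.1]), log_row([0.1, 0.1, 0.8])],
    [log_row([0.25, 0.5, 0.25]), log_row([0.4, 0.3, 0.3])],
]
targets = [[0, 2], [1, 0]]
lengths = [2, 1]

loss = nll_loss(p_seq, targets, lengths)
print(f"NLL = {loss:.4f}")
```

Only three token positions contribute to the loss here; the padded step of the second sequence is masked out by its length.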

Usage Examples

# Train the multistage SLU model
python train.py hparams/train.yaml --data_folder /path/to/timers-and-such

# The pipeline: speech -> ASR transcription -> NLU -> semantic parse
# Example output semantic format: "action: set | object: timer | duration: 5 minutes"

Related Pages

Principle:Speechbrain_Speechbrain_Spoken_Language_Understanding
