Implementation:Speechbrain Speechbrain Train TimersAndSuch Direct
| Knowledge Sources | |
|---|---|
| Domains | SLU, Training |
| Last Updated | 2026-02-09 00:00 GMT |
Overview
Concrete tool, provided by the SpeechBrain library, for training a direct (speech-to-semantics) SLU model on the Timers-and-Such dataset with ASR-based transfer learning.
Description
This recipe defines the SLU class (a subclass of sb.Brain) for direct spoken language understanding on the Timers-and-Such dataset. Input waveforms are encoded by a pre-trained ASR model, which stays frozen during training. The resulting features pass through an SLU encoder and a seq2seq decoder that predicts the sequence of semantic tokens. The recipe also supports waveform augmentation (labels are replicated so that they line up with the augmented samples), beam search decoding, and semantic accuracy evaluation. The architecture maps speech directly to semantics, with no intermediate text representation.
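Label replication can be illustrated with a minimal sketch. It assumes that augmentation stacks one or more augmented copies of the batch after the original waveforms along the batch dimension, so the labels have to be repeated in the same order. The helper name `replicate_for_augmentation` and the token values are hypothetical and are not taken from the recipe.

```python
def replicate_for_augmentation(labels, n_copies):
    """Repeat the per-utterance label sequences once per copy of the batch.

    Assumption: the waveform batch is laid out as
    [originals, augmented copy 1, augmented copy 2, ...],
    so the labels are repeated block by block in the same order.
    """
    return [list(seq) for _ in range(n_copies) for seq in labels]


# Two utterances with (hypothetical) semantic token ids.
tokens = [[5, 9, 2], [7, 2]]

# Original batch plus one augmented copy: the waveform batch has 4 rows,
# so the labels must have 4 rows as well.
replicated = replicate_for_augmentation(tokens, n_copies=2)
print(replicated)
```

Without this step the decoder inputs and targets would have fewer rows than the augmented waveform batch, and the loss could not be computed row by row.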
Usage
Use this recipe to train a direct SLU model on the Timers-and-Such dataset using ASR-based transfer learning. It requires the dataset and a pre-trained ASR model checkpoint, and is configured through hparams/train.yaml.
Code Reference
Source Location
- Repository: SpeechBrain
- File: recipes/timers-and-such/direct/train.py
Signature
class SLU(sb.Brain):
    def compute_forward(self, batch, stage):
        ...

    def compute_objectives(self, predictions, batch, stage):
        ...
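The semantic accuracy evaluation mentioned in the Description can be sketched as a sentence-level exact match. This is an assumption about how the metric might be defined: an utterance counts as correct only if its whole decoded semantics equals the reference. The function name `semantic_accuracy` and the string format of the semantics below are made up for illustration and may differ from what the recipe uses.

```python
def semantic_accuracy(predicted, targets):
    """Fraction of utterances whose decoded semantics match the reference exactly."""
    if len(predicted) != len(targets):
        raise ValueError("predicted and targets must have the same length")
    if not targets:
        return 0.0
    correct = sum(p.strip() == t.strip() for p, t in zip(predicted, targets))
    return correct / len(targets)


# Hypothetical decoded outputs and references (format is illustrative only).
targets = [
    "intent=SetTimer minutes=5",
    "intent=SetAlarm hour=7",
    "intent=UnitConversion",
]
predicted = [
    "intent=SetTimer minutes=5",
    "intent=SetAlarm hour=8",  # one wrong slot value makes the whole utterance wrong
    "intent=UnitConversion",
]

accuracy = semantic_accuracy(predicted, targets)
print(f"semantic accuracy: {accuracy:.3f}")
```

Exact match is stricter than a token-level error rate: a single wrong slot value invalidates the whole prediction, which matches how a downstream application would consume the output.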
Import
python recipes/timers-and-such/direct/train.py recipes/timers-and-such/direct/hparams/train.yaml --data_folder /path/to/timers-and-such
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| batch | PaddedBatch | Yes | Batch containing sig (waveforms), tokens_bos, and tokens_eos (semantic tokens) |
| stage | sb.Stage | Yes | TRAIN, VALID, or TEST |
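The relationship between tokens_bos and tokens_eos can be sketched as follows, assuming the usual teacher-forcing setup for seq2seq decoders: the decoder input is the token sequence prefixed with a beginning-of-sequence index, and the training target is the same sequence suffixed with an end-of-sequence index. The index values, the padding value, and the helper names are hypothetical; the relative lengths mirror the convention of lengths expressed as a fraction of the longest sequence in the batch.

```python
BOS_INDEX = 1  # hypothetical value, not taken from the recipe
EOS_INDEX = 2  # hypothetical value, not taken from the recipe
PAD_INDEX = 0  # hypothetical padding value


def make_decoder_io(tokens):
    """Build the decoder input (BOS + tokens) and the target (tokens + EOS)."""
    return [BOS_INDEX] + tokens, tokens + [EOS_INDEX]


def pad_batch(seqs, pad=PAD_INDEX):
    """Right-pad sequences to a common length and return relative lengths."""
    max_len = max(len(s) for s in seqs)
    padded = [s + [pad] * (max_len - len(s)) for s in seqs]
    rel_lens = [len(s) / max_len for s in seqs]
    return padded, rel_lens


# Two utterances with (hypothetical) semantic token ids.
utterances = [[5, 9, 7], [3]]
pairs = [make_decoder_io(t) for t in utterances]

tokens_bos, bos_lens = pad_batch([bos for bos, _ in pairs])
tokens_eos, eos_lens = pad_batch([eos for _, eos in pairs])

print(tokens_bos, bos_lens)
print(tokens_eos, eos_lens)
```

At each decoding step the model sees the tokens_bos prefix up to that position and is trained to predict the tokens_eos entry at the same position.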
Outputs
| Name | Type | Description |
|---|---|---|
| predictions | tuple | Log-probabilities over semantic tokens (log-softmax outputs), wav_lens, and the decoded semantic tokens |
| loss | torch.Tensor | Sequence-level NLL loss on semantic tokens |
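A sequence-level NLL loss over padded targets can be sketched in plain Python as below. This is an illustration under stated assumptions rather than the recipe's implementation: it averages the negative log-probability of each target token over the non-padded positions only, and it takes absolute lengths for simplicity. The actual loss may reduce differently or apply label smoothing. The names `log_softmax` and `sequence_nll` are local to this sketch.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over one vocabulary-sized list."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def sequence_nll(log_probs, targets, lengths):
    """Average NLL per real (non-padded) target token.

    log_probs: [batch][time][vocab] log-probabilities
    targets:   [batch][time] target token ids, right-padded
    lengths:   [batch] number of real tokens in each target
    """
    total, count = 0.0, 0
    for seq_lp, seq_tgt, n in zip(log_probs, targets, lengths):
        for t in range(n):  # positions >= n are padding and are skipped
            total -= seq_lp[t][seq_tgt[t]]
            count += 1
    return total / count


# Uniform predictions over a 4-token vocabulary, batch of 2, 2 time steps.
log_probs = [[log_softmax([0.0] * 4) for _ in range(2)] for _ in range(2)]
targets = [[1, 3], [2, 0]]  # last position of the second sequence is padding
lengths = [2, 1]

loss = sequence_nll(log_probs, targets, lengths)
print(loss, math.log(4))
```

With uniform predictions every real token contributes log(4), so the average equals log(4) no matter how many padded positions are skipped.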
Usage Examples
python train.py hparams/train.yaml --data_folder /path/to/timers-and-such