
Implementation:Speechbrain Speechbrain Train IWSLT22 SAMU

From Leeroopedia


Knowledge Sources
Domains Speech_Translation, Training
Last Updated 2026-02-09 00:00 GMT

Overview

Concrete SpeechBrain recipe for SAMU (Semantically-Aligned Multimodal Utterance-level) pretraining on the IWSLT22 low-resource task.

Description

This recipe defines the ST class (a subclass of sb.core.Brain), which fine-tunes a wav2vec2 model to semantically enrich its speech representations, as described in https://arxiv.org/abs/2205.08180. The model uses wav2vec2 for feature extraction, self-attention pooling to produce utterance-level embeddings, and a cosine similarity loss between the L2-normalized speech embeddings and LaBSE text embeddings. This pretraining aligns speech and text representations in a shared semantic space. The recipe supports separate optimizers for wav2vec2 and LaBSE, each with its own freezing schedule.
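The pooling, normalization, and loss steps described above can be illustrated with a minimal, framework-free sketch. This is not the recipe's PyTorch code: the function names (attention_pool, l2_normalize, cosine_loss) are hypothetical, the attention scores are passed in directly instead of coming from a learned layer, and the loss is written as 1 minus cosine similarity, which is one common form and an assumption here.

```python
import math

def attention_pool(frames, scores):
    """Collapse frame vectors into one utterance vector using softmax weights.

    In a real model the scores come from a learned projection; here they are
    supplied directly (assumption for illustration).
    """
    m = max(scores)                               # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(frames[0])
    return [sum(w * f[d] for w, f in zip(weights, frames)) for d in range(dim)]

def l2_normalize(vec, eps=1e-12):
    """Scale a vector to unit L2 norm (eps guards against division by zero)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / max(norm, eps) for x in vec]

def cosine_loss(speech_vec, text_vec):
    """1 - cosine similarity of the two L2-normalized embeddings."""
    s = l2_normalize(speech_vec)
    t = l2_normalize(text_vec)
    return 1.0 - sum(a * b for a, b in zip(s, t))

# Toy data: three 2-dimensional "frames" with equal attention scores.
frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
scores = [0.0, 0.0, 0.0]
utt = attention_pool(frames, scores)              # uniform weights -> mean of the frames

loss_same = cosine_loss(utt, [2.0, 2.0])          # same direction -> loss near 0
loss_orth = cosine_loss([1.0, 0.0], [0.0, 1.0])   # orthogonal -> loss of 1
```

Because both embeddings are normalized before the dot product, the loss depends only on the angle between them, not on their magnitudes.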

Usage

Use this recipe to pretrain SAMU embeddings that align speech representations with multilingual text embeddings from LaBSE. It requires speech-translation pairs from the IWSLT22 dataset. The resulting model can be used downstream for speech translation. Configure the run with hparams/train_samu.yaml.

Code Reference

Source Location

Signature

class ST(sb.core.Brain):
    def compute_forward(self, batch, stage):
        ...
    def compute_objectives(self, predictions, batch, stage):
        ...
    def init_optimizers(self):
        ...
    def freeze_optimizers(self, optimizers):
        ...
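
The idea behind freeze_optimizers, returning only the optimizers that should currently take a step, can be sketched in plain Python. The source does not show the method body, so everything below is an assumption for illustration: the helper name select_active_optimizers, the unfreeze_at mapping, and the step thresholds are hypothetical and do not reflect the recipe's actual hyperparameters.

```python
def select_active_optimizers(optimizers, step, unfreeze_at):
    """Return the subset of optimizers whose module is no longer frozen.

    optimizers:  dict mapping a name to an optimizer object
    step:        current optimizer step
    unfreeze_at: dict mapping a name to the first step at which that
                 optimizer may update (hypothetical schedule); names
                 missing from the mapping are never frozen
    """
    return {
        name: opt
        for name, opt in optimizers.items()
        if step >= unfreeze_at.get(name, 0)
    }

# Placeholder strings stand in for real optimizer objects.
optimizers = {"wav2vec2": "wav2vec2_opt", "labse": "labse_opt"}
unfreeze_at = {"wav2vec2": 1000, "labse": 3000}   # illustrative values only

early = select_active_optimizers(optimizers, step=10, unfreeze_at=unfreeze_at)
middle = select_active_optimizers(optimizers, step=2000, unfreeze_at=unfreeze_at)
late = select_active_optimizers(optimizers, step=5000, unfreeze_at=unfreeze_at)
```

Keeping one threshold per optimizer is what makes the schedules independent: each encoder can be unfrozen at a different point without touching the other.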

Import

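The recipe is a standalone script that is run from the command line, not imported as a module. From the repository root:

python recipes/IWSLT22_lowresource/AST/transformer/train_samu.py recipes/IWSLT22_lowresource/AST/transformer/hparams/train_samu.yaml --data_folder /path/to/data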

I/O Contract

Inputs

Name | Type | Required | Description
batch | PaddedBatch | Yes | Batch containing sig (waveforms) and trans (text translations)
stage | sb.Stage | Yes | TRAIN, VALID, or TEST

Outputs

Name | Type | Description
predictions | tuple | L2-normalized utterance embeddings and LaBSE text embeddings
loss | torch.Tensor | Cosine similarity loss between speech and text embeddings

Usage Examples

From the recipe directory (recipes/IWSLT22_lowresource/AST/transformer):

python train_samu.py hparams/train_samu.yaml --data_folder /path/to/data
