Implementation:Speechbrain Speechbrain Load Hyperpyyaml SepFormer

From Leeroopedia


Field Value
Implementation Name Load_Hyperpyyaml_SepFormer
API load_hyperpyyaml(yaml_stream, overrides=None)
Source External hyperpyyaml. YAML: recipes/LibriMix/separation/hparams/sepformer-libri2mix.yaml:L1-198
Import from hyperpyyaml import load_hyperpyyaml
Type Wrapper Doc
Related Principle Principle:Speechbrain_Speechbrain_SepFormer_Model_Configuration

Purpose

This page documents the context-specific use of load_hyperpyyaml for loading the SepFormer speech separation model configuration. The YAML file sepformer-libri2mix.yaml defines all model architecture components, training hyperparameters, optimizer settings, and data paths required to train a SepFormer model on the Libri2Mix dataset.

Function Signature

def load_hyperpyyaml(yaml_stream, overrides=None):

Parameters

Parameter Type Description
yaml_stream file-like An open file handle to the YAML configuration file
overrides str or None Optional YAML-formatted string of parameter overrides

Outputs

Returns a Python dictionary (hparams) where:

  • Scalar values are loaded as their native Python types
  • !new: tags instantiate Python objects (model components, schedulers, etc.)
  • !ref tags resolve cross-references within the YAML
  • !name: tags create callable factories (for optimizer and loss)
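The tag semantics above can be illustrated with a stdlib-only sketch. This mimics, but is not, hyperpyyaml's implementation (the real library registers `!ref`, `!new:`, and `!name:` as custom YAML constructors); `resolve` is a hypothetical helper:

```python
# Hypothetical stdlib-only sketch of '!ref'-style cross-reference
# resolution; hyperpyyaml implements this (plus '!new:' object
# instantiation and '!name:' factories) as custom YAML constructors.
def resolve(config: dict) -> dict:
    """Resolve toy '!ref <key>' strings in a flat config dict."""
    resolved = {}
    for key, value in config.items():
        if isinstance(value, str) and value.startswith("!ref "):
            target = value[len("!ref "):].strip("<>")
            resolved[key] = config[target]  # substitute the referenced value
        else:
            resolved[key] = value
    return resolved

hparams = resolve({
    "N_encoder_out": 256,
    "out_channels": "!ref <N_encoder_out>",
})
print(hparams["out_channels"])  # 256
```

This is why changing a single value such as N_encoder_out propagates consistently to every component that references it.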

Key Parameters Configured

Model Architecture

Parameter Value Description
N_encoder_out 256 Encoder output channels (latent dimension)
kernel_size 16 Encoder/Decoder convolutional kernel size
kernel_stride 8 Decoder stride
out_channels 256 Transformer model dimension (d_model)
d_ffn 1024 Feed-forward network dimension in transformer blocks
num_spks 2 Number of speakers to separate

Training Hyperparameters

Parameter Value Description
N_epochs 200 Total training epochs
batch_size 1 Batch size per GPU
lr 0.00015 Initial learning rate
clip_grad_norm 5 Maximum gradient norm for clipping
precision fp16 Mixed precision training mode
threshold_byloss True Enable loss-based sample filtering
threshold -30 SI-SNR threshold for filtering (dB)
loss_upper_lim 999999 Upper limit for acceptable loss
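One plausible reading of the threshold parameters above, as a hedged sketch: keep only per-utterance losses (negative SI-SNR, in dB) that fall strictly between threshold and loss_upper_lim, and average the survivors. The recipe's actual filtering logic lives in train.py; `filter_losses` is a hypothetical helper:

```python
def filter_losses(losses, threshold=-30.0, upper_lim=999999.0):
    """Hypothetical sketch of loss-based sample filtering: keep
    per-utterance losses above the SI-SNR threshold (dB) and below the
    upper limit, then average the survivors."""
    kept = [l for l in losses if threshold < l < upper_lim]
    return sum(kept) / len(kept) if kept else 0.0

# An utterance at -45 dB falls below the -30 dB threshold and is dropped:
print(filter_losses([-5.0, -12.0, -45.0]))  # -8.5
```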

Data Augmentation

Parameter Value Description
use_speedperturb True Speed perturbation augmentation (enabled)
use_wavedrop False Waveform dropout augmentation (disabled)
use_rand_shift False Random time-shift augmentation (disabled)
dynamic_mixing False On-the-fly dynamic mixing (disabled by default)
use_wham_noise False WHAM! noise mixing (disabled by default)

Components Instantiated

Encoder

Encoder: !new:speechbrain.lobes.models.dual_path.Encoder
    kernel_size: !ref <kernel_size>
    out_channels: !ref <N_encoder_out>

A 1D convolutional layer that transforms raw waveform input into a latent representation of dimension 256.
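The number of latent frames the encoder produces follows the standard 1-D convolution output formula. A sketch, assuming no padding and an encoder stride of kernel_size // 2 = 8 (the conventional SepFormer choice; the hparams table only specifies kernel_stride for the decoder):

```python
def conv1d_out_len(n_samples: int, kernel_size: int = 16, stride: int = 8) -> int:
    """Latent frame count for an unpadded 1-D convolution:
    floor((n_samples - kernel_size) / stride) + 1."""
    return (n_samples - kernel_size) // stride + 1

# A 3-second mixture at 8 kHz (24000 samples) with kernel 16 / stride 8:
print(conv1d_out_len(24000))  # 2999 frames, each of dimension 256
```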

Intra-chunk Transformer Block

SBtfintra: !new:speechbrain.lobes.models.dual_path.SBTransformerBlock
    num_layers: 8
    d_model: !ref <out_channels>
    nhead: 8
    d_ffn: !ref <d_ffn>
    dropout: 0
    use_positional_encoding: True
    norm_before: True

Processes frames within each chunk using 8 layers of multi-head self-attention with 8 heads, pre-layer normalization, and sinusoidal positional encoding.

Inter-chunk Transformer Block

SBtfinter: !new:speechbrain.lobes.models.dual_path.SBTransformerBlock
    num_layers: 8
    d_model: !ref <out_channels>
    nhead: 8
    d_ffn: !ref <d_ffn>
    dropout: 0
    use_positional_encoding: True
    norm_before: True

Processes frames across chunks at the same temporal position, with the same architecture as the intra-chunk block.

Dual Path Model (MaskNet)

MaskNet: !new:speechbrain.lobes.models.dual_path.Dual_Path_Model
    num_spks: !ref <num_spks>
    in_channels: !ref <N_encoder_out>
    out_channels: !ref <out_channels>
    num_layers: 2
    K: 250
    intra_model: !ref <SBtfintra>
    inter_model: !ref <SBtfinter>
    norm: ln
    linear_layer_after_inter_intra: False
    skip_around_intra: True

The core separation module: 2 dual-path layers with chunk size K=250, layer normalization, and skip connections around the intra-chunk processing.
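The dual-path idea can be sketched with plain lists: split the frame axis into chunks of length K, run intra-chunk attention within each chunk, and inter-chunk attention across chunks at the same within-chunk position. This is illustrative only (the real Dual_Path_Model uses overlapping chunks and padding); `chunk_positions` is a hypothetical helper:

```python
def chunk_positions(n_frames: int, K: int = 250):
    """Sketch of dual-path chunking: non-overlapping chunks of length K.
    Intra-chunk attention sees one chunk; inter-chunk attention sees one
    within-chunk position across all chunks."""
    frames = list(range(n_frames))
    chunks = [frames[i:i + K] for i in range(0, n_frames, K)]
    # Inter-chunk view: one sequence per within-chunk position
    inter = [[c[p] for c in chunks if p < len(c)] for p in range(K)]
    return chunks, inter

chunks, inter = chunk_positions(1000, K=250)
print(len(chunks), inter[0])  # 4 [0, 250, 500, 750]
```

This factorization is what lets the transformer blocks attend over long mixtures: each attention pass sees at most K (or n_frames / K) positions rather than the full sequence.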

Decoder

Decoder: !new:speechbrain.lobes.models.dual_path.Decoder
    in_channels: !ref <N_encoder_out>
    out_channels: 1
    kernel_size: !ref <kernel_size>
    stride: !ref <kernel_stride>
    bias: False

A transposed 1D convolution that reconstructs waveforms from the masked latent representations.

Optimizer and Loss

optimizer: !name:torch.optim.Adam
    lr: !ref <lr>
    weight_decay: 0

loss: !name:speechbrain.nnet.losses.get_si_snr_with_pitwrapper

lr_scheduler: !new:speechbrain.nnet.schedulers.ReduceLROnPlateau
    factor: 0.5
    patience: 2
    dont_halve_until_epoch: 5
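The scheduler's halving rule can be sketched as follows. This is a hedged reading of the configuration above, not the library's code (speechbrain.nnet.schedulers.ReduceLROnPlateau tracks validation losses itself); `next_lr` is a hypothetical helper:

```python
def next_lr(current_lr, epoch, epochs_without_improvement,
            factor=0.5, patience=2, dont_halve_until_epoch=5):
    """Sketch of plateau-based LR decay: multiply the learning rate by
    `factor` once validation loss has stalled for more than `patience`
    epochs, but never before `dont_halve_until_epoch`."""
    if epoch >= dont_halve_until_epoch and epochs_without_improvement > patience:
        return current_lr * factor
    return current_lr

print(next_lr(0.00015, epoch=10, epochs_without_improvement=3))  # 7.5e-05
print(next_lr(0.00015, epoch=3, epochs_without_improvement=3))   # 0.00015
```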

Usage Example

import sys
import speechbrain as sb
from hyperpyyaml import load_hyperpyyaml

# Parse command-line arguments
hparams_file, run_opts, overrides = sb.parse_arguments(sys.argv[1:])

# Load the YAML configuration
with open(hparams_file, encoding="utf-8") as fin:
    hparams = load_hyperpyyaml(fin, overrides)

# Access instantiated components
encoder = hparams["Encoder"]       # Encoder nn.Module instance
mask_net = hparams["MaskNet"]      # Dual_Path_Model instance
decoder = hparams["Decoder"]       # Decoder nn.Module instance
loss_fn = hparams["loss"]          # get_si_snr_with_pitwrapper callable
optimizer_cls = hparams["optimizer"]  # Adam factory callable
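Because `!name:` yields a factory rather than an instance, the optimizer is constructed later, once the model parameters exist. A stdlib sketch of that factory pattern, with `functools.partial` standing in for the tag and `FakeAdam` as a hypothetical stand-in for torch.optim.Adam:

```python
from functools import partial

class FakeAdam:
    """Stand-in for torch.optim.Adam, used only to illustrate the
    factory pattern produced by the '!name:' tag."""
    def __init__(self, params, lr=0.001, weight_decay=0.0):
        self.params = list(params)
        self.lr = lr
        self.weight_decay = weight_decay

# What '!name:torch.optim.Adam' with 'lr: 0.00015' conceptually yields:
optimizer_cls = partial(FakeAdam, lr=0.00015, weight_decay=0)

# Later, once the model exists, the training code calls the factory:
optimizer = optimizer_cls([0.1, 0.2])  # stand-in for model.parameters()
print(optimizer.lr)  # 0.00015
```

In the real recipe the equivalent call is made by the Brain class with the separator's parameters.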

Command-Line Invocation

python train.py hparams/sepformer-libri2mix.yaml --data_folder /data/Libri2Mix

Overriding Parameters

overrides = "lr: 0.001\nbatch_size: 2\nN_epochs: 100"
with open("hparams/sepformer-libri2mix.yaml", encoding="utf-8") as fin:
    hparams = load_hyperpyyaml(fin, overrides)

Key Implementation Details

  • The !ref syntax enables parameter cross-referencing (e.g., !ref <kernel_size>), ensuring consistency across components
  • The !new: tag instantiates objects at YAML load time, so the returned dictionary contains live PyTorch modules
  • The !name: tag creates factory callables (not instances), used for the optimizer and loss function
  • The modules dictionary groups all trainable components for the Brain class
  • The checkpointer configuration defines which components are saved and recovered during training
  • Pretrained model loading is supported via the pretrained_separator Pretrainer configuration

Source File

recipes/LibriMix/separation/hparams/sepformer-libri2mix.yaml

See Also

  • Principle:Speechbrain_Speechbrain_SepFormer_Model_Configuration
