Implementation:Speechbrain Speechbrain Load Hyperpyyaml SepFormer

From Leeroopedia


Field Value
Implementation Name Load_Hyperpyyaml_SepFormer
API load_hyperpyyaml(yaml_stream, overrides=None)
Source External hyperpyyaml. YAML: recipes/LibriMix/separation/hparams/sepformer-libri2mix.yaml:L1-198
Import from hyperpyyaml import load_hyperpyyaml
Type Wrapper Doc
Related Principle Principle:Speechbrain_Speechbrain_SepFormer_Model_Configuration

Purpose

This page documents the context-specific use of load_hyperpyyaml for loading the SepFormer speech separation model configuration. The YAML file sepformer-libri2mix.yaml defines all model architecture components, training hyperparameters, optimizer settings, and data paths required to train a SepFormer model on the Libri2Mix dataset.

Function Signature

def load_hyperpyyaml(yaml_stream, overrides=None):

Parameters

Parameter Type Description
yaml_stream file-like An open file handle to the YAML configuration file
overrides str or None Optional YAML-formatted string of parameter overrides

Outputs

Returns a Python dictionary (hparams) where:

  • Scalar values are loaded as their native Python types
  • !new: tags instantiate Python objects (model components, schedulers, etc.)
  • !ref tags resolve cross-references within the YAML
  • !name: tags create callable factories (for optimizer and loss)
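The tag semantics above can be illustrated with a stdlib-only sketch. This mimics, but is not, hyperpyyaml's implementation (the real library registers `!ref`, `!new:`, and `!name:` as custom YAML constructors); `resolve` is a hypothetical helper:

```python
# Hypothetical stdlib-only sketch of '!ref'-style cross-reference
# resolution; hyperpyyaml implements this (plus '!new:' object
# instantiation and '!name:' factories) as custom YAML constructors.
def resolve(config: dict) -> dict:
    """Resolve toy '!ref <key>' strings in a flat config dict."""
    resolved = {}
    for key, value in config.items():
        if isinstance(value, str) and value.startswith("!ref "):
            target = value[len("!ref "):].strip("<>")
            resolved[key] = config[target]  # substitute the referenced value
        else:
            resolved[key] = value
    return resolved

hparams = resolve({
    "N_encoder_out": 256,
    "out_channels": "!ref <N_encoder_out>",
})
print(hparams["out_channels"])  # 256
```

This is why changing a single value such as N_encoder_out propagates consistently to every component that references it.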

Key Parameters Configured

Model Architecture

Parameter Value Description
N_encoder_out 256 Encoder output channels (latent dimension)
kernel_size 16 Encoder/Decoder convolutional kernel size
kernel_stride 8 Decoder stride
out_channels 256 Transformer model dimension (d_model)
d_ffn 1024 Feed-forward network dimension in transformer blocks
num_spks 2 Number of speakers to separate

Training Hyperparameters

Parameter Value Description
N_epochs 200 Total training epochs
batch_size 1 Batch size per GPU
lr 0.00015 Initial learning rate
clip_grad_norm 5 Maximum gradient norm for clipping
precision fp16 Mixed precision training mode
threshold_byloss True Enable loss-based sample filtering
threshold -30 SI-SNR threshold for filtering (dB)
loss_upper_lim 999999 Upper limit for acceptable loss
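One plausible reading of the threshold parameters above, as a hedged sketch: keep only per-utterance losses (negative SI-SNR, in dB) that fall strictly between threshold and loss_upper_lim, and average the survivors. The recipe's actual filtering logic lives in train.py; `filter_losses` is a hypothetical helper:

```python
def filter_losses(losses, threshold=-30.0, upper_lim=999999.0):
    """Hypothetical sketch of loss-based sample filtering: keep
    per-utterance losses above the SI-SNR threshold (dB) and below the
    upper limit, then average the survivors."""
    kept = [l for l in losses if threshold < l < upper_lim]
    return sum(kept) / len(kept) if kept else 0.0

# An utterance at -45 dB falls below the -30 dB threshold and is dropped:
print(filter_losses([-5.0, -12.0, -45.0]))  # -8.5
```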

Data Augmentation

Parameter Value Description
use_speedperturb True Speed perturbation augmentation (enabled)
use_wavedrop False Waveform dropout augmentation (disabled)
use_rand_shift False Random time-shift augmentation (disabled)
dynamic_mixing False On-the-fly dynamic mixing (disabled by default)
use_wham_noise False WHAM! noise mixing (disabled by default)

Components Instantiated

Encoder

Encoder: !new:speechbrain.lobes.models.dual_path.Encoder
    kernel_size: !ref <kernel_size>
    out_channels: !ref <N_encoder_out>

A 1D convolutional layer that transforms raw waveform input into a latent representation of dimension 256.
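The number of latent frames the encoder produces follows the standard 1-D convolution output formula. A sketch, assuming no padding and an encoder stride of kernel_size // 2 = 8 (the conventional SepFormer choice; the hparams table only specifies kernel_stride for the decoder):

```python
def conv1d_out_len(n_samples: int, kernel_size: int = 16, stride: int = 8) -> int:
    """Latent frame count for an unpadded 1-D convolution:
    floor((n_samples - kernel_size) / stride) + 1."""
    return (n_samples - kernel_size) // stride + 1

# A 3-second mixture at 8 kHz (24000 samples) with kernel 16 / stride 8:
print(conv1d_out_len(24000))  # 2999 frames, each of dimension 256
```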

Intra-chunk Transformer Block

SBtfintra: !new:speechbrain.lobes.models.dual_path.SBTransformerBlock
    num_layers: 8
    d_model: !ref <out_channels>
    nhead: 8
    d_ffn: !ref <d_ffn>
    dropout: 0
    use_positional_encoding: True
    norm_before: True

Processes frames within each chunk using 8 layers of multi-head self-attention with 8 heads, pre-layer normalization, and sinusoidal positional encoding.

Inter-chunk Transformer Block

SBtfinter: !new:speechbrain.lobes.models.dual_path.SBTransformerBlock
    num_layers: 8
    d_model: !ref <out_channels>
    nhead: 8
    d_ffn: !ref <d_ffn>
    dropout: 0
    use_positional_encoding: True
    norm_before: True

Processes frames across chunks at the same temporal position, with the same architecture as the intra-chunk block.

Dual Path Model (MaskNet)

MaskNet: !new:speechbrain.lobes.models.dual_path.Dual_Path_Model
    num_spks: !ref <num_spks>
    in_channels: !ref <N_encoder_out>
    out_channels: !ref <out_channels>
    num_layers: 2
    K: 250
    intra_model: !ref <SBtfintra>
    inter_model: !ref <SBtfinter>
    norm: ln
    linear_layer_after_inter_intra: False
    skip_around_intra: True

The core separation module: 2 dual-path layers with chunk size K=250, layer normalization, and skip connections around the intra-chunk processing.
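The dual-path idea can be sketched with plain lists: split the frame axis into chunks of length K, run intra-chunk attention within each chunk, and inter-chunk attention across chunks at the same within-chunk position. This is illustrative only (the real Dual_Path_Model uses overlapping chunks and padding); `chunk_positions` is a hypothetical helper:

```python
def chunk_positions(n_frames: int, K: int = 250):
    """Sketch of dual-path chunking: non-overlapping chunks of length K.
    Intra-chunk attention sees one chunk; inter-chunk attention sees one
    within-chunk position across all chunks."""
    frames = list(range(n_frames))
    chunks = [frames[i:i + K] for i in range(0, n_frames, K)]
    # Inter-chunk view: one sequence per within-chunk position
    inter = [[c[p] for c in chunks if p < len(c)] for p in range(K)]
    return chunks, inter

chunks, inter = chunk_positions(1000, K=250)
print(len(chunks), inter[0])  # 4 [0, 250, 500, 750]
```

This factorization is what lets the transformer blocks attend over long mixtures: each attention pass sees at most K (or n_frames / K) positions rather than the full sequence.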

Decoder

Decoder: !new:speechbrain.lobes.models.dual_path.Decoder
    in_channels: !ref <N_encoder_out>
    out_channels: 1
    kernel_size: !ref <kernel_size>
    stride: !ref <kernel_stride>
    bias: False

A transposed 1D convolution that reconstructs waveforms from the masked latent representations.

Optimizer and Loss

optimizer: !name:torch.optim.Adam
    lr: !ref <lr>
    weight_decay: 0

loss: !name:speechbrain.nnet.losses.get_si_snr_with_pitwrapper

lr_scheduler: !new:speechbrain.nnet.schedulers.ReduceLROnPlateau
    factor: 0.5
    patience: 2
    dont_halve_until_epoch: 5
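The scheduler's halving rule can be sketched as follows. This is a hedged reading of the configuration above, not the library's code (speechbrain.nnet.schedulers.ReduceLROnPlateau tracks validation losses itself); `next_lr` is a hypothetical helper:

```python
def next_lr(current_lr, epoch, epochs_without_improvement,
            factor=0.5, patience=2, dont_halve_until_epoch=5):
    """Sketch of plateau-based LR decay: multiply the learning rate by
    `factor` once validation loss has stalled for more than `patience`
    epochs, but never before `dont_halve_until_epoch`."""
    if epoch >= dont_halve_until_epoch and epochs_without_improvement > patience:
        return current_lr * factor
    return current_lr

print(next_lr(0.00015, epoch=10, epochs_without_improvement=3))  # 7.5e-05
print(next_lr(0.00015, epoch=3, epochs_without_improvement=3))   # 0.00015
```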

Usage Example

import sys
import speechbrain as sb
from hyperpyyaml import load_hyperpyyaml

# Parse command-line arguments
hparams_file, run_opts, overrides = sb.parse_arguments(sys.argv[1:])

# Load the YAML configuration
with open(hparams_file, encoding="utf-8") as fin:
    hparams = load_hyperpyyaml(fin, overrides)

# Access instantiated components
encoder = hparams["Encoder"]       # Encoder nn.Module instance
mask_net = hparams["MaskNet"]      # Dual_Path_Model instance
decoder = hparams["Decoder"]       # Decoder nn.Module instance
loss_fn = hparams["loss"]          # get_si_snr_with_pitwrapper callable
optimizer_cls = hparams["optimizer"]  # Adam factory callable
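Because `!name:` yields a factory rather than an instance, the optimizer is constructed later, once the model parameters exist. A stdlib sketch of that factory pattern, with `functools.partial` standing in for the tag and `FakeAdam` as a hypothetical stand-in for torch.optim.Adam:

```python
from functools import partial

class FakeAdam:
    """Stand-in for torch.optim.Adam, used only to illustrate the
    factory pattern produced by the '!name:' tag."""
    def __init__(self, params, lr=0.001, weight_decay=0.0):
        self.params = list(params)
        self.lr = lr
        self.weight_decay = weight_decay

# What '!name:torch.optim.Adam' with 'lr: 0.00015' conceptually yields:
optimizer_cls = partial(FakeAdam, lr=0.00015, weight_decay=0)

# Later, once the model exists, the training code calls the factory:
optimizer = optimizer_cls([0.1, 0.2])  # stand-in for model.parameters()
print(optimizer.lr)  # 0.00015
```

In the real recipe the equivalent call is made by the Brain class with the separator's parameters.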

Command-Line Invocation

python train.py hparams/sepformer-libri2mix.yaml --data_folder /data/Libri2Mix

Overriding Parameters

overrides = "lr: 0.001\nbatch_size: 2\nN_epochs: 100"
with open("hparams/sepformer-libri2mix.yaml", encoding="utf-8") as fin:
    hparams = load_hyperpyyaml(fin, overrides)

Key Implementation Details

  • The !ref syntax enables parameter cross-referencing (e.g., !ref <kernel_size>), ensuring consistency across components
  • The !new: tag instantiates objects at YAML load time, so the returned dictionary contains live PyTorch modules
  • The !name: tag creates factory callables (not instances), used for the optimizer and loss function
  • The modules dictionary groups all trainable components for the Brain class
  • The checkpointer configuration defines which components are saved and recovered during training
  • Pretrained model loading is supported via the pretrained_separator Pretrainer configuration

Source File

recipes/LibriMix/separation/hparams/sepformer-libri2mix.yaml

See Also

  • Principle:Speechbrain_Speechbrain_SepFormer_Model_Configuration
