Implementation:Speechbrain Speechbrain Load Hyperpyyaml SepFormer
| Field | Value |
|---|---|
| Implementation Name | Load_Hyperpyyaml_SepFormer |
| API | `load_hyperpyyaml(yaml_stream, overrides=None)` |
| Source | External `hyperpyyaml`. YAML: `recipes/LibriMix/separation/hparams/sepformer-libri2mix.yaml:L1-198` |
| Import | `from hyperpyyaml import load_hyperpyyaml` |
| Type | Wrapper Doc |
| Related Principle | Principle:Speechbrain_Speechbrain_SepFormer_Model_Configuration |
## Purpose
This page documents the context-specific use of `load_hyperpyyaml` for loading the SepFormer speech separation model configuration. The YAML file `sepformer-libri2mix.yaml` defines all model architecture components, training hyperparameters, optimizer settings, and data paths required to train a SepFormer model on the Libri2Mix dataset.
## Function Signature
```python
def load_hyperpyyaml(yaml_stream, overrides=None):
```
## Parameters
| Parameter | Type | Description |
|---|---|---|
| `yaml_stream` | file-like | An open file handle to the YAML configuration file |
| `overrides` | str or None | Optional YAML-formatted string of parameter overrides |
## Outputs
Returns a Python dictionary (`hparams`) where:
- Scalar values are loaded as their native Python types
- `!new:` tags instantiate Python objects (model components, schedulers, etc.)
- `!ref` tags resolve cross-references within the YAML
- `!name:` tags create callable factories (for the optimizer and loss)
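As a pure-Python analogy (not using hyperpyyaml itself, with a stdlib class standing in for a SpeechBrain module), the difference between `!new:` and `!name:` is construction at load time versus a deferred factory:

```python
from collections import Counter
from functools import partial

# !new:collections.Counter  -> the object exists as soon as the YAML is loaded
new_style = Counter("aab")

# !name:collections.Counter -> only a factory is stored; it is called later
# with the remaining arguments, like optimizer(model.parameters()) in training
name_style = partial(Counter)
later = name_style("aab")

print(new_style == later)  # True: same result, different construction time
```

This is why `hparams["Encoder"]` is a ready-to-use module, while `hparams["optimizer"]` must still be called.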
## Key Parameters Configured
### Model Architecture
| Parameter | Value | Description |
|---|---|---|
| `N_encoder_out` | 256 | Encoder output channels (latent dimension) |
| `kernel_size` | 16 | Encoder/Decoder convolutional kernel size |
| `kernel_stride` | 8 | Decoder stride |
| `out_channels` | 256 | Transformer model dimension (d_model) |
| `d_ffn` | 1024 | Feed-forward network dimension in transformer blocks |
| `num_spks` | 2 | Number of speakers to separate |
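A quick sanity check on the encoder geometry (assuming the encoder stride is half the kernel size, i.e. 8, and 8 kHz Libri2Mix audio; both are assumptions about the recipe, not stated in the table above):

```python
# Standard Conv1d length arithmetic (no padding assumed):
# L_out = (L_in - kernel) // stride + 1
def conv1d_out_len(n_samples: int, kernel: int = 16, stride: int = 8) -> int:
    return (n_samples - kernel) // stride + 1

# One second of 8 kHz audio becomes 999 latent frames of dimension 256
print(conv1d_out_len(8000))  # 999
```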
### Training Hyperparameters
| Parameter | Value | Description |
|---|---|---|
| `N_epochs` | 200 | Total training epochs |
| `batch_size` | 1 | Batch size per GPU |
| `lr` | 0.00015 | Initial learning rate |
| `clip_grad_norm` | 5 | Maximum gradient norm for clipping |
| `precision` | fp16 | Mixed precision training mode |
| `threshold_byloss` | True | Enable loss-based sample filtering |
| `threshold` | -30 | SI-SNR threshold for filtering (dB) |
| `loss_upper_lim` | 999999 | Upper limit for acceptable loss |
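A hedged sketch of how `threshold_byloss`, `threshold`, and `loss_upper_lim` interact (an interpretation of the recipe's filtering, not its exact code): per-utterance losses are negative SI-SNR values, and utterances whose loss is already below -30 dB are treated as solved and excluded from the batch average.

```python
def filter_losses(losses, threshold=-30.0, upper_lim=999999.0):
    # Keep only "hard" samples: loss above the SI-SNR threshold but below
    # the blow-up guard (loss_upper_lim). Fall back to the full batch if
    # the filter would leave nothing to backpropagate.
    kept = [loss for loss in losses if threshold < loss < upper_lim]
    return kept if kept else losses

print(filter_losses([-12.5, -31.0, -8.2]))  # -31.0 dropped as already solved
```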
### Data Augmentation
| Parameter | Value | Description |
|---|---|---|
| `use_speedperturb` | True | Enable speed perturbation augmentation |
| `use_wavedrop` | False | Disable wave dropping |
| `use_rand_shift` | False | Disable random shifting |
| `dynamic_mixing` | False | Dynamic mixing disabled by default |
| `use_wham_noise` | False | WHAM! noise disabled by default |
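Speed perturbation resamples each utterance by a random ratio, which also changes its length. A minimal sketch (the 95/100/105 percent ratios are a common choice and an assumption here, not read from the YAML):

```python
import random

def speed_perturb_len(n_samples: int, rng: random.Random) -> int:
    # Resampling at `speed` percent shortens or lengthens the waveform
    speed = rng.choice([95, 100, 105])
    return n_samples * 100 // speed

rng = random.Random(0)
print(speed_perturb_len(8000, rng))  # one of 8421, 8000, or 7619 samples
```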
## Components Instantiated
### Encoder
```yaml
Encoder: !new:speechbrain.lobes.models.dual_path.Encoder
    kernel_size: !ref <kernel_size>
    out_channels: !ref <N_encoder_out>
```
A 1D convolutional layer that transforms raw waveform input into a latent representation of dimension 256.
### Intra-chunk Transformer Block
```yaml
SBtfintra: !new:speechbrain.lobes.models.dual_path.SBTransformerBlock
    num_layers: 8
    d_model: !ref <out_channels>
    nhead: 8
    d_ffn: !ref <d_ffn>
    dropout: 0
    use_positional_encoding: True
    norm_before: True
```
Processes frames within each chunk using 8 layers of multi-head self-attention with 8 heads, pre-layer normalization, and sinusoidal positional encoding.
### Inter-chunk Transformer Block
```yaml
SBtfinter: !new:speechbrain.lobes.models.dual_path.SBTransformerBlock
    num_layers: 8
    d_model: !ref <out_channels>
    nhead: 8
    d_ffn: !ref <d_ffn>
    dropout: 0
    use_positional_encoding: True
    norm_before: True
```
Processes frames across chunks at the same temporal position, with the same architecture as the intra-chunk block.
### Dual Path Model (MaskNet)
```yaml
MaskNet: !new:speechbrain.lobes.models.dual_path.Dual_Path_Model
    num_spks: !ref <num_spks>
    in_channels: !ref <N_encoder_out>
    out_channels: !ref <out_channels>
    num_layers: 2
    K: 250
    intra_model: !ref <SBtfintra>
    inter_model: !ref <SBtfinter>
    norm: ln
    linear_layer_after_inter_intra: False
    skip_around_intra: True
```
The core separation module: 2 dual-path layers with chunk size K=250, layer normalization, and skip connections around the intra-chunk processing.
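The chunking the two transformer blocks operate over can be sketched in plain Python (50% chunk overlap is assumed from the dual-path literature; the real implementation also pads the last chunk):

```python
K, hop = 250, 125            # chunk size K and 50% hop (overlap assumed)
T = 999                      # e.g. latent frames from one second of audio
frames = list(range(T))

# Segment the frame axis into overlapping chunks of length K
chunks = [frames[i:i + K] for i in range(0, T, hop)]

# Intra-chunk attention runs within each chunk; inter-chunk attention runs
# across chunks at the same within-chunk position
print(len(chunks))  # 8 chunks for the inter-model to attend over
```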
### Decoder
```yaml
Decoder: !new:speechbrain.lobes.models.dual_path.Decoder
    in_channels: !ref <N_encoder_out>
    out_channels: 1
    kernel_size: !ref <kernel_size>
    stride: !ref <kernel_stride>
    bias: False
```
A transposed 1D convolution that reconstructs waveforms from the masked latent representations.
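With no padding assumed, the transposed convolution's length arithmetic inverts the encoder's, so the original sample count is recovered:

```python
def convtranspose1d_out_len(n_frames: int, kernel: int = 16, stride: int = 8) -> int:
    # Standard ConvTranspose1d length formula (no padding assumed):
    # L_out = (L_in - 1) * stride + kernel
    return (n_frames - 1) * stride + kernel

# 999 latent frames map back to 8000 samples, inverting the encoder step
print(convtranspose1d_out_len(999))  # 8000
```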
### Optimizer and Loss
```yaml
optimizer: !name:torch.optim.Adam
    lr: !ref <lr>
    weight_decay: 0

loss: !name:speechbrain.nnet.losses.get_si_snr_with_pitwrapper

lr_scheduler: !new:speechbrain.nnet.schedulers.ReduceLROnPlateau
    factor: 0.5
    patience: 2
    dont_halve_until_epoch: 5
```
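The scheduler's plateau logic can be sketched as follows (a simplified interpretation of `ReduceLROnPlateau`, not SpeechBrain's exact implementation):

```python
def step_lr(lr, epoch, epochs_without_improvement,
            factor=0.5, patience=2, dont_halve_until_epoch=5):
    # Halve the learning rate once validation loss has stagnated for more
    # than `patience` epochs, but never before `dont_halve_until_epoch`
    if epoch >= dont_halve_until_epoch and epochs_without_improvement > patience:
        return lr * factor
    return lr

print(step_lr(0.00015, epoch=3, epochs_without_improvement=3))  # unchanged: too early
print(step_lr(0.00015, epoch=6, epochs_without_improvement=3))  # halved
```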
## Usage Example
```python
import sys

import speechbrain as sb
from hyperpyyaml import load_hyperpyyaml

# Parse command-line arguments
hparams_file, run_opts, overrides = sb.parse_arguments(sys.argv[1:])

# Load the YAML configuration
with open(hparams_file, encoding="utf-8") as fin:
    hparams = load_hyperpyyaml(fin, overrides)

# Access instantiated components
encoder = hparams["Encoder"]          # Encoder nn.Module instance
mask_net = hparams["MaskNet"]         # Dual_Path_Model instance
decoder = hparams["Decoder"]          # Decoder nn.Module instance
loss_fn = hparams["loss"]             # get_si_snr_with_pitwrapper callable
optimizer_cls = hparams["optimizer"]  # Adam factory callable
```
## Command-Line Invocation
```shell
python train.py hparams/sepformer-libri2mix.yaml --data_folder /data/Libri2Mix
```
## Overriding Parameters
```python
overrides = "lr: 0.001\nbatch_size: 2\nN_epochs: 100"
with open("hparams/sepformer-libri2mix.yaml", encoding="utf-8") as fin:
    hparams = load_hyperpyyaml(fin, overrides)
```
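Since `overrides` is just YAML text, it can be assembled from a plain dict of scalars; `make_overrides` below is a convenience helper of my own, not a SpeechBrain or hyperpyyaml API:

```python
def make_overrides(params: dict) -> str:
    # One "key: value" line per override, valid as a YAML document
    return "\n".join(f"{key}: {value}" for key, value in params.items())

print(make_overrides({"lr": 0.001, "batch_size": 2, "N_epochs": 100}))
```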
## Key Implementation Details
- The `!ref` syntax enables parameter cross-referencing (e.g., `!ref <kernel_size>`), ensuring consistency across components
- The `!new:` tag instantiates objects at YAML load time, so the returned dictionary contains live PyTorch modules
- The `!name:` tag creates factory callables (not instances), used for the optimizer and loss function
- The `modules` dictionary groups all trainable components for the `Brain` class
- The `checkpointer` configuration defines which components are saved and recovered during training
- Pretrained model loading is supported via the `pretrained_separator` Pretrainer configuration
## Source File
`recipes/LibriMix/separation/hparams/sepformer-libri2mix.yaml`