Implementation:Speechbrain Speechbrain Load Hyperpyyaml Enhancement
| Property | Value |
|---|---|
| Implementation Name | Load_Hyperpyyaml_Enhancement |
| API | `load_hyperpyyaml(yaml_stream, overrides=None)` |
| Source File | External package: hyperpyyaml. Model YAML files: recipes/Voicebank/enhance/MetricGAN/models/MetricGAN.yaml, recipes/Voicebank/enhance/spectral_mask/hparams/models/BLSTM.yaml, recipes/Voicebank/enhance/spectral_mask/hparams/models/2DFCN.yaml |
| Import | `from hyperpyyaml import load_hyperpyyaml` |
| Type | Wrapper Doc (context: enhancement architectures) |
| Workflow | Speech_Enhancement_Training |
| Domains | Model_Architecture, Speech_Enhancement |
| Related Principle | Principle:Speechbrain_Speechbrain_Enhancement_Architecture_Selection |
Purpose
The load_hyperpyyaml function parses HyperPyYAML configuration files and instantiates all defined Python objects, returning a dictionary of fully-constructed hyperparameters including neural network models, optimizers, loss functions, and feature extractors. In the context of speech enhancement, it is the mechanism by which a declarative YAML specification is transformed into a runnable training configuration with a specific enhancement architecture.
Function Signature
```python
def load_hyperpyyaml(yaml_stream, overrides=None):
    """Load a HyperPyYAML file and return a dictionary of objects.

    Arguments
    ---------
    yaml_stream : file-like or str
        A file-like object or string containing the YAML content.
    overrides : str, optional
        A YAML-formatted string of overrides to apply.

    Returns
    -------
    dict
        A dictionary where keys are YAML keys and values are the
        instantiated Python objects or raw values.
    """
```
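The core mechanism behind the tags described below can be illustrated with a stdlib-only toy. The helper names (`resolve_dotted`, `new_tag`, `name_tag`) are hypothetical and this is not hyperpyyaml's actual implementation; it only sketches how a dotted path such as `torch.nn.Sigmoid` could be resolved with `importlib`, instantiated immediately (`!new:`) or deferred (`!name:`):

```python
import importlib


def resolve_dotted(path):
    """Import 'pkg.mod.Name' and return the attribute (class or function)."""
    module_path, _, attr = path.rpartition(".")
    return getattr(importlib.import_module(module_path), attr)


def new_tag(path, *args, **kwargs):
    """Toy '!new:' -- instantiate the class right away."""
    return resolve_dotted(path)(*args, **kwargs)


def name_tag(path, *args, **kwargs):
    """Toy '!name:' -- return a callable that instantiates later."""
    cls = resolve_dotted(path)
    return lambda *a, **k: cls(*args, *a, **{**kwargs, **k})


# Use a stdlib class in place of a torch module for the demonstration.
counter = new_tag("collections.Counter", "mississippi")  # built now
deferred = name_tag("collections.Counter")               # built on call
later = deferred("aab")
```

The deferred form matters because SpeechBrain's `Sequential` container calls each `!name:` entry only once it knows the input shape.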
Architecture Configurations
BLSTM (Spectral Mask)
File: recipes/Voicebank/enhance/spectral_mask/hparams/models/BLSTM.yaml
```yaml
model: !new:speechbrain.nnet.containers.Sequential
    input_shape: [null, null, !ref <N_fft> // 2 + 1]
    lstm: !name:speechbrain.nnet.RNN.LSTM
        hidden_size: 200
        num_layers: 2
        dropout: 0
        bidirectional: True
    linear1: !name:speechbrain.nnet.linear.Linear
        n_neurons: 300
        bias: True
    act1: !new:torch.nn.LeakyReLU
        negative_slope: 0.01
    linear2: !name:speechbrain.nnet.linear.Linear
        n_neurons: !ref <N_fft> // 2 + 1
        bias: True
    act2: !new:torch.nn.Sigmoid
```
| Layer | Type | Output Shape | Notes |
|---|---|---|---|
| Input | Spectrogram | [B, T, 257] | N_fft=512, so freq_bins = 257 |
| lstm | BLSTM | [B, T, 400] | 200 hidden x 2 directions |
| linear1 | Linear | [B, T, 300] | With LeakyReLU |
| linear2 | Linear | [B, T, 257] | Sigmoid for mask [0, 1] |
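The shapes in the table follow from simple arithmetic; this illustrative check (no SpeechBrain required) mirrors each row:

```python
# Shape bookkeeping for the BLSTM spectral-mask model.
N_fft = 512
freq_bins = N_fft // 2 + 1  # one-sided spectrum -> 257 frequency bins

hidden_size, num_directions = 200, 2  # bidirectional LSTM
lstm_out = hidden_size * num_directions  # 400 features per frame

linear1_out = 300  # followed by LeakyReLU
mask_out = freq_bins  # one sigmoid mask value per frequency bin
```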
2D-FCN (Spectral Mask)
File: recipes/Voicebank/enhance/spectral_mask/hparams/models/2DFCN.yaml
```yaml
kernel_size: (9,9)
base_channels: 64

model: !new:speechbrain.nnet.containers.Sequential
    input_shape: [null, null, !ref <N_fft> // 2 + 1]
    conv1: !name:speechbrain.nnet.CNN.Conv2d
        out_channels: !ref <base_channels>
        kernel_size: !ref <kernel_size>
    BN1: !new:speechbrain.nnet.normalization.BatchNorm2d
        input_size: !ref <base_channels>
    act1: !new:torch.nn.LeakyReLU
        negative_slope: 0.01
    # ... 6 more conv+BN+activation blocks ...
    conv8: !name:speechbrain.nnet.CNN.Conv2d
        padding: valid
        out_channels: 257
        kernel_size: (257,1)
    act8: !new:torch.nn.Sigmoid
```
| Layer | Type | Key Parameters |
|---|---|---|
| conv1-conv7 | Conv2d + BatchNorm2d + LeakyReLU | 64 channels, (9,9) kernel, same padding |
| conv8 | Conv2d | 257 output channels, (257,1) kernel, valid padding |
| act8 | Sigmoid | Final mask activation |
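The padding choices in the table follow from the standard convolution output-length formula; this is an illustrative arithmetic check, not SpeechBrain code:

```python
def conv_out_len(n, kernel, pad, stride=1):
    """Length of one axis after a convolution (standard formula)."""
    return (n + 2 * pad - kernel) // stride + 1


freq_bins = 257
same_pad = (9 - 1) // 2  # "same" padding for a size-9 kernel

# conv1-conv7: (9,9) kernel with same padding keeps the frequency axis at 257.
after_conv1_7 = conv_out_len(freq_bins, 9, same_pad)

# conv8: a (257,1) kernel with valid padding collapses the frequency axis
# to 1; its 257 output channels then reconstruct the per-bin mask.
after_conv8 = conv_out_len(freq_bins, 257, 0)
```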
MetricGAN (Generator + Discriminator)
File: recipes/Voicebank/enhance/MetricGAN/models/MetricGAN.yaml
```yaml
kernel_size: (5,5)
base_channels: 15

generator: !new:speechbrain.lobes.models.MetricGAN.EnhancementGenerator
discriminator: !new:speechbrain.lobes.models.MetricGAN.MetricDiscriminator
    kernel_size: !ref <kernel_size>
    base_channels: !ref <base_channels>
```
The MetricGAN configuration is distinct from the spectral mask models because it defines two models:
- Generator: `EnhancementGenerator` predicts a spectral mask to enhance noisy speech.
- Discriminator: `MetricDiscriminator` predicts perceptual quality scores from pairs of spectrograms.
Usage Examples
Loading a Spectral Mask Configuration
```python
import sys

import speechbrain as sb
from hyperpyyaml import load_hyperpyyaml

# Parse command-line arguments
hparams_file, run_opts, overrides = sb.parse_arguments(sys.argv[1:])

# Load YAML and instantiate all objects
with open(hparams_file, encoding="utf-8") as fin:
    hparams = load_hyperpyyaml(fin, overrides)

# Access the instantiated model
model = hparams["models"]["model"]  # The Sequential model from the included YAML
print(model)  # Shows the full architecture

# Access other instantiated objects
optimizer_class = hparams["opt_class"]  # torch.optim.Adam with lr=0.0001
loss_function = hparams["compute_cost"]  # speechbrain.nnet.losses.mse_loss
stft = hparams["compute_STFT"]  # STFT feature extractor
```
Switching Architectures via Command Line
```bash
# Train with BLSTM (default in train.yaml)
python train.py hparams/train.yaml

# Train with 2D-FCN by overriding the model include
python train.py hparams/train.yaml --models '!include:models/2DFCN.yaml'

# Override specific hyperparameters
python train.py hparams/train.yaml --lr 0.001 --N_batch 16
```
Loading MetricGAN Configuration
```python
from hyperpyyaml import load_hyperpyyaml

with open("recipes/Voicebank/enhance/MetricGAN/hparams/train.yaml") as fin:
    hparams = load_hyperpyyaml(fin)

# MetricGAN has separate generator and discriminator
generator = hparams["models"]["generator"]
discriminator = hparams["models"]["discriminator"]

# And separate optimizers
g_optimizer = hparams["g_opt_class"](generator.parameters())
d_optimizer = hparams["d_opt_class"](discriminator.parameters())

# Modules dict used by the Brain class
modules = hparams["modules"]
# modules = {"generator": generator, "discriminator": discriminator}
```
Programmatic Override
```python
from hyperpyyaml import load_hyperpyyaml

overrides = """
number_of_epochs: 100
lr: 0.0005
N_batch: 16
"""

with open("hparams/train.yaml") as fin:
    hparams = load_hyperpyyaml(fin, overrides=overrides)
```
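The override semantics can be sketched with a stdlib-only toy: override keys replace the base values before objects are instantiated. Real HyperPyYAML parses full YAML; the hypothetical `parse_simple_yaml` below handles only flat `key: value` lines for illustration:

```python
def parse_simple_yaml(text):
    """Toy parser: flat 'key: value' lines only, values kept as strings."""
    out = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(":")
        out[key.strip()] = value.strip()
    return out


base = parse_simple_yaml("number_of_epochs: 50\nlr: 0.0001\nN_batch: 8")
overrides = parse_simple_yaml("lr: 0.0005\nN_batch: 16")

# Later values win: override keys shadow the base configuration.
merged = {**base, **overrides}
```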
Training YAML Structure
The main training YAML (e.g., recipes/Voicebank/enhance/spectral_mask/hparams/train.yaml) includes these key sections:
```yaml
# FFT parameters shared with the model
Sample_rate: 16000
N_fft: 512

# Include the model architecture (swap this line to change architecture)
models: !include:models/BLSTM.yaml
    N_fft: !ref <N_fft>

# Modules dict passed to the Brain class
modules:
    model: !ref <models[model]>

# Optimizer
opt_class: !name:torch.optim.Adam
    lr: !ref <lr>

# Loss function
compute_cost: !name:speechbrain.nnet.losses.mse_loss

# Feature extraction
compute_STFT: !new:speechbrain.processing.features.STFT
    sample_rate: !ref <Sample_rate>
    win_length: 32
    hop_length: 16
    n_fft: !ref <N_fft>
    window_fn: !name:torch.hamming_window
```
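SpeechBrain's `STFT` interprets `win_length` and `hop_length` in milliseconds, so the values above are consistent with `N_fft`; this illustrative arithmetic (not SpeechBrain code) converts them to samples:

```python
# Milliseconds-to-samples arithmetic for the STFT settings above.
sample_rate = 16000
win_length_ms, hop_length_ms = 32, 16

win_samples = sample_rate * win_length_ms // 1000  # window matches n_fft
hop_samples = sample_rate * hop_length_ms // 1000  # 50% overlap
```

A 32 ms window at 16 kHz is exactly 512 samples, which is why the same `N_fft` value can parameterize both the STFT and the model's input shape.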
HyperPyYAML Tags Reference
| Tag | Purpose | Example |
|---|---|---|
| `!new:` | Create a new instance of a class | `!new:torch.nn.Sigmoid` |
| `!name:` | Defer instantiation (used inside Sequential) | `!name:speechbrain.nnet.RNN.LSTM` |
| `!ref` | Reference another YAML value | `!ref <N_fft>` |
| `!include:` | Include another YAML file | `!include:models/BLSTM.yaml` |
| `!apply:` | Call a function at load time | `!apply:speechbrain.utils.seed_everything [!ref <seed>]` |
| `!PLACEHOLDER` | Required value that must be overridden | `data_folder: !PLACEHOLDER` |
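Note that `!ref` can embed arithmetic, as in `!ref <N_fft> // 2 + 1` from the model YAMLs. A stdlib-only toy of that resolution (the `resolve_ref` helper is hypothetical, not hyperpyyaml's implementation):

```python
import re


def resolve_ref(expr, config):
    """Toy '!ref': substitute <key> with its value, then evaluate arithmetic."""
    substituted = re.sub(r"<(\w+)>", lambda m: str(config[m.group(1)]), expr)
    # eval is acceptable here only because the config is a trusted sketch.
    return eval(substituted)


config = {"N_fft": 512}
n_neurons = resolve_ref("<N_fft> // 2 + 1", config)
```

This is how a single `N_fft` value can drive both the STFT setup and the model's input/output sizes.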
Notes and Edge Cases
- Parameter propagation: When `!include:` is used, parameters can be passed to the included file. For example, `N_fft` is passed from the training YAML to the model YAML.
- Lazy instantiation: `!name:` tags defer object creation, which is necessary when the Sequential container needs to infer input shapes before constructing layers.
- Module registry: The `modules` dictionary in the YAML maps directly to the `modules` argument of `sb.Brain`, making all model components accessible via `self.modules` in the Brain class.
- Checkpoint compatibility: The `checkpointer` YAML section references the same model objects, ensuring that checkpoints correctly save and restore model state.
See Also
- Principle:Speechbrain_Speechbrain_Enhancement_Architecture_Selection -- The theoretical basis for architecture selection
- Implementation:Speechbrain_Speechbrain_MetricGanBrain_Fit_Batch -- How the MetricGAN architecture is trained
- Implementation:Speechbrain_Speechbrain_SEBrain_Compute_Forward -- How spectral mask architectures are used in the forward pass