
Implementation:Speechbrain Speechbrain Load Hyperpyyaml Enhancement

From Leeroopedia


Property | Value
Implementation Name | Load_Hyperpyyaml_Enhancement
API | load_hyperpyyaml(yaml_stream, overrides=None)
Source File | External package: hyperpyyaml. Model YAML files: recipes/Voicebank/enhance/MetricGAN/models/MetricGAN.yaml, recipes/Voicebank/enhance/spectral_mask/hparams/models/BLSTM.yaml, recipes/Voicebank/enhance/spectral_mask/hparams/models/2DFCN.yaml
Import | from hyperpyyaml import load_hyperpyyaml
Type | Wrapper Doc (context: enhancement architectures)
Workflow | Speech_Enhancement_Training
Domains | Model_Architecture, Speech_Enhancement
Related Principle | Principle:Speechbrain_Speechbrain_Enhancement_Architecture_Selection

Purpose

The load_hyperpyyaml function parses HyperPyYAML configuration files and instantiates all defined Python objects, returning a dictionary of fully-constructed hyperparameters including neural network models, optimizers, loss functions, and feature extractors. In the context of speech enhancement, it is the mechanism by which a declarative YAML specification is transformed into a runnable training configuration with a specific enhancement architecture.

Function Signature

def load_hyperpyyaml(yaml_stream, overrides=None):
    """Load a HyperPyYAML file and return a dictionary of objects.

    Arguments
    ---------
    yaml_stream : file-like or str
        A file-like object or string containing the YAML content.
    overrides : str, optional
        A YAML-formatted string of overrides to apply.

    Returns
    -------
    dict
        A dictionary where keys are YAML keys and values are the
        instantiated Python objects or raw values.
    """

Architecture Configurations

BLSTM (Spectral Mask)

File: recipes/Voicebank/enhance/spectral_mask/hparams/models/BLSTM.yaml

model: !new:speechbrain.nnet.containers.Sequential
    input_shape: [null, null, !ref <N_fft> // 2 + 1]
    lstm: !name:speechbrain.nnet.RNN.LSTM
        hidden_size: 200
        num_layers: 2
        dropout: 0
        bidirectional: True
    linear1: !name:speechbrain.nnet.linear.Linear
        n_neurons: 300
        bias: True
    act1: !new:torch.nn.LeakyReLU
        negative_slope: 0.01
    linear2: !name:speechbrain.nnet.linear.Linear
        n_neurons: !ref <N_fft> // 2 + 1
        bias: True
    act2: !new:torch.nn.Sigmoid

Layer | Type | Output Shape | Notes
Input | Spectrogram | [B, T, 257] | N_fft=512, so freq_bins = 257
lstm | BLSTM | [B, T, 400] | 200 hidden x 2 directions
linear1 | Linear | [B, T, 300] | with LeakyReLU
linear2 | Linear | [B, T, 257] | Sigmoid for mask in [0, 1]

2D-FCN (Spectral Mask)

File: recipes/Voicebank/enhance/spectral_mask/hparams/models/2DFCN.yaml

kernel_size: (9,9)
base_channels: 64

model: !new:speechbrain.nnet.containers.Sequential
    input_shape: [null, null, !ref <N_fft> // 2 + 1]
    conv1: !name:speechbrain.nnet.CNN.Conv2d
        out_channels: !ref <base_channels>
        kernel_size: !ref <kernel_size>
    BN1: !new:speechbrain.nnet.normalization.BatchNorm2d
        input_size: !ref <base_channels>
    act1: !new:torch.nn.LeakyReLU
        negative_slope: 0.01
    # ... 6 more conv+BN+activation blocks ...
    conv8: !name:speechbrain.nnet.CNN.Conv2d
        padding: valid
        out_channels: 257
        kernel_size: (257,1)
    act8: !new:torch.nn.Sigmoid

Layer | Type | Key Parameters
conv1-conv7 | Conv2d + BatchNorm2d + LeakyReLU | 64 channels, (9,9) kernel, same padding
conv8 | Conv2d | 257 output channels, (257,1) kernel, valid padding
act8 | Sigmoid | Final mask activation
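The role of the final (257,1) valid-padding kernel is easiest to see in a channels-first torch analogue (a sketch only: SpeechBrain's Conv2d operates channels-last, so the axis order here is an assumption made for plain torch). The kernel spans the entire frequency axis, collapsing it to 1, while out_channels=257 re-creates the 257 mask bins in the channel dimension.

```python
import torch
import torch.nn as nn

# Channels-first analogue of conv8: the (257, 1) kernel with no padding
# ("valid") covers the whole frequency axis in one step.
x = torch.randn(2, 64, 257, 100)                  # [B, C, freq, time] after conv1-7
conv8 = nn.Conv2d(64, 257, kernel_size=(257, 1))  # padding=0, i.e. valid
y = torch.sigmoid(conv8(x))
print(y.shape)  # torch.Size([2, 257, 1, 100])
```

After squeezing the collapsed frequency axis, the output again has 257 values per time step, matching the mask shape produced by the BLSTM variant.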

MetricGAN (Generator + Discriminator)

File: recipes/Voicebank/enhance/MetricGAN/models/MetricGAN.yaml

kernel_size: (5,5)
base_channels: 15

generator: !new:speechbrain.lobes.models.MetricGAN.EnhancementGenerator

discriminator: !new:speechbrain.lobes.models.MetricGAN.MetricDiscriminator
    kernel_size: !ref <kernel_size>
    base_channels: !ref <base_channels>

The MetricGAN configuration is distinct from spectral mask models because it defines two models:

  • Generator: EnhancementGenerator -- predicts a spectral mask to enhance noisy speech
  • Discriminator: MetricDiscriminator -- predicts a perceptual quality score from pairs of spectrograms
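The two-model setup enables MetricGAN's characteristic generator update. The sketch below illustrates the idea with stand-in modules (the interfaces of the real EnhancementGenerator and MetricDiscriminator are not shown on this page, so the tiny Linear networks, the stacked input layout, and the 1.0 score target are assumptions used only to make the step runnable):

```python
import torch
import torch.nn as nn

# Stand-ins for the two MetricGAN models (interfaces are assumptions):
# the generator predicts a mask; the discriminator maps an
# (enhanced, clean) spectrogram pair to a predicted quality score.
generator = nn.Sequential(nn.Linear(257, 257), nn.Sigmoid())
discriminator = nn.Sequential(nn.Flatten(), nn.Linear(2 * 257, 1))

noisy = torch.randn(4, 257)
clean = torch.randn(4, 257)

# Generator step: mask the noisy input, then push the discriminator's
# predicted score of the result toward the best metric value (1.0).
enhanced = generator(noisy) * noisy
score = discriminator(torch.stack([enhanced, clean], dim=1))
g_loss = ((score - 1.0) ** 2).mean()
g_loss.backward()
```

The discriminator is trained separately to regress the true metric score, so the generator receives a differentiable surrogate for a non-differentiable perceptual metric.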

Usage Examples

Loading a Spectral Mask Configuration

import sys
import speechbrain as sb
from hyperpyyaml import load_hyperpyyaml

# Parse command-line arguments
hparams_file, run_opts, overrides = sb.parse_arguments(sys.argv[1:])

# Load YAML and instantiate all objects
with open(hparams_file, encoding="utf-8") as fin:
    hparams = load_hyperpyyaml(fin, overrides)

# Access the instantiated model
model = hparams["models"]["model"]  # The Sequential model from included YAML
print(model)  # Shows full architecture

# Access other instantiated objects
optimizer_class = hparams["opt_class"]  # torch.optim.Adam with lr=0.0001
loss_function = hparams["compute_cost"]  # speechbrain.nnet.losses.mse_loss
stft = hparams["compute_STFT"]           # STFT feature extractor

Switching Architectures via Command Line

# Train with BLSTM (default in train.yaml)
# python train.py hparams/train.yaml

# Train with 2D-FCN by overriding the model include
# python train.py hparams/train.yaml --models '!include:models/2DFCN.yaml'

# Override specific hyperparameters
# python train.py hparams/train.yaml --lr 0.001 --N_batch 16

Loading MetricGAN Configuration

from hyperpyyaml import load_hyperpyyaml

# NOTE: if the YAML declares !PLACEHOLDER values (e.g. data_folder),
# they must be supplied via the overrides argument.
with open("recipes/Voicebank/enhance/MetricGAN/hparams/train.yaml") as fin:
    hparams = load_hyperpyyaml(fin)

# MetricGAN has separate generator and discriminator
generator = hparams["models"]["generator"]
discriminator = hparams["models"]["discriminator"]

# And separate optimizers
g_optimizer = hparams["g_opt_class"](generator.parameters())
d_optimizer = hparams["d_opt_class"](discriminator.parameters())

# Modules dict used by Brain class
modules = hparams["modules"]
# modules = {"generator": generator, "discriminator": discriminator}

Programmatic Override

from hyperpyyaml import load_hyperpyyaml

overrides = """
number_of_epochs: 100
lr: 0.0005
N_batch: 16
"""

with open("hparams/train.yaml") as fin:
    hparams = load_hyperpyyaml(fin, overrides=overrides)

Training YAML Structure

The main training YAML (e.g., recipes/Voicebank/enhance/spectral_mask/hparams/train.yaml) includes these key sections:

# FFT parameters shared with model
Sample_rate: 16000
N_fft: 512

# Include model architecture (swap this line to change architecture)
models: !include:models/BLSTM.yaml
    N_fft: !ref <N_fft>

# Modules dict passed to Brain class
modules:
    model: !ref <models[model]>

# Optimizer
opt_class: !name:torch.optim.Adam
    lr: !ref <lr>

# Loss function
compute_cost: !name:speechbrain.nnet.losses.mse_loss

# Feature extraction
compute_STFT: !new:speechbrain.processing.features.STFT
    sample_rate: !ref <Sample_rate>
    win_length: 32
    hop_length: 16
    n_fft: !ref <N_fft>
    window_fn: !name:torch.hamming_window
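Note that SpeechBrain's STFT takes win_length and hop_length in milliseconds, so at 16 kHz the values 32 and 16 correspond to 512 and 256 samples. A plain torch.stft sketch (an analogy, not the SpeechBrain class itself) confirms the resulting n_fft // 2 + 1 = 257 frequency bins that the model YAMLs rely on:

```python
import torch

# 32 ms and 16 ms at 16 kHz correspond to 512 and 256 samples.
sample_rate, n_fft = 16000, 512
win = int(sample_rate * 32 / 1000)   # 512 samples
hop = int(sample_rate * 16 / 1000)   # 256 samples

signal = torch.randn(sample_rate)    # 1 second of audio
spec = torch.stft(signal, n_fft=n_fft, hop_length=hop, win_length=win,
                  window=torch.hamming_window(win), return_complex=True)
print(spec.shape[0])  # 257 frequency bins = n_fft // 2 + 1
```

This is why the model YAMLs compute their input width as `!ref <N_fft> // 2 + 1`.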

HyperPyYAML Tags Reference

Tag | Purpose | Example
!new: | Create a new instance of a class | !new:torch.nn.Sigmoid
!name: | Defer instantiation (used inside Sequential) | !name:speechbrain.nnet.RNN.LSTM
!ref | Reference another YAML value | !ref <N_fft>
!include: | Include another YAML file | !include:models/BLSTM.yaml
!apply: | Call a function at load time | !apply:speechbrain.utils.seed_everything [!ref <seed>]
!PLACEHOLDER | Required value that must be overridden | data_folder: !PLACEHOLDER

Notes and Edge Cases

  • Parameter propagation: When !include: is used, parameters can be passed to the included file. For example, N_fft is passed from the training YAML to the model YAML.
  • Lazy instantiation: !name: tags defer object creation, which is necessary when the Sequential container needs to infer input shapes before constructing layers.
  • Module registry: The modules dictionary in the YAML directly maps to the modules argument of sb.Brain, making all model components accessible via self.modules in the Brain class.
  • Checkpoint compatibility: The checkpointer YAML section references the same model objects, ensuring that checkpoints correctly save and restore model state.
