Implementation:Speechbrain Speechbrain Load Hyperpyyaml Enhancement
| Property | Value |
|---|---|
| Implementation Name | Load_Hyperpyyaml_Enhancement |
| API | `load_hyperpyyaml(yaml_stream, overrides=None)` |
| Source File | External package: hyperpyyaml. Model YAML files: recipes/Voicebank/enhance/MetricGAN/models/MetricGAN.yaml, recipes/Voicebank/enhance/spectral_mask/hparams/models/BLSTM.yaml, recipes/Voicebank/enhance/spectral_mask/hparams/models/2DFCN.yaml |
| Import | `from hyperpyyaml import load_hyperpyyaml` |
| Type | Wrapper Doc (context: enhancement architectures) |
| Workflow | Speech_Enhancement_Training |
| Domains | Model_Architecture, Speech_Enhancement |
| Related Principle | Principle:Speechbrain_Speechbrain_Enhancement_Architecture_Selection |
Purpose
The load_hyperpyyaml function parses HyperPyYAML configuration files and instantiates all defined Python objects, returning a dictionary of fully-constructed hyperparameters including neural network models, optimizers, loss functions, and feature extractors. In the context of speech enhancement, it is the mechanism by which a declarative YAML specification is transformed into a runnable training configuration with a specific enhancement architecture.
Function Signature
```python
def load_hyperpyyaml(yaml_stream, overrides=None):
    """Load a HyperPyYAML file and return a dictionary of objects.

    Arguments
    ---------
    yaml_stream : file-like or str
        A file-like object or string containing the YAML content.
    overrides : str, optional
        A YAML-formatted string of overrides to apply.

    Returns
    -------
    dict
        A dictionary where keys are YAML keys and values are the
        instantiated Python objects or raw values.
    """
```
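The core mechanism behind the tags described below can be illustrated with a stdlib-only toy. The helper names (`resolve_dotted`, `new_tag`, `name_tag`) are hypothetical and this is not hyperpyyaml's actual implementation; it only sketches how a dotted path such as `torch.nn.Sigmoid` could be resolved with `importlib`, instantiated immediately (`!new:`) or deferred (`!name:`):

```python
import importlib


def resolve_dotted(path):
    """Import 'pkg.mod.Name' and return the attribute (class or function)."""
    module_path, _, attr = path.rpartition(".")
    return getattr(importlib.import_module(module_path), attr)


def new_tag(path, *args, **kwargs):
    """Toy '!new:' -- instantiate the class right away."""
    return resolve_dotted(path)(*args, **kwargs)


def name_tag(path, *args, **kwargs):
    """Toy '!name:' -- return a callable that instantiates later."""
    cls = resolve_dotted(path)
    return lambda *a, **k: cls(*args, *a, **{**kwargs, **k})


# Use a stdlib class in place of a torch module for the demonstration.
counter = new_tag("collections.Counter", "mississippi")  # built now
deferred = name_tag("collections.Counter")               # built on call
later = deferred("aab")
```

The deferred form matters because SpeechBrain's `Sequential` container calls each `!name:` entry only once it knows the input shape.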
Architecture Configurations
BLSTM (Spectral Mask)
File: recipes/Voicebank/enhance/spectral_mask/hparams/models/BLSTM.yaml
```yaml
model: !new:speechbrain.nnet.containers.Sequential
    input_shape: [null, null, !ref <N_fft> // 2 + 1]
    lstm: !name:speechbrain.nnet.RNN.LSTM
        hidden_size: 200
        num_layers: 2
        dropout: 0
        bidirectional: True
    linear1: !name:speechbrain.nnet.linear.Linear
        n_neurons: 300
        bias: True
    act1: !new:torch.nn.LeakyReLU
        negative_slope: 0.01
    linear2: !name:speechbrain.nnet.linear.Linear
        n_neurons: !ref <N_fft> // 2 + 1
        bias: True
    act2: !new:torch.nn.Sigmoid
```
| Layer | Type | Output Shape | Notes |
|---|---|---|---|
| Input | Spectrogram | [B, T, 257] | N_fft=512, so freq_bins = 257 |
| lstm | BLSTM | [B, T, 400] | 200 hidden x 2 directions |
| linear1 | Linear | [B, T, 300] | With LeakyReLU |
| linear2 | Linear | [B, T, 257] | Sigmoid for mask [0, 1] |
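The shapes in the table follow from simple arithmetic; this illustrative check (no SpeechBrain required) mirrors each row:

```python
# Shape bookkeeping for the BLSTM spectral-mask model.
N_fft = 512
freq_bins = N_fft // 2 + 1  # one-sided spectrum -> 257 frequency bins

hidden_size, num_directions = 200, 2  # bidirectional LSTM
lstm_out = hidden_size * num_directions  # 400 features per frame

linear1_out = 300  # followed by LeakyReLU
mask_out = freq_bins  # one sigmoid mask value per frequency bin
```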
2D-FCN (Spectral Mask)
File: recipes/Voicebank/enhance/spectral_mask/hparams/models/2DFCN.yaml
```yaml
kernel_size: (9,9)
base_channels: 64

model: !new:speechbrain.nnet.containers.Sequential
    input_shape: [null, null, !ref <N_fft> // 2 + 1]
    conv1: !name:speechbrain.nnet.CNN.Conv2d
        out_channels: !ref <base_channels>
        kernel_size: !ref <kernel_size>
    BN1: !new:speechbrain.nnet.normalization.BatchNorm2d
        input_size: !ref <base_channels>
    act1: !new:torch.nn.LeakyReLU
        negative_slope: 0.01
    # ... 6 more conv+BN+activation blocks ...
    conv8: !name:speechbrain.nnet.CNN.Conv2d
        padding: valid
        out_channels: 257
        kernel_size: (257,1)
    act8: !new:torch.nn.Sigmoid
```
| Layer | Type | Key Parameters |
|---|---|---|
| conv1-conv7 | Conv2d + BatchNorm2d + LeakyReLU | 64 channels, (9,9) kernel, same padding |
| conv8 | Conv2d | 257 output channels, (257,1) kernel, valid padding |
| act8 | Sigmoid | Final mask activation |
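The padding choices in the table follow from the standard convolution output-length formula; this is an illustrative arithmetic check, not SpeechBrain code:

```python
def conv_out_len(n, kernel, pad, stride=1):
    """Length of one axis after a convolution (standard formula)."""
    return (n + 2 * pad - kernel) // stride + 1


freq_bins = 257
same_pad = (9 - 1) // 2  # "same" padding for a size-9 kernel

# conv1-conv7: (9,9) kernel with same padding keeps the frequency axis at 257.
after_conv1_7 = conv_out_len(freq_bins, 9, same_pad)

# conv8: a (257,1) kernel with valid padding collapses the frequency axis
# to 1; its 257 output channels then reconstruct the per-bin mask.
after_conv8 = conv_out_len(freq_bins, 257, 0)
```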
MetricGAN (Generator + Discriminator)
File: recipes/Voicebank/enhance/MetricGAN/models/MetricGAN.yaml
```yaml
kernel_size: (5,5)
base_channels: 15

generator: !new:speechbrain.lobes.models.MetricGAN.EnhancementGenerator
discriminator: !new:speechbrain.lobes.models.MetricGAN.MetricDiscriminator
    kernel_size: !ref <kernel_size>
    base_channels: !ref <base_channels>
```
The MetricGAN configuration is distinct from the spectral mask models because it defines two models:
- Generator: `EnhancementGenerator` predicts a spectral mask to enhance noisy speech.
- Discriminator: `MetricDiscriminator` predicts perceptual quality scores from pairs of spectrograms.
Usage Examples
Loading a Spectral Mask Configuration
```python
import sys

import speechbrain as sb
from hyperpyyaml import load_hyperpyyaml

# Parse command-line arguments
hparams_file, run_opts, overrides = sb.parse_arguments(sys.argv[1:])

# Load YAML and instantiate all objects
with open(hparams_file, encoding="utf-8") as fin:
    hparams = load_hyperpyyaml(fin, overrides)

# Access the instantiated model
model = hparams["models"]["model"]  # The Sequential model from the included YAML
print(model)  # Shows the full architecture

# Access other instantiated objects
optimizer_class = hparams["opt_class"]  # torch.optim.Adam with lr=0.0001
loss_function = hparams["compute_cost"]  # speechbrain.nnet.losses.mse_loss
stft = hparams["compute_STFT"]  # STFT feature extractor
```
Switching Architectures via Command Line
```bash
# Train with BLSTM (default in train.yaml)
python train.py hparams/train.yaml

# Train with 2D-FCN by overriding the model include
python train.py hparams/train.yaml --models '!include:models/2DFCN.yaml'

# Override specific hyperparameters
python train.py hparams/train.yaml --lr 0.001 --N_batch 16
```
Loading MetricGAN Configuration
```python
from hyperpyyaml import load_hyperpyyaml

with open("recipes/Voicebank/enhance/MetricGAN/hparams/train.yaml") as fin:
    hparams = load_hyperpyyaml(fin)

# MetricGAN has separate generator and discriminator
generator = hparams["models"]["generator"]
discriminator = hparams["models"]["discriminator"]

# And separate optimizers
g_optimizer = hparams["g_opt_class"](generator.parameters())
d_optimizer = hparams["d_opt_class"](discriminator.parameters())

# Modules dict used by the Brain class
modules = hparams["modules"]
# modules = {"generator": generator, "discriminator": discriminator}
```
Programmatic Override
```python
from hyperpyyaml import load_hyperpyyaml

overrides = """
number_of_epochs: 100
lr: 0.0005
N_batch: 16
"""

with open("hparams/train.yaml") as fin:
    hparams = load_hyperpyyaml(fin, overrides=overrides)
```
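The override semantics can be sketched with a stdlib-only toy: override keys replace the base values before objects are instantiated. Real HyperPyYAML parses full YAML; the hypothetical `parse_simple_yaml` below handles only flat `key: value` lines for illustration:

```python
def parse_simple_yaml(text):
    """Toy parser: flat 'key: value' lines only, values kept as strings."""
    out = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(":")
        out[key.strip()] = value.strip()
    return out


base = parse_simple_yaml("number_of_epochs: 50\nlr: 0.0001\nN_batch: 8")
overrides = parse_simple_yaml("lr: 0.0005\nN_batch: 16")

# Later values win: override keys shadow the base configuration.
merged = {**base, **overrides}
```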
Training YAML Structure
The main training YAML (e.g., recipes/Voicebank/enhance/spectral_mask/hparams/train.yaml) includes these key sections:
```yaml
# FFT parameters shared with the model
Sample_rate: 16000
N_fft: 512

# Include the model architecture (swap this line to change architecture)
models: !include:models/BLSTM.yaml
    N_fft: !ref <N_fft>

# Modules dict passed to the Brain class
modules:
    model: !ref <models[model]>

# Optimizer
opt_class: !name:torch.optim.Adam
    lr: !ref <lr>

# Loss function
compute_cost: !name:speechbrain.nnet.losses.mse_loss

# Feature extraction
compute_STFT: !new:speechbrain.processing.features.STFT
    sample_rate: !ref <Sample_rate>
    win_length: 32
    hop_length: 16
    n_fft: !ref <N_fft>
    window_fn: !name:torch.hamming_window
```
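SpeechBrain's `STFT` interprets `win_length` and `hop_length` in milliseconds, so the values above are consistent with `N_fft`; this illustrative arithmetic (not SpeechBrain code) converts them to samples:

```python
# Milliseconds-to-samples arithmetic for the STFT settings above.
sample_rate = 16000
win_length_ms, hop_length_ms = 32, 16

win_samples = sample_rate * win_length_ms // 1000  # window matches n_fft
hop_samples = sample_rate * hop_length_ms // 1000  # 50% overlap
```

A 32 ms window at 16 kHz is exactly 512 samples, which is why the same `N_fft` value can parameterize both the STFT and the model's input shape.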
HyperPyYAML Tags Reference
| Tag | Purpose | Example |
|---|---|---|
| `!new:` | Create a new instance of a class | `!new:torch.nn.Sigmoid` |
| `!name:` | Defer instantiation (used inside Sequential) | `!name:speechbrain.nnet.RNN.LSTM` |
| `!ref` | Reference another YAML value | `!ref <N_fft>` |
| `!include:` | Include another YAML file | `!include:models/BLSTM.yaml` |
| `!apply:` | Call a function at load time | `!apply:speechbrain.utils.seed_everything [!ref <seed>]` |
| `!PLACEHOLDER` | Required value that must be overridden | `data_folder: !PLACEHOLDER` |
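Note that `!ref` can embed arithmetic, as in `!ref <N_fft> // 2 + 1` from the model YAMLs. A stdlib-only toy of that resolution (the `resolve_ref` helper is hypothetical, not hyperpyyaml's implementation):

```python
import re


def resolve_ref(expr, config):
    """Toy '!ref': substitute <key> with its value, then evaluate arithmetic."""
    substituted = re.sub(r"<(\w+)>", lambda m: str(config[m.group(1)]), expr)
    # eval is acceptable here only because the config is a trusted sketch.
    return eval(substituted)


config = {"N_fft": 512}
n_neurons = resolve_ref("<N_fft> // 2 + 1", config)
```

This is how a single `N_fft` value can drive both the STFT setup and the model's input/output sizes.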
Notes and Edge Cases
- Parameter propagation: When `!include:` is used, parameters can be passed to the included file. For example, `N_fft` is passed from the training YAML to the model YAML.
- Lazy instantiation: `!name:` tags defer object creation, which is necessary when the Sequential container needs to infer input shapes before constructing layers.
- Module registry: The `modules` dictionary in the YAML maps directly to the `modules` argument of `sb.Brain`, making all model components accessible via `self.modules` in the Brain class.
- Checkpoint compatibility: The `checkpointer` YAML section references the same model objects, ensuring that checkpoints correctly save and restore model state.
See Also
- Principle:Speechbrain_Speechbrain_Enhancement_Architecture_Selection -- The theoretical basis for architecture selection
- Implementation:Speechbrain_Speechbrain_MetricGanBrain_Fit_Batch -- How the MetricGAN architecture is trained
- Implementation:Speechbrain_Speechbrain_SEBrain_Compute_Forward -- How spectral mask architectures are used in the forward pass