Implementation:Speechbrain Speechbrain DNS Audiolib
| Knowledge Sources | |
|---|---|
| Domains | Speech_Enhancement, Data_Synthesis |
| Last Updated | 2026-02-09 00:00 GMT |
Overview
A utility module for audio I/O, level normalization, SNR-controlled mixing, and signal manipulation, provided in the DNS recipe of the SpeechBrain library.
Description
This module collects the audio utility functions used by the DNS (Deep Noise Suppression) Challenge noisy speech synthesis pipeline. It covers reading and writing audio files with optional normalization, checking for clipping, mixing clean speech with noise at a specified SNR (global or segmental), adding reverb, detecting speech activity from energy thresholds, resampling audio files to a target sample rate, and segmenting long clips. The mixing functions normalize both signals to a target level in dB FS, compute the noise scalar needed for the requested SNR, and ensure that the output does not clip. The code originates from the Microsoft DNS-Challenge repository.
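The normalization and noise-scaling steps described above can be sketched as follows. This is a minimal illustration, not the module's code: the helper names (rms, normalize_to_dbfs, mix_at_snr) are hypothetical, plain Python lists stand in for NumPy arrays so the sketch has no dependencies, and the clipping check is left out.

```python
import math

EPS = 1e-12  # guards against division by zero on silent input


def rms(signal):
    """Root-mean-square amplitude of a sequence of samples."""
    return math.sqrt(sum(x * x for x in signal) / len(signal))


def normalize_to_dbfs(signal, target_level=-25.0):
    """Scale the signal so that its RMS equals target_level in dB FS."""
    scalar = 10 ** (target_level / 20) / (rms(signal) + EPS)
    return [x * scalar for x in signal]


def mix_at_snr(clean, noise, snr_db, target_level=-25.0):
    """Return (clean, scaled_noise, mixture) at the requested global SNR."""
    clean = normalize_to_dbfs(clean, target_level)
    noise = normalize_to_dbfs(noise, target_level)
    # Noise gain that places the noise RMS snr_db below the clean RMS
    noise_scalar = rms(clean) / (10 ** (snr_db / 20)) / (rms(noise) + EPS)
    scaled_noise = [n * noise_scalar for n in noise]
    mixture = [c + n for c, n in zip(clean, scaled_noise)]
    return clean, scaled_noise, mixture


# Synthetic one-second signals at 16 kHz: a 440 Hz tone as "speech"
# and a 1 kHz tone as stand-in "noise"
fs = 16000
clean = [0.5 * math.sin(2 * math.pi * 440 * t / fs) for t in range(fs)]
noise = [0.1 * math.sin(2 * math.pi * 1000 * t / fs) for t in range(fs)]
clean_n, noise_n, noisy = mix_at_snr(clean, noise, snr_db=10)
```

Because both signals are first brought to the same level, the noise gain depends only on the requested SNR, which keeps the mixing step independent of how loud the source recordings were.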
Usage
Import individual functions from this module when building noisy speech synthesis pipelines. The functions are designed to be composed together for constructing clean-noisy audio pairs at controlled SNR levels.
Code Reference
Source Location
- Repository: SpeechBrain
- File: recipes/DNS/noisyspeech_synthesizer/audiolib.py
Signature
def is_clipped(audio, clipping_threshold=0.99):
    """Check if an audio signal is clipped."""
    ...

def normalize(audio, target_level=-25):
    """Normalize the signal to the target level."""
    ...

def normalize_segmental_rms(audio, rms, target_level=-25):
    """Normalize the signal to the target level based on segmental RMS."""
    ...

def audioread(path, norm=False, start=0, stop=None, target_level=-25):
    """Function to read audio."""
    ...

def audiowrite(destpath, audio, sample_rate=16000, norm=False,
               target_level=-25, clipping_threshold=0.99, clip_test=False):
    """Function to write audio."""
    ...

def snr_mixer(params, clean, noise, snr, target_level=-25, clipping_threshold=0.99):
    """Function to mix clean speech and noise at various SNR levels."""
    ...

def segmental_snr_mixer(params, clean, noise, snr, target_level=-25, clipping_threshold=0.99):
    """Function to mix clean speech and noise at various segmental SNR levels."""
    ...

def active_rms(clean, noise, fs=16000, energy_thresh=-50):
    """Returns the clean and noise RMS calculated only in active portions."""
    ...

def activitydetector(audio, fs=16000, energy_thresh=0.13, target_level=-25):
    """Return the percentage of time the audio signal is above an energy threshold."""
    ...

def resampler(input_dir, target_sr=16000, ext="*.wav"):
    """Resamples the audio files in input_dir to target_sr."""
    ...

def audio_segmenter(input_dir, dest_dir, segment_len=10, ext="*.wav"):
    """Segments the audio clips in dir to segment_len in secs."""
    ...
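One way to restrict an RMS measurement to active portions, as the active_rms docstring describes, is to keep only the windows whose level exceeds the energy threshold. The sketch below assumes a 100 ms window and a simple per-window decision. Both are illustrative choices, the function names are hypothetical, and the actual implementation may differ.

```python
import math


def frame_level_db(frame, eps=1e-12):
    """Frame level in dB, computed from the mean squared amplitude."""
    mean_square = sum(x * x for x in frame) / len(frame)
    return 10 * math.log10(mean_square + eps)


def active_rms_sketch(signal, fs=16000, energy_thresh=-50.0, window_ms=100):
    """RMS over only those windows whose level exceeds energy_thresh (dB)."""
    window = int(fs * window_ms / 1000)
    active = []
    for start in range(0, len(signal), window):
        frame = signal[start:start + window]
        if frame_level_db(frame) > energy_thresh:
            active.extend(frame)
    if not active:
        return 0.0  # no window passed the threshold
    return math.sqrt(sum(x * x for x in active) / len(active))


# Half a second of silence followed by half a second of a 440 Hz tone
fs = 16000
silence = [0.0] * (fs // 2)
tone = [0.5 * math.sin(2 * math.pi * 440 * t / fs) for t in range(fs // 2)]
signal = silence + tone

full_rms = math.sqrt(sum(x * x for x in signal) / len(signal))
speech_rms = active_rms_sketch(signal, fs=fs)
```

Here the silent half pulls full_rms down, while speech_rms reflects only the tone. This is the reason a segmental mixer would measure levels on active portions before computing the noise gain.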
Import
from audiolib import (
    audioread, audiowrite, normalize, is_clipped,
    snr_mixer, segmental_snr_mixer, activitydetector,
    active_rms, resampler, audio_segmenter,
)
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| audio | np.ndarray | Yes | Audio signal as a numpy array |
| path | str | Yes (audioread) | Path to the audio file to read |
| target_level | float | No | Target normalization level in dB FS (default: -25) |
| clipping_threshold | float | No | Threshold above which audio is considered clipped (default: 0.99) |
| snr | float | Yes (mixers) | Desired signal-to-noise ratio in dB |
| params | dict | Yes (mixers) | Configuration dict; must contain the target_level_lower and target_level_upper keys |
| clean | np.ndarray | Yes (mixers) | Clean speech signal |
| noise | np.ndarray | Yes (mixers) | Noise signal |
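The role of clipping_threshold can be illustrated with a small sketch. It assumes that a signal counts as clipped when any sample magnitude exceeds the threshold, and that a clipped mixture is fixed by applying one common gain to the clean, noise, and mixture signals. The function names and the margin parameter are hypothetical, and plain lists stand in for NumPy arrays.

```python
def is_clipped_sketch(audio, clipping_threshold=0.99):
    """True if any sample magnitude exceeds the threshold."""
    return any(abs(x) > clipping_threshold for x in audio)


def clip_safe_gain(mixture, clipping_threshold=0.99, margin=1e-9):
    """Gain that brings the mixture peak just under the threshold.

    Returns 1.0 when the mixture does not clip.
    """
    if not is_clipped_sketch(mixture, clipping_threshold):
        return 1.0
    peak = max(abs(x) for x in mixture)
    return (clipping_threshold - margin) / peak


clean = [0.1, -0.8, 0.5, 0.6]
noise = [0.1, -0.5, 0.2, 0.5]
noisy = [c + n for c, n in zip(clean, noise)]  # peak magnitude is 1.3

# Apply the same gain to all three signals so the SNR is unchanged
gain = clip_safe_gain(noisy)
clean, noise, noisy = ([x * gain for x in s] for s in (clean, noise, noisy))
```

Scaling all three signals together preserves the relationship noisy = clean + noise, so the clean target still matches the mixture after the correction.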
Outputs
| Name | Type | Description |
|---|---|---|
| audio | np.ndarray | Processed audio signal (normalized, mixed, etc.) |
| sample_rate | int | Sample rate of the read audio |
| clean, noise, noisyspeech | np.ndarray | Mixed signals from SNR mixer functions |
| noisy_rms_level | int | RMS level of the noisy mixture in dB FS, as returned by the mixer functions |
| perc_active | float | Percentage of time the signal is above the energy threshold, as returned by activitydetector |
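The integer noisy_rms_level output suggests that the mixture is rescaled to a whole-number dB FS level chosen between target_level_lower and target_level_upper. The sketch below assumes that the level is drawn at random and that both bounds are inclusive. Neither detail is confirmed by this page, and the helper name is hypothetical.

```python
import math
import random


def rms(signal):
    """Root-mean-square amplitude of a sequence of samples."""
    return math.sqrt(sum(x * x for x in signal) / len(signal))


def rescale_to_random_level(noisy, params, rng):
    """Rescale noisy to an integer dB FS level drawn from the params bounds."""
    # Assumption: inclusive bounds (random.randint includes both ends)
    level_db = rng.randint(params["target_level_lower"],
                           params["target_level_upper"])
    scalar = 10 ** (level_db / 20) / (rms(noisy) + 1e-12)
    return [x * scalar for x in noisy], level_db


params = {"target_level_lower": -35, "target_level_upper": -15}
noisy = [0.3 * math.sin(2 * math.pi * 200 * t / 16000) for t in range(16000)]

# A seeded generator keeps the illustration reproducible
rescaled, noisy_rms_level = rescale_to_random_level(noisy, params, random.Random(0))
measured_db = 20 * math.log10(rms(rescaled))
```

In a full mixer the same scalar would also be applied to the clean and noise signals so that all three outputs stay consistent with each other.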
Usage Examples
from audiolib import audioread, snr_mixer

# Read the clean speech and noise clips (placeholder paths);
# snr_mixer normalizes both signals internally
clean_audio, sr = audioread("/path/to/speech.wav")
noise_audio, _ = audioread("/path/to/noise.wav")

# Mix clean speech with noise at 10 dB SNR
params = {"target_level_lower": -35, "target_level_upper": -15}
clean, noise_scaled, noisy, rms_level = snr_mixer(
    params, clean_audio, noise_audio, snr=10
)
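A quick sanity check after mixing is to measure the SNR implied by the returned clean and scaled-noise signals. The helpers below are hypothetical and not part of audiolib. They use synthetic stand-in signals so the snippet runs on its own; in practice the arrays returned by the mixer would be passed instead.

```python
import math


def level_dbfs(signal, eps=1e-12):
    """RMS level of a signal in dB FS."""
    rms = math.sqrt(sum(x * x for x in signal) / len(signal))
    return 20 * math.log10(rms + eps)


def measured_snr_db(clean, noise_scaled):
    """Global SNR implied by the clean and scaled-noise components."""
    return level_dbfs(clean) - level_dbfs(noise_scaled)


# Stand-in signals: the noise is the clean tone at one tenth the amplitude,
# which corresponds to an SNR of 20 dB
clean = [0.5 * math.sin(2 * math.pi * 440 * t / 16000) for t in range(16000)]
noise_scaled = [0.1 * x for x in clean]
snr_db = measured_snr_db(clean, noise_scaled)
```

If the measured value differs noticeably from the requested snr, the inputs are worth inspecting, for example for long silent stretches that lower the global RMS.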