Implementation:Speechbrain Speechbrain DNS Audiolib

Knowledge Sources	SpeechBrain
Domains	Speech_Enhancement, Data_Synthesis
Last Updated	2026-02-09 00:00 GMT

Overview

Audio I/O, normalization, SNR mixing, and signal manipulation utilities shipped with the DNS recipe of the SpeechBrain library.

Description

This module provides the audio utility functions used by the DNS (Deep Noise Suppression) Challenge noisy speech synthesis pipeline. It includes functions for reading and writing audio files with optional normalization, checking for clipping, mixing clean speech with noise at a specified SNR (global or segmental), adding reverb, detecting speech activity from an energy threshold, resampling audio files to a target sample rate, and segmenting long audio clips. The mixing functions normalize both signals to a target level in dB FS, compute the noise scalar that yields the requested SNR, and ensure the output does not clip. The code was originally sourced from the Microsoft DNS-Challenge repository.
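The mixing steps described above (normalize to a target dB FS level, scale the noise for a requested SNR, guard against clipping) can be sketched with NumPy alone. This is a minimal illustration under stated assumptions, not the module's code: the names rms_normalize and mix_at_snr are hypothetical, both inputs are assumed to be equal-length float arrays, and the clipping guard shown here is one plausible strategy.

```python
import numpy as np

EPS = np.finfo(float).eps  # guards against division by zero


def rms_normalize(audio, target_level=-25):
    """Scale audio so that its RMS equals target_level in dB FS (hypothetical helper)."""
    rms = np.sqrt(np.mean(audio ** 2))
    return audio * (10 ** (target_level / 20) / (rms + EPS))


def mix_at_snr(clean, noise, snr, target_level=-25, clipping_threshold=0.99):
    """Mix equal-length clean and noise arrays at the requested SNR in dB (hypothetical helper)."""
    clean = rms_normalize(clean, target_level)
    noise = rms_normalize(noise, target_level)
    rms_clean = np.sqrt(np.mean(clean ** 2))
    rms_noise = np.sqrt(np.mean(noise ** 2))

    # Gain that places the noise RMS `snr` dB below the clean RMS.
    noise_scalar = rms_clean / (10 ** (snr / 20)) / (rms_noise + EPS)
    noise_scaled = noise * noise_scalar
    noisy = clean + noise_scaled

    # If the mixture exceeds the threshold, scale all three signals by the
    # same factor; a common gain leaves the SNR unchanged.
    peak = np.max(np.abs(noisy))
    if peak > clipping_threshold:
        gain = clipping_threshold / peak
        clean, noise_scaled, noisy = clean * gain, noise_scaled * gain, noisy * gain
    return clean, noise_scaled, noisy
```

Because the clean and noise signals share every gain applied after the noise scalar, the ratio of their RMS values stays at the requested SNR regardless of the final level.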

Usage

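Import individual functions from this module when building noisy speech synthesis pipelines. The functions are designed to be composed to construct clean-noisy audio pairs at controlled SNR levels. Because the module lives in the recipe directory rather than in the installed speechbrain package, importing it as audiolib assumes that directory is the working directory or is on the Python path.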

Code Reference

Source Location

Repository: SpeechBrain
File: recipes/DNS/noisyspeech_synthesizer/audiolib.py

Signature

def is_clipped(audio, clipping_threshold=0.99):
    """Check if an audio signal is clipped."""
    ...

def normalize(audio, target_level=-25):
    """Normalize the signal to the target level."""
    ...

def normalize_segmental_rms(audio, rms, target_level=-25):
    """Normalize the signal to the target level based on segmental RMS."""
    ...

def audioread(path, norm=False, start=0, stop=None, target_level=-25):
    """Function to read audio."""
    ...

def audiowrite(destpath, audio, sample_rate=16000, norm=False,
               target_level=-25, clipping_threshold=0.99, clip_test=False):
    """Function to write audio."""
    ...

def snr_mixer(params, clean, noise, snr, target_level=-25, clipping_threshold=0.99):
    """Function to mix clean speech and noise at various SNR levels."""
    ...

def segmental_snr_mixer(params, clean, noise, snr, target_level=-25, clipping_threshold=0.99):
    """Function to mix clean speech and noise at various segmental SNR levels."""
    ...

def active_rms(clean, noise, fs=16000, energy_thresh=-50):
    """Returns the clean and noise RMS calculated only in active portions."""
    ...

def activitydetector(audio, fs=16000, energy_thresh=0.13, target_level=-25):
    """Return the percentage of time the audio signal is above an energy threshold."""
    ...

def resampler(input_dir, target_sr=16000, ext="*.wav"):
    """Resamples the audio files in input_dir to target_sr."""
    ...

def audio_segmenter(input_dir, dest_dir, segment_len=10, ext="*.wav"):
    """Segments the audio clips in dir to segment_len in secs."""
    ...
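The active_rms docstring above says the RMS is computed only over active portions. A minimal sketch of that idea follows, assuming that "active" means fixed-length windows whose noise level exceeds energy_thresh in dB. The name active_rms_sketch, the 100 ms window length, and the return value for fully silent input are assumptions for illustration, not details confirmed by the source.

```python
import numpy as np

EPS = np.finfo(float).eps  # avoids log10(0) on silent windows


def active_rms_sketch(clean, noise, fs=16000, energy_thresh=-50, window_ms=100):
    """Return (clean_rms, noise_rms) measured only over windows where the noise is active.

    Assumes `clean` is at least as long as `noise`.
    """
    win = int(fs * window_ms / 1000)
    clean_active, noise_active = [], []

    for start in range(0, len(noise), win):
        noise_win = noise[start:start + win]
        clean_win = clean[start:start + win]
        # Level of this noise window in dB, computed from its RMS.
        level_db = 20 * np.log10(np.sqrt(np.mean(noise_win ** 2)) + EPS)
        if level_db > energy_thresh:
            noise_active.append(noise_win)
            clean_active.append(clean_win)

    if not noise_active:
        # No window passed the threshold; report zero energy.
        return 0.0, 0.0

    clean_rms = float(np.sqrt(np.mean(np.concatenate(clean_active) ** 2)))
    noise_rms = float(np.sqrt(np.mean(np.concatenate(noise_active) ** 2)))
    return clean_rms, noise_rms
```

Measuring RMS this way keeps long silences in a noise clip from lowering its estimated level, which would otherwise make the noise louder than intended after SNR scaling.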

Import

from audiolib import (
    audioread, audiowrite, normalize, is_clipped,
    snr_mixer, segmental_snr_mixer, activitydetector,
    active_rms, resampler, audio_segmenter
)

I/O Contract

Inputs

Name	Type	Required	Description
audio	np.ndarray	Yes	Audio signal as a numpy array
path	str	Yes (audioread)	Path to the audio file to read
target_level	float	No	Target normalization level in dB FS (default: -25)
clipping_threshold	float	No	Threshold above which audio is considered clipped (default: 0.99)
snr	float	Yes (mixers)	Desired signal-to-noise ratio in dB
params	dict	Yes (mixers)	Configuration dict with target_level_lower/upper keys
clean	np.ndarray	Yes (mixers)	Clean speech signal
noise	np.ndarray	Yes (mixers)	Noise signal

Outputs

Name	Type	Description
audio	np.ndarray	Processed audio signal (normalized, mixed, etc.)
sample_rate	int	Sample rate of the read audio
clean, noise, noisyspeech	np.ndarray	Mixed signals from SNR mixer functions
noisy_rms_level	int	Actual RMS level of the noisy mixture, in dB FS
perc_active	float	Percentage of active frames from activity detector

Usage Examples

from audiolib import audioread, snr_mixer

# Read and normalize audio
audio, sr = audioread("/path/to/speech.wav", norm=True, target_level=-25)

# Read the signals to mix (both paths are placeholders)
clean_audio, _ = audioread("/path/to/speech.wav")
noise_audio, _ = audioread("/path/to/noise.wav")

# Mix clean speech with noise at 10 dB SNR
params = {"target_level_lower": -35, "target_level_upper": -15}
clean, noise_scaled, noisy, rms_level = snr_mixer(
    params, clean_audio, noise_audio, snr=10
)
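A quick sanity check on a mixer's output is to measure the SNR between the returned clean and scaled-noise arrays. The sketch below uses synthetic arrays as stand-ins for those return values so that it runs without audio files. measured_snr_db is a hypothetical helper, not part of audiolib, and it assumes the noise array is not all zeros.

```python
import numpy as np


def measured_snr_db(clean, noise_scaled):
    """SNR in dB between a clean signal and the noise that was added to it."""
    rms_clean = np.sqrt(np.mean(clean ** 2))
    rms_noise = np.sqrt(np.mean(noise_scaled ** 2))
    return 20 * np.log10(rms_clean / rms_noise)


# Synthetic stand-ins for the arrays a mixer would return.
rng = np.random.default_rng(0)
clean = 0.1 * rng.standard_normal(16000)
noise_scaled = 0.01 * rng.standard_normal(16000)
noisy = clean + noise_scaled

print(f"measured SNR: {measured_snr_db(clean, noise_scaled):.1f} dB")
```

With real data, the measured value should sit close to the snr argument passed to the mixer. A large gap usually points to mismatched signal lengths or silence in one of the inputs.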

Related Pages

Principle:Speechbrain_Speechbrain_Noisy_Speech_Data_Preparation
