Implementation:Speechbrain Speechbrain DNS Audiolib

From Leeroopedia


Knowledge Sources
Domains Speech_Enhancement, Data_Synthesis
Last Updated 2026-02-09 00:00 GMT

Overview

Concrete tool providing audio I/O, normalization, SNR mixing, and signal-manipulation utilities from the SpeechBrain library.

Description

This module provides the audio utility functions used by the DNS (Deep Noise Suppression) Challenge noisy speech synthesis pipeline. They cover reading and writing audio files with optional normalization, checking for clipping, mixing clean speech with noise at a specified SNR (global or segmental), adding reverb, detecting speech activity from energy thresholds, resampling audio to a different sample rate, and segmenting long audio clips. The mixing functions normalize both signals to a target level in dB FS, scale the noise so that the clean-to-noise RMS ratio matches the requested SNR, rescale the mixture to an RMS level within the bounds given in params, and scale it down again if it would clip. The module was originally sourced from the Microsoft DNS-Challenge repository.
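The mixing steps described above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the module's code: `sketch_snr_mix` is a hypothetical name, the mixture is rescaled to a fixed `mix_level` (an assumed parameter standing in for the RMS level the library derives from the `params` bounds), and `normalize`/`is_clipped` are re-implementations that follow the signatures' defaults (-25 dB FS, 0.99) rather than imports from audiolib.

```python
import numpy as np

EPS = np.finfo(float).eps  # tiny constant that guards divisions by zero


def normalize(audio, target_level=-25):
    """Scale audio so that its RMS equals target_level dB FS."""
    rms = np.sqrt(np.mean(audio ** 2))
    return audio * (10 ** (target_level / 20) / (rms + EPS))


def is_clipped(audio, clipping_threshold=0.99):
    """Return True if any sample's magnitude exceeds clipping_threshold."""
    return bool(np.any(np.abs(audio) > clipping_threshold))


def sketch_snr_mix(clean, noise, snr, target_level=-25, mix_level=-25,
                   clipping_threshold=0.99):
    """Hypothetical sketch: mix clean speech and noise at `snr` dB."""
    # Zero-pad the shorter signal so both have the same length
    n = max(len(clean), len(noise))
    clean = np.pad(clean, (0, n - len(clean)))
    noise = np.pad(noise, (0, n - len(noise)))

    # Bring both signals to the same level in dB FS
    clean = normalize(clean, target_level)
    noise = normalize(noise, target_level)
    rms_clean = np.sqrt(np.mean(clean ** 2))
    rms_noise = np.sqrt(np.mean(noise ** 2))

    # snr = 20*log10(rms_clean / rms_scaled_noise), solved for the noise gain
    noise = noise * rms_clean / 10 ** (snr / 20) / (rms_noise + EPS)
    noisy = clean + noise

    # Rescale all three signals by one common gain so the mixture sits at
    # mix_level dB FS (assumed fixed here); a common gain keeps the SNR
    gain = 10 ** (mix_level / 20) / (np.sqrt(np.mean(noisy ** 2)) + EPS)
    clean, noise, noisy = clean * gain, noise * gain, noisy * gain

    # If the mixture clips, scale everything down so its peak is just
    # below clipping_threshold
    if is_clipped(noisy, clipping_threshold):
        peak_div = np.max(np.abs(noisy)) / (clipping_threshold - EPS)
        clean, noise, noisy = clean / peak_div, noise / peak_div, noisy / peak_div
    return clean, noise, noisy
```

Because the clip guard divides the clean, noise, and mixture arrays by the same factor, it lowers the overall level without changing the SNR.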

Usage

Import individual functions from this module when building noisy speech synthesis pipelines. The functions are designed to be composed together for constructing clean-noisy audio pairs at controlled SNR levels.

Code Reference

Source Location

Signature

def is_clipped(audio, clipping_threshold=0.99):
    """Check if an audio signal is clipped."""
    ...

def normalize(audio, target_level=-25):
    """Normalize the signal to the target level."""
    ...

def normalize_segmental_rms(audio, rms, target_level=-25):
    """Normalize the signal to the target level based on segmental RMS."""
    ...

def audioread(path, norm=False, start=0, stop=None, target_level=-25):
    """Function to read audio."""
    ...

def audiowrite(destpath, audio, sample_rate=16000, norm=False,
               target_level=-25, clipping_threshold=0.99, clip_test=False):
    """Function to write audio."""
    ...

def snr_mixer(params, clean, noise, snr, target_level=-25, clipping_threshold=0.99):
    """Function to mix clean speech and noise at various SNR levels."""
    ...

def segmental_snr_mixer(params, clean, noise, snr, target_level=-25, clipping_threshold=0.99):
    """Function to mix clean speech and noise at various segmental SNR levels."""
    ...

def active_rms(clean, noise, fs=16000, energy_thresh=-50):
    """Returns the clean and noise RMS calculated only in active portions."""
    ...

def activitydetector(audio, fs=16000, energy_thresh=0.13, target_level=-25):
    """Return the percentage of time the audio signal is above an energy threshold."""
    ...

def resampler(input_dir, target_sr=16000, ext="*.wav"):
    """Resamples the audio files in input_dir to target_sr."""
    ...

def audio_segmenter(input_dir, dest_dir, segment_len=10, ext="*.wav"):
    """Segments the audio clips in dir to segment_len in secs."""
    ...
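`active_rms` and `segmental_snr_mixer` measure levels only where the signal is active, so silent stretches in intermittent noise do not drag its RMS down. The sketch below illustrates that idea and is an approximation, not the library's implementation: the 100 ms window (`win_ms`), the dB power formula, and the names `sketch_active_rms` and `sketch_segmental_noise_scalar` are all assumptions, and clean and noise are assumed to have equal length.

```python
import numpy as np

EPS = np.finfo(float).eps


def sketch_active_rms(clean, noise, fs=16000, energy_thresh=-50, win_ms=100):
    """Hypothetical sketch: clean and noise RMS over noise-active windows.

    A window counts as active when the noise power in dB
    (10*log10 of the mean squared sample) exceeds energy_thresh.
    win_ms is an assumed window length; clean and noise must have equal length.
    """
    win = int(fs * win_ms / 1000)
    clean_active, noise_active = [], []
    for start in range(0, len(noise), win):
        noise_win = noise[start:start + win]
        if 10 * np.log10(np.mean(noise_win ** 2) + EPS) > energy_thresh:
            clean_active.append(clean[start:start + win])
            noise_active.append(noise_win)
    if not noise_active:
        return EPS, EPS  # nothing active: fall back to a tiny RMS
    c = np.concatenate(clean_active)
    n = np.concatenate(noise_active)
    return np.sqrt(np.mean(c ** 2)), np.sqrt(np.mean(n ** 2))


def sketch_segmental_noise_scalar(clean, noise, snr, fs=16000):
    """Noise gain that yields `snr` dB when measured over active windows only."""
    rms_clean, rms_noise = sketch_active_rms(clean, noise, fs=fs)
    return rms_clean / 10 ** (snr / 20) / (rms_noise + EPS)
```

With noise that is silent for half the clip, the gain computed this way gives the requested SNR inside the active windows, while the SNR over the whole clip comes out roughly 3 dB higher. A gain computed from the global noise RMS would instead make the noise about 3 dB too loud wherever it is actually present.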

Import

from audiolib import (
    audioread, audiowrite, normalize, is_clipped,
    snr_mixer, segmental_snr_mixer, activitydetector,
    active_rms, resampler, audio_segmenter
)

I/O Contract

Inputs

| Name | Type | Required | Description |
|---|---|---|---|
| audio | np.ndarray | Yes | Audio signal as a NumPy array |
| path | str | Yes (audioread) | Path to the audio file to read |
| target_level | float | No | Target normalization level in dB FS (default: -25) |
| clipping_threshold | float | No | Absolute amplitude above which audio is considered clipped (default: 0.99) |
| snr | float | Yes (mixers) | Desired signal-to-noise ratio in dB |
| params | dict | Yes (mixers) | Configuration dict; must contain the target_level_lower and target_level_upper keys (dB FS bounds for the RMS level of the noisy mixture) |
| clean | np.ndarray | Yes (mixers) | Clean speech signal |
| noise | np.ndarray | Yes (mixers) | Noise signal |

Outputs

| Name | Type | Description |
|---|---|---|
| audio | np.ndarray | Processed audio signal (normalized, mixed, etc.) |
| sample_rate | int | Sample rate of the audio returned by audioread |
| clean, noise, noisyspeech | np.ndarray | Scaled clean speech, scaled noise, and their mixture, returned by the SNR mixer functions |
| noisy_rms_level | int | Actual RMS level of the noisy mixture, in dB FS |
| perc_active | float | Fraction of frames classified as active by activitydetector (between 0 and 1) |
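`activitydetector` reduces a clip to a single active-frame score, `perc_active`. A simplified sketch of that kind of energy-based detector follows. It is not the library's algorithm: the 50 ms window, the -40 dB threshold on per-window energy, and the name `sketch_activity_fraction` are assumptions, and it does not reproduce how the library interprets its own `energy_thresh` parameter (default 0.13).

```python
import numpy as np

EPS = np.finfo(float).eps


def sketch_activity_fraction(audio, fs=16000, energy_thresh_db=-40.0,
                             target_level=-25, win_ms=50):
    """Hypothetical sketch: fraction of windows whose energy exceeds a dB threshold.

    energy_thresh_db and win_ms are assumed values; the threshold is applied
    directly to each window's energy in dB, with no smoothing across windows.
    """
    # Normalize first so the threshold does not depend on recording gain
    rms = np.sqrt(np.mean(audio ** 2))
    audio = audio * 10 ** (target_level / 20) / (rms + EPS)

    win = int(fs * win_ms / 1000)
    n_frames = 0
    active = 0
    for start in range(0, len(audio), win):
        frame = audio[start:start + win]
        energy_db = 10 * np.log10(np.mean(frame ** 2) + EPS)
        if energy_db > energy_thresh_db:
            active += 1
        n_frames += 1
    return active / n_frames if n_frames else 0.0
```

Normalizing to `target_level` before thresholding mirrors the `target_level` parameter in the signature and makes the score independent of recording gain; a pipeline could, for example, skip clean clips whose score falls below a chosen cutoff.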

Usage Examples

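from audiolib import audioread, snr_mixer

# Read clean speech (normalized to -25 dB FS) and a noise clip
clean_audio, sr = audioread("/path/to/speech.wav", norm=True, target_level=-25)
noise_audio, noise_sr = audioread("/path/to/noise.wav")
assert sr == noise_sr, "resample the clips to a common rate first"

# Mix clean speech with noise at 10 dB SNR; the mixture's RMS level is
# chosen between target_level_lower and target_level_upper (dB FS)
params = {"target_level_lower": -35, "target_level_upper": -15}
clean, noise_scaled, noisy, rms_level = snr_mixer(
    params, clean_audio, noise_audio, snr=10
)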
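To sanity-check the mixer's outputs, the realized SNR and mixture level can be measured directly from the returned arrays. The helpers below are hypothetical (not part of audiolib) and compute global, RMS-based values with NumPy.

```python
import numpy as np

EPS = np.finfo(float).eps


def measured_snr_db(clean, noise_scaled):
    """Global SNR in dB from the mixer's clean and scaled-noise outputs."""
    rms_clean = np.sqrt(np.mean(clean ** 2))
    rms_noise = np.sqrt(np.mean(noise_scaled ** 2))
    return 20 * np.log10(rms_clean / (rms_noise + EPS))


def rms_dbfs(x):
    """RMS level of a signal in dB FS (full scale = 1.0)."""
    return 20 * np.log10(np.sqrt(np.mean(x ** 2)) + EPS)
```

Applied to the example above, `measured_snr_db(clean, noise_scaled)` should come out close to the requested 10 dB as long as any later rescaling is applied equally to all three outputs, and `rms_dbfs(noisy)` should sit near `rms_level`.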
