Environment:Speechbrain Speechbrain Speech Enhancement Dependencies

Knowledge Sources
Domains: Speech_Enhancement, Speech_Separation
Last Updated: 2026-02-09 20:00 GMT

Overview

Additional Python packages required for speech enhancement and separation recipes: pesq, pystoi, pyloudnorm, mir_eval, and onnxruntime.

Description

These dependencies are not part of the core SpeechBrain installation but are required by specific enhancement and separation recipes. They provide perceptual quality metrics (PESQ, STOI), loudness normalization for dynamic mixing, source separation evaluation (SDR via mir_eval), and neural quality estimation (DNSMOS via onnxruntime). Each recipe lists its extra requirements in an `extra_requirements.txt` file.
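Since each recipe keeps its extras in an `extra_requirements.txt` file, one way to see what a checkout expects is to collect those files and read their entries. The following is a minimal sketch using only the standard library. The function names `find_extra_requirements` and `read_requirements` are hypothetical helpers, not part of SpeechBrain, and the comment handling is deliberately simplified.

```python
from pathlib import Path


def find_extra_requirements(recipes_root):
    """Return every extra_requirements.txt found below recipes_root, sorted."""
    return sorted(Path(recipes_root).rglob("extra_requirements.txt"))


def read_requirements(path):
    """Return the non-empty, non-comment requirement lines of one file.

    Simplification: everything after a '#' is treated as a comment, which
    would also truncate URL requirements that carry a '#egg=' fragment.
    """
    specs = []
    for raw in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw.split("#", 1)[0].strip()
        if line:
            specs.append(line)
    return specs
```

Calling `find_extra_requirements` on the `recipes` directory of a local SpeechBrain checkout (the location of that checkout is an assumption) and passing each result to `read_requirements` gives the specifier strings that `pip install -r` would consume for that recipe.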

Usage

These packages are required when running speech enhancement recipes (Voicebank, DNS, RescueSpeech) or speech separation recipes (WSJ0Mix, LibriMix, WHAMandWHAMR, Aishell1Mix). They are not needed for ASR, TTS, or speaker verification tasks.

System Requirements

| Category | Requirement | Notes |
|---|---|---|
| OS | Linux recommended | `pesq` may have build issues on Windows/macOS |
| Compiler | C compiler (gcc/g++) | Required for building `pesq` from source |

Dependencies

Python Packages

  • `pesq` (Perceptual Evaluation of Speech Quality)
  • `pystoi` (Short-Time Objective Intelligibility)
  • `pyloudnorm` (ITU-R BS.1770 loudness normalization)
  • `mir-eval==0.6` (Music IR evaluation, imported as `mir_eval`; provides `mir_eval.separation.bss_eval_sources` for SDR)
  • `onnxruntime` (for DNSMOS neural quality scoring)
  • `librosa` (audio analysis, used in DNS recipes)
  • `pyroomacoustics==0.3.1` (room impulse response simulation)
  • `tensorboard` (training visualization)
  • `webdataset` (optional; for WebDataset-based data loading)
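Before launching a recipe, it can help to check which of the packages above are importable in the active environment. The sketch below uses `importlib.util.find_spec` from the standard library. The list `ENHANCEMENT_MODULES` and the function `missing_modules` are hypothetical names chosen for illustration, and the list holds import names rather than pip names (the pip package `mir-eval` is imported as `mir_eval`).

```python
import importlib.util

# Import names for the packages listed above (assumed to match the pip
# names except for mir-eval, which is imported as mir_eval).
ENHANCEMENT_MODULES = [
    "pesq",
    "pystoi",
    "pyloudnorm",
    "mir_eval",
    "onnxruntime",
    "librosa",
    "pyroomacoustics",
    "tensorboard",
]


def missing_modules(names):
    """Return the subset of top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(ENHANCEMENT_MODULES)
    if missing:
        print("Not installed:", ", ".join(missing))
    else:
        print("All listed modules are importable.")
```

`find_spec` only locates a module without importing it, so this check does not trigger the package's own import-time side effects.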

Credentials

No credentials required.

Quick Install

# Enhancement recipes (Voicebank MetricGAN, DNS)
pip install pesq pystoi mir-eval==0.6 onnxruntime librosa tensorboard

# Separation recipes (WSJ0Mix, LibriMix)
pip install pyloudnorm mir-eval==0.6

# DNS-specific
pip install pyroomacoustics==0.3.1 webdataset

Code Evidence

PESQ import in MetricGAN from `recipes/Voicebank/enhance/MetricGAN/train.py:22`:

from pesq import pesq

pystoi import in DNS enhancement from `recipes/DNS/enhancement/train.py:39`:

from pystoi import stoi

pyloudnorm in dynamic mixing from `recipes/LibriMix/separation/dynamic_mixing.py:7`:

import pyloudnorm

mir_eval for SDR from `recipes/WSJ0Mix/separation/train.py:282`:

from mir_eval.separation import bss_eval_sources

WebDataset optional import from `speechbrain/dataio/dataloader.py:58-63`:

# Optional support for webdataset
try:
    import webdataset as wds
    WDS_AVAILABLE = True
except ImportError:
    WDS_AVAILABLE = False
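The try/except guard above can be generalized so that a missing optional package fails only when the feature is actually used, and with an actionable message. The following is a sketch of that pattern, not SpeechBrain's actual code: `optional_import` and `require_webdataset` are hypothetical helpers introduced for illustration.

```python
import importlib


def optional_import(name):
    """Return (module, True) if `name` is importable, else (None, False)."""
    try:
        return importlib.import_module(name), True
    except ImportError:
        return None, False


# Mirrors the flag used in the snippet above.
wds, WDS_AVAILABLE = optional_import("webdataset")


def require_webdataset():
    """Return the webdataset module or raise with an install hint."""
    if not WDS_AVAILABLE:
        raise ImportError(
            "webdataset is not installed; run `pip install webdataset` "
            "to use WebDataset-based data loading."
        )
    return wds
```

Deferring the error to `require_webdataset()` keeps the core import path working for users who never touch WebDataset pipelines.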

Common Errors

| Error Message | Cause | Solution |
|---|---|---|
| `ModuleNotFoundError: No module named 'pesq'` | pesq not installed | `pip install pesq` |
| `ModuleNotFoundError: No module named 'pyloudnorm'` | pyloudnorm not installed | `pip install pyloudnorm` |
| `ModuleNotFoundError: No module named 'mir_eval'` | mir_eval not installed | `pip install mir-eval==0.6` |
| `error: command 'gcc' failed` | Missing C compiler for pesq build | Install a C toolchain, e.g. `build-essential` on Ubuntu |

Compatibility Notes

  • pesq: Requires C compiler to build from source. Pre-built wheels may not be available for all platforms.
  • mir-eval: Version 0.6 is specifically pinned in SpeechBrain recipes. Newer versions may have API changes.
  • pyroomacoustics: Version 0.3.1 is pinned for DNS recipes.
  • webdataset: Fully optional; only needed for WebDataset-based data loading pipelines.
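The pins noted above can be checked against the installed distributions with `importlib.metadata` (Python 3.8+). This is a minimal sketch: `PINNED` and `check_pins` are hypothetical names, the comparison is an exact string match (so `0.6` and `0.6.0` would count as different), and lookup by the hyphenated name `mir-eval` assumes a Python version whose `importlib.metadata` normalizes distribution names.

```python
from importlib import metadata

# Pins taken from the compatibility notes above.
PINNED = {"mir-eval": "0.6", "pyroomacoustics": "0.3.1"}


def check_pins(pins, get_version=metadata.version):
    """Return {name: (expected, found)} for every unsatisfied pin.

    `found` is None when the distribution is not installed.
    `get_version` is injectable so the check can be exercised without
    the real packages being present.
    """
    problems = {}
    for name, expected in pins.items():
        try:
            found = get_version(name)
        except metadata.PackageNotFoundError:
            found = None
        if found != expected:
            problems[name] = (expected, found)
    return problems


if __name__ == "__main__":
    for name, (expected, found) in check_pins(PINNED).items():
        print(f"{name}: expected {expected}, found {found}")
```

An empty result means every pinned distribution is installed at exactly the pinned version.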
