Environment:Speechbrain Speechbrain Speech Enhancement Dependencies
| Property | Value |
|---|---|
| Knowledge Sources | |
| Domains | Speech_Enhancement, Speech_Separation |
| Last Updated | 2026-02-09 20:00 GMT |
Overview
Additional Python packages required by SpeechBrain's speech enhancement and separation recipes, chiefly pesq, pystoi, pyloudnorm, mir_eval, and onnxruntime.
Description
These packages are not part of the core SpeechBrain installation. Only specific enhancement and separation recipes need them. They provide a perceptual quality metric (PESQ), an intelligibility metric (STOI), loudness normalization for dynamic mixing (pyloudnorm), source separation evaluation (SDR via mir_eval), and neural quality estimation (DNSMOS, run through onnxruntime). Each recipe that needs extra packages lists them in its own `extra_requirements.txt` file.
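A recipe's extra packages can be installed with `pip install -r extra_requirements.txt`. To inspect what a recipe needs before installing, a minimal sketch follows, assuming the file uses pip's requirements-file format (one specifier per line, `#` comments). The helper `read_extra_requirements` is hypothetical and not part of SpeechBrain.

```python
import re
from pathlib import Path

# pip treats "#" preceded by whitespace as the start of an inline comment.
_INLINE_COMMENT = re.compile(r"\s+#.*$")


def read_extra_requirements(path):
    """Return the requirement specifiers listed in a pip-style requirements file."""
    specs = []
    for raw in Path(path).read_text().splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and full-line comments
        specs.append(_INLINE_COMMENT.sub("", line))  # drop trailing comments
    return specs
```

This is deliberately simple: it does not handle pip options such as `-r other.txt` or `--index-url`, which are passed through verbatim.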
Usage
Required when running speech enhancement recipes (Voicebank, DNS, RescueSpeech) or speech separation recipes (WSJ0Mix, LibriMix, WHAMandWHAMR, Aishell1Mix). Not needed for ASR, TTS, or speaker verification tasks.
System Requirements
| Category | Requirement | Notes |
|---|---|---|
| OS | Linux recommended | pesq may have build issues on Windows/macOS |
| Compiler | C compiler (gcc/g++) | Required for building pesq from source |
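Because pesq falls back to a source build when no wheel matches, it can help to check for a compiler first. The sketch below is a heuristic under stated assumptions: it only looks for common Unix compiler names on `PATH` (Windows builds use MSVC instead), and `find_c_compiler` is a hypothetical helper.

```python
import shutil


def find_c_compiler(candidates=("gcc", "cc", "clang"), which=shutil.which):
    """Return the path of the first C compiler found on PATH, or None.

    Heuristic for Unix-like systems only; `which` is injectable for testing.
    """
    for name in candidates:
        path = which(name)
        if path:
            return path
    return None
```

A `None` result suggests that `pip install pesq` will fail if pip has to compile it.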
Dependencies
Python Packages
- `pesq` (Perceptual Evaluation of Speech Quality)
- `pystoi` (Short-Time Objective Intelligibility)
- `pyloudnorm` (ITU-R BS.1770 loudness normalization)
- `mir-eval` == 0.6 (music information retrieval evaluation; provides `bss_eval_sources` for SDR)
- `onnxruntime` (for DNSMOS neural quality scoring)
- `librosa` (audio analysis, used in DNS recipes)
- `pyroomacoustics` == 0.3.1 (room impulse response simulation)
- `tensorboard` (training visualization)
- `webdataset` (optional; for WebDataset-based data loading)
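Note that the pip name and the import name differ for `mir-eval` (imported as `mir_eval`). A minimal sketch for listing which of the packages above are missing from the current environment follows. The mapping and the `missing_packages` helper are illustrative assumptions, not a SpeechBrain API.

```python
import importlib.util

# pip (distribution) name -> top-level import name
ENHANCEMENT_PACKAGES = {
    "pesq": "pesq",
    "pystoi": "pystoi",
    "pyloudnorm": "pyloudnorm",
    "mir-eval": "mir_eval",
    "onnxruntime": "onnxruntime",
    "librosa": "librosa",
    "pyroomacoustics": "pyroomacoustics",
    "tensorboard": "tensorboard",
    "webdataset": "webdataset",
}


def missing_packages(packages=ENHANCEMENT_PACKAGES):
    """Return the pip names of packages whose import name cannot be located."""
    return [
        pip_name
        for pip_name, module in packages.items()
        if importlib.util.find_spec(module) is None  # locates without importing
    ]
```

`find_spec` only locates a module without executing it, so a package that is present but fails on import (for example, a broken pesq build) is not reported here.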
Credentials
No credentials required.
Quick Install
```bash
# Enhancement recipes (Voicebank MetricGAN, DNS)
pip install pesq pystoi mir-eval==0.6 onnxruntime librosa tensorboard
# Separation recipes (WSJ0Mix, LibriMix)
pip install pyloudnorm mir-eval==0.6
# DNS-specific
pip install pyroomacoustics==0.3.1 webdataset
```
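When several interpreters or virtual environments are present, running pip as `python -m pip` ensures the packages land in the interpreter that will run the recipe. A minimal sketch that builds such a command from the install groups above follows. The group names and the `pip_install_command` helper are hypothetical.

```python
import sys

# Package groups taken from the Quick Install commands.
INSTALL_GROUPS = {
    "enhancement": ["pesq", "pystoi", "mir-eval==0.6", "onnxruntime", "librosa", "tensorboard"],
    "separation": ["pyloudnorm", "mir-eval==0.6"],
    "dns": ["pyroomacoustics==0.3.1", "webdataset"],
}


def pip_install_command(*groups):
    """Build a pip command that installs the given groups into this interpreter."""
    specs = []
    for group in groups:
        for spec in INSTALL_GROUPS[group]:
            if spec not in specs:  # e.g. mir-eval appears in two groups
                specs.append(spec)
    return [sys.executable, "-m", "pip", "install", *specs]
```

For example, `subprocess.check_call(pip_install_command("enhancement", "separation"))` would install both groups in one pip invocation.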
Code Evidence
PESQ import in MetricGAN from `recipes/Voicebank/enhance/MetricGAN/train.py:22`:
```python
from pesq import pesq
```
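The `pesq` function takes the sampling rate, the reference signal, the degraded signal, and a mode (`"wb"` for wideband, `"nb"` for narrowband), and accepts only 8 kHz or 16 kHz audio. A minimal sketch of a guarded call follows. The helpers `pesq_mode` and `pesq_score` are hypothetical, and choosing wideband at 16 kHz is an assumption (narrowband is also valid there).

```python
import numpy as np

try:
    from pesq import pesq  # compiled extension; may be missing on some platforms
    PESQ_AVAILABLE = True
except ImportError:
    PESQ_AVAILABLE = False


def pesq_mode(fs):
    """Pick a PESQ mode; PESQ supports only 8 kHz and 16 kHz input."""
    if fs == 16000:
        return "wb"  # wideband requires 16 kHz
    if fs == 8000:
        return "nb"
    raise ValueError(f"PESQ needs fs=8000 or fs=16000, got {fs}")


def pesq_score(clean, enhanced, fs=16000):
    """Return the PESQ score, or None when the pesq package is not installed."""
    if not PESQ_AVAILABLE:
        return None
    return pesq(fs, np.asarray(clean), np.asarray(enhanced), pesq_mode(fs))
```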
pystoi import in DNS enhancement from `recipes/DNS/enhancement/train.py:39`:
```python
from pystoi import stoi
```
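`stoi(clean, denoised, fs, extended=False)` compares a clean reference with a processed signal; `extended=True` computes ESTOI instead. pystoi expects both signals to have the same length, while enhanced outputs can differ by a few samples. The sketch below trims both signals to a common length. That trimming policy, and the helper names, are assumptions for illustration.

```python
import numpy as np

try:
    from pystoi import stoi
    STOI_AVAILABLE = True
except ImportError:
    STOI_AVAILABLE = False


def trim_to_common_length(clean, enhanced):
    """Cut both signals to the shorter length so their shapes match."""
    n = min(len(clean), len(enhanced))
    return np.asarray(clean)[:n], np.asarray(enhanced)[:n]


def stoi_score(clean, enhanced, fs, extended=False):
    """Return STOI (or ESTOI if extended=True); None if pystoi is missing."""
    if not STOI_AVAILABLE:
        return None
    clean, enhanced = trim_to_common_length(clean, enhanced)
    return stoi(clean, enhanced, fs, extended=extended)
```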
pyloudnorm in dynamic mixing from `recipes/LibriMix/separation/dynamic_mixing.py:7`:
```python
import pyloudnorm
```
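pyloudnorm measures integrated loudness in LUFS with `pyloudnorm.Meter(rate).integrated_loudness(data)` and rescales with `pyloudnorm.normalize.loudness(data, input_loudness, target_loudness)`. The rescaling is a plain linear gain derived from the dB difference, shown explicitly in `loudness_gain`. This is a minimal sketch; the helper names and any target value are assumptions, not the values the LibriMix recipe uses.

```python
try:
    import pyloudnorm
    PYLN_AVAILABLE = True
except ImportError:
    PYLN_AVAILABLE = False


def loudness_gain(current_lufs, target_lufs):
    """Linear gain that moves a signal from current_lufs to target_lufs."""
    return 10.0 ** ((target_lufs - current_lufs) / 20.0)


def normalize_to(signal, fs, target_lufs):
    """Measure ITU-R BS.1770 loudness and rescale the signal to target_lufs."""
    if not PYLN_AVAILABLE:
        raise ImportError("pyloudnorm is required: pip install pyloudnorm")
    meter = pyloudnorm.Meter(fs)
    current = meter.integrated_loudness(signal)  # needs at least one 400 ms block
    return pyloudnorm.normalize.loudness(signal, current, target_lufs)
```

In dynamic mixing, normalizing each source to its own loudness before summing controls the relative level of the sources in the mixture.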
mir_eval for SDR from `recipes/WSJ0Mix/separation/train.py:282`:
```python
from mir_eval.separation import bss_eval_sources
```
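`bss_eval_sources(reference_sources, estimated_sources)` expects arrays shaped `(n_sources, n_samples)` and returns `(sdr, sir, sar, perm)`. The sketch below assumes that per-utterance targets and estimates are stored as `(time, n_src)` arrays and transposes them before evaluation. That layout, and both helper names, are assumptions for illustration.

```python
import numpy as np

try:
    from mir_eval.separation import bss_eval_sources
    MIR_EVAL_AVAILABLE = True
except ImportError:
    MIR_EVAL_AVAILABLE = False


def to_sources_first(item):
    """Convert a (time, n_src) array into the (n_src, time) layout mir_eval uses."""
    arr = np.asarray(item)
    if arr.ndim != 2:
        raise ValueError(f"expected a 2-D (time, n_src) array, got shape {arr.shape}")
    return arr.T


def sdr_per_source(targets, estimates):
    """Return one SDR value (dB) per source, or None if mir_eval is missing."""
    if not MIR_EVAL_AVAILABLE:
        return None
    sdr, _sir, _sar, _perm = bss_eval_sources(
        to_sources_first(targets), to_sources_first(estimates)
    )
    return sdr
```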
WebDataset optional import from `speechbrain/dataio/dataloader.py:58-63`:
```python
# Optional support for webdataset
try:
    import webdataset as wds

    WDS_AVAILABLE = True
except ImportError:
    WDS_AVAILABLE = False
```
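The flag lets the module import cleanly when webdataset is absent, so the failure is deferred to the code path that actually needs it. The sketch below shows one way such a flag can be checked at the point of use. `require_webdataset` is a hypothetical helper, not SpeechBrain's API.

```python
try:
    import webdataset as wds
    WDS_AVAILABLE = True
except ImportError:
    WDS_AVAILABLE = False


def require_webdataset():
    """Return the webdataset module, or fail with an actionable message."""
    if not WDS_AVAILABLE:
        raise ImportError(
            "This data-loading path needs webdataset; "
            "install it with `pip install webdataset`."
        )
    return wds
```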
Common Errors
| Error Message | Cause | Solution |
|---|---|---|
| `ModuleNotFoundError: No module named 'pesq'` | pesq not installed | `pip install pesq` |
| `ModuleNotFoundError: No module named 'pyloudnorm'` | pyloudnorm not installed | `pip install pyloudnorm` |
| `ModuleNotFoundError: No module named 'mir_eval'` | mir_eval not installed | `pip install mir-eval==0.6` |
| `error: command 'gcc' failed` | No C compiler available to build pesq | Install a C compiler, e.g. `sudo apt install build-essential` on Ubuntu |
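The `ModuleNotFoundError` cases share one fix pattern: the exception's `name` attribute identifies the missing module, which can be mapped to the install command from the table. A minimal sketch follows. `INSTALL_HINTS` and `import_or_explain` are hypothetical, and the generic `pip install <name>` fallback is an assumption that holds only when the pip name equals the import name.

```python
import importlib

# Install hints keyed by import name.
INSTALL_HINTS = {
    "pesq": "pip install pesq",
    "pyloudnorm": "pip install pyloudnorm",
    "mir_eval": "pip install mir-eval==0.6",
}


def import_or_explain(module_name):
    """Import module_name; on failure, re-raise with a matching install hint."""
    try:
        return importlib.import_module(module_name)
    except ModuleNotFoundError as err:
        hint = INSTALL_HINTS.get(err.name, f"pip install {err.name}")
        raise ModuleNotFoundError(f"{err}; try: {hint}", name=err.name) from err
```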
Compatibility Notes
- pesq: Needs a C compiler whenever pip builds it from source, which happens when no pre-built wheel exists for the platform and Python version.
- mir-eval: SpeechBrain recipes pin version 0.6; newer releases may change the API.
- pyroomacoustics: DNS recipes pin version 0.3.1.
- webdataset: Fully optional; needed only for WebDataset-based data loading pipelines.
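To confirm that the pinned versions are actually installed, the sketch below reads installed versions with `importlib.metadata.version`. It compares version strings exactly, which is a simplification (a release reported as `0.6.0` would be flagged; `packaging.version` would be more robust). If `mir-eval` is reported missing although `import mir_eval` works, the underscore spelling may be needed for the lookup. `check_pins` is a hypothetical helper.

```python
from importlib.metadata import PackageNotFoundError, version

PINNED = {"mir-eval": "0.6", "pyroomacoustics": "0.3.1"}


def check_pins(pins=PINNED, get_version=version):
    """Return {package: problem} for pins that are missing or not matched exactly."""
    problems = {}
    for name, wanted in pins.items():
        try:
            installed = get_version(name)
        except PackageNotFoundError:
            problems[name] = "not installed"
            continue
        if installed != wanted:
            problems[name] = f"installed {installed}, recipes pin {wanted}"
    return problems
```

An empty dictionary means every pin matched.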
Related Pages
- Implementation:Speechbrain_Speechbrain_MetricGanBrain_Fit_Batch
- Implementation:Speechbrain_Speechbrain_SEBrain_Compute_Forward
- Implementation:Speechbrain_Speechbrain_Composite_Eval_Metrics
- Implementation:Speechbrain_Speechbrain_Separation_Save_Results
- Implementation:Speechbrain_Speechbrain_Dynamic_Mix_Data_Prep