Implementation:Speechbrain Speechbrain Voicebank Composite Eval
| Knowledge Sources | |
|---|---|
| Domains | Multi_Task_Learning, Speech_Enhancement |
| Last Updated | 2026-02-09 00:00 GMT |
Overview
A concrete tool, provided in the SpeechBrain repository, for computing composite objective speech enhancement scores (CSIG, CBAK, COVL).
Description
This module provides the eval_composite function, a Python implementation of the composite objective quality metrics for speech enhancement. It computes three composite scores: CSIG (signal distortion), CBAK (background noise intrusiveness), and COVL (overall quality). Each is a weighted linear combination of base measures drawn from WSS (Weighted Spectral Slope), LLR (Log-Likelihood Ratio), segmental SNR, and PESQ. The module also includes helper functions for LP (linear prediction) coefficient computation, WSS distance, LLR distance, and segmental SNR. Each composite score is clipped to the MOS range [1, 5].
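The weighted combination and clipping described above can be sketched as follows. This is a minimal illustration, not the module's code: the coefficients are the regression weights published by Hu and Loizou for the composite measures, and it is an assumption that composite_eval.py uses exactly these values. The names `trim_mos` and `combine_composite` are hypothetical.

```python
def trim_mos(value):
    """Clip a score to the MOS range [1, 5]."""
    return min(max(value, 1.0), 5.0)


def combine_composite(pesq, llr, wss, seg_snr):
    """Combine the four base measures into CSIG, CBAK and COVL.

    The weights are the published Hu and Loizou regression coefficients;
    that composite_eval.py matches them exactly is an assumption.
    """
    csig = 3.093 - 1.029 * llr + 0.603 * pesq - 0.009 * wss
    cbak = 1.634 + 0.478 * pesq - 0.007 * wss + 0.063 * seg_snr
    covl = 1.594 + 0.805 * pesq - 0.512 * llr - 0.007 * wss
    return {
        "csig": trim_mos(csig),
        "cbak": trim_mos(cbak),
        "covl": trim_mos(covl),
    }


# Illustrative base-measure values, not measured data
scores = combine_composite(pesq=2.5, llr=0.5, wss=30.0, seg_snr=5.0)
```

Note that CSIG and COVL do not use segmental SNR, and CBAK does not use LLR, which is why each score responds differently to signal distortion and residual noise.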
Usage
Use this module as a utility for evaluating speech enhancement quality. It is imported by the Voicebank MTL training recipe to provide composite evaluation metrics. It can also be used standalone to compare a reference signal and a degraded signal, both sampled at 16 kHz.
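For standalone use, it can help to check the sample rate and align the two signals before scoring. The helper below is a hypothetical sketch (`align_pair` and `EXPECTED_SAMPLE_RATE` are not part of the module), and whether eval_composite performs similar checks internally is not stated here.

```python
EXPECTED_SAMPLE_RATE = 16000  # the rate this page states for both signals


def align_pair(ref_wav, deg_wav, sample_rate):
    """Check the sample rate and trim both signals to a common length.

    Works with 1-D NumPy arrays or plain sequences, since it relies only
    on len() and slicing.
    """
    if sample_rate != EXPECTED_SAMPLE_RATE:
        raise ValueError(
            f"expected {EXPECTED_SAMPLE_RATE} Hz audio, got {sample_rate} Hz"
        )
    n = min(len(ref_wav), len(deg_wav))
    if n == 0:
        raise ValueError("both signals must contain at least one sample")
    return ref_wav[:n], deg_wav[:n]


# Toy sequences standing in for waveforms
ref, deg = align_pair([0.1, 0.2, 0.3, 0.4], [0.1, 0.25, 0.3], 16000)
```

The aligned pair can then be passed to eval_composite.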
Code Reference
Source Location
- Repository: SpeechBrain
- File: recipes/Voicebank/MTL/ASR_enhance/composite_eval.py
Signature
def eval_composite(ref_wav, deg_wav):
    """Compute composite speech enhancement metrics.

    Returns dict with keys: csig, cbak, covl
    """
    ...
Import
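# composite_eval.py lives in the recipe directory, so run from
# recipes/Voicebank/MTL/ASR_enhance/ or add that directory to sys.path
from composite_eval import eval_composite
result = eval_composite(reference_wav, degraded_wav)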
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| ref_wav | numpy.ndarray | Yes | Reference (clean) waveform signal |
| deg_wav | numpy.ndarray | Yes | Degraded (enhanced/noisy) waveform signal |
Outputs
| Name | Type | Description |
|---|---|---|
| result | dict | Dictionary with keys "csig", "cbak", "covl" containing the composite scores, each clipped to the MOS range [1, 5] |
Usage Examples
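import numpy as np
from composite_eval import eval_composite

# Placeholder signals that only show the expected shapes. PESQ is designed
# for speech, so use real clean and enhanced recordings in practice.
rng = np.random.default_rng(0)
ref = rng.standard_normal(16000)  # 1 second at 16 kHz
deg = ref + 0.1 * rng.standard_normal(16000)  # reference plus additive noise

# Evaluate enhancement quality
scores = eval_composite(ref, deg)
print(f"CSIG: {scores['csig']:.2f}, CBAK: {scores['cbak']:.2f}, COVL: {scores['covl']:.2f}")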