Implementation:Speechbrain Speechbrain Prepare RescueSpeech
| Knowledge Sources | |
|---|---|
| Domains | Speech_Enhancement, Data_Preparation |
| Last Updated | 2026-02-09 00:00 GMT |
Overview
A concrete tool, provided by the SpeechBrain library, for preparing the RescueSpeech dataset for ASR and speech enhancement tasks.
Description
This script prepares CSV files for the RescueSpeech dataset and supports both ASR (Automatic Speech Recognition) and speech enhancement tasks. It reads the TSV metadata files and the audio recordings and generates standardized CSV manifests with columns for the clean and noisy speech paths. It also normalizes transcript text, including Unicode character handling, and supports a configurable sample rate.
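The TSV-to-CSV step can be illustrated with a minimal sketch. It assumes Common Voice-style TSV columns named `path` and `sentence` and a hypothetical manifest header (`ID`, `clean_wav`, `noisy_wav`, `wrd`). The helper `tsv_to_manifest` is hypothetical, and the real script's column names, duration computation, and normalization rules are not reproduced here and may differ.

```python
import csv
import io
import os


def tsv_to_manifest(tsv_text, clean_dir, noisy_dir, out_stream):
    """Convert TSV metadata rows into a CSV manifest (hypothetical layout).

    Assumes each TSV row has a `path` column (audio file name) and a
    `sentence` column (transcript); both names are illustrative assumptions.
    Returns the number of utterances written.
    """
    # QUOTE_NONE: treat quote characters in transcripts as plain text.
    reader = csv.DictReader(
        io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    writer = csv.writer(out_stream)
    # Hypothetical header: one clean and one noisy path per utterance.
    writer.writerow(["ID", "clean_wav", "noisy_wav", "wrd"])
    count = 0
    for row in reader:
        utt_id = os.path.splitext(row["path"])[0]  # file name without extension
        writer.writerow([
            utt_id,
            os.path.join(clean_dir, row["path"]),
            os.path.join(noisy_dir, row["path"]),
            " ".join(row["sentence"].split()),  # collapse repeated whitespace
        ])
        count += 1
    return count
```

Writing to any text stream (an open file or an `io.StringIO`) keeps the function easy to test without touching the file system.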
Usage
Use this script when setting up a training pipeline for noise-robust ASR or speech enhancement experiments with the RescueSpeech dataset. It must be run before any training recipe that depends on RescueSpeech data.
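Because preparation has to happen before training, a training script may want to check whether the manifests already exist. This is a minimal sketch, assuming the three file names listed under Outputs and a hypothetical helper `manifests_ready`. It only checks that the files exist, not their contents or whether they were generated for the current `task`.

```python
import os

# Assumed manifest names, taken from the Outputs table of this page.
EXPECTED_MANIFESTS = ("train.csv", "dev.csv", "test.csv")


def manifests_ready(save_folder):
    """Return True if every expected CSV manifest exists in save_folder.

    Hypothetical helper: a training script could call this before loading
    data and run prepare_RescueSpeech first when it returns False.
    """
    return all(
        os.path.isfile(os.path.join(save_folder, name))
        for name in EXPECTED_MANIFESTS
    )
```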
Code Reference
Source Location
- Repository: SpeechBrain
- File: recipes/RescueSpeech/ASR/noise-robust/rescuespeech_prepare.py
Signature
```python
def prepare_RescueSpeech(
    data_folder,
    save_folder,
    train_tsv_file=None,
    dev_tsv_file=None,
    test_tsv_file=None,
    accented_letters=False,
    skip_prep=False,
    sample_rate=16000,
    task="asr",
):
```
Import
```python
from rescuespeech_prepare import prepare_RescueSpeech
```
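The bare import above resolves only when the directory containing `rescuespeech_prepare.py` is on `sys.path`, for example when running from the recipe folder. As an alternative, this minimal sketch uses the standard-library `importlib.util` machinery to load the script from an explicit file path. The helper name `load_prepare_module` and the example path in the comments are hypothetical.

```python
import importlib.util
from pathlib import Path


def load_prepare_module(script_path):
    """Import a recipe-local preparation script from an explicit file path.

    Useful when the script's folder is not on sys.path, so a plain
    `from rescuespeech_prepare import ...` would fail.
    """
    script_path = Path(script_path)
    spec = importlib.util.spec_from_file_location(script_path.stem, str(script_path))
    if spec is None or spec.loader is None:
        raise ImportError(f"Cannot load module from {script_path}")
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # runs the script's top-level code
    return module


# Hypothetical checkout location; adjust to your clone of the repository:
# prep = load_prepare_module(
#     "speechbrain/recipes/RescueSpeech/ASR/noise-robust/rescuespeech_prepare.py"
# )
# prep.prepare_RescueSpeech(data_folder=..., save_folder=...)
```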
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| data_folder | str | Yes | Path to the folder where the original RescueSpeech dataset is stored |
| save_folder | str | Yes | The directory where to store the generated CSV files |
| train_tsv_file | str | No | Path to the RescueSpeech training-split .tsv file |
| dev_tsv_file | str | No | Path to the RescueSpeech development-split .tsv file |
| test_tsv_file | str | No | Path to the RescueSpeech test-split .tsv file |
| accented_letters | bool | No | If True, keeps accented letters in the transcripts (default: False) |
| skip_prep | bool | No | If True, skips data preparation entirely (default: False) |
| sample_rate | int | No | The sample rate of the audio files (default: 16000) |
| task | str | No | The task type, either "asr" or "enhancement" (default: "asr") |
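When `accented_letters` is False, accented letters are not kept as-is in the transcripts. One common way to remove them, shown here only as an illustration and not as the script's actual rule set, is Unicode NFKD decomposition followed by dropping combining marks. `strip_accents` is a hypothetical name.

```python
import unicodedata


def strip_accents(text):
    """Map accented letters to their unaccented base letters.

    Illustrative only: the RescueSpeech script's own rules may differ. For
    example, German umlauts become plain vowels here ("ü" -> "u"), not "ue",
    and characters without a decomposition such as "ß" are left unchanged.
    """
    decomposed = unicodedata.normalize("NFKD", text)
    # Drop nonspacing combining marks (Unicode category "Mn").
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
```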
Outputs
| Name | Type | Description |
|---|---|---|
| train.csv | CSV file | Training set manifest with audio paths and transcriptions |
| dev.csv | CSV file | Development/validation set manifest |
| test.csv | CSV file | Test set manifest |
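After preparation, a quick sanity check on the generated manifests can catch missing files or columns before training starts. This minimal sketch uses Python's `csv` module. `summarize_manifest` is hypothetical, and the default required column `ID` is an assumption based on SpeechBrain's CSV loaders keying each row on an `ID` field. Inspect a generated file's header to see the actual column set.

```python
import csv


def summarize_manifest(csv_path, required_columns=("ID",)):
    """Return (row_count, header) for a manifest after checking its columns.

    Raises ValueError if any of `required_columns` is absent from the header.
    """
    with open(csv_path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        header = list(reader.fieldnames or [])  # reads the first line
        missing = [c for c in required_columns if c not in header]
        if missing:
            raise ValueError(f"{csv_path} is missing columns: {missing}")
        row_count = sum(1 for _ in reader)  # remaining lines are data rows
    return row_count, header
```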
Usage Examples
```python
from rescuespeech_prepare import prepare_RescueSpeech

# Prepare data for ASR task
prepare_RescueSpeech(
    data_folder="/path/to/RescueSpeech",
    save_folder="/path/to/output",
    train_tsv_file="/path/to/RescueSpeech/train.tsv",
    dev_tsv_file="/path/to/RescueSpeech/dev.tsv",
    test_tsv_file="/path/to/RescueSpeech/test.tsv",
    task="asr",
)

# Prepare data for speech enhancement task
prepare_RescueSpeech(
    data_folder="/path/to/RescueSpeech",
    save_folder="/path/to/output",
    task="enhancement",
)
```