Implementation:Speechbrain Speechbrain Prepare RescueSpeech

Knowledge Sources	SpeechBrain
Domains	Speech_Enhancement, Data_Preparation
Last Updated	2026-02-09 00:00 GMT

Overview

Concrete tool, provided by the SpeechBrain library, for preparing the RescueSpeech dataset for ASR and speech enhancement tasks.

Description

This script prepares CSV files for the RescueSpeech dataset, supporting both ASR (Automatic Speech Recognition) and Speech Enhancement tasks. It processes TSV metadata files and audio recordings to generate standardized CSV manifests with columns for clean and noisy speech paths. The script also normalizes transcript text, handles Unicode characters, and supports a configurable sample rate.
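To make the text normalization step more concrete, here is a minimal sketch of how accented letters could be removed from a transcript when accented_letters is False. The helper name normalize_transcript is hypothetical and this is not the code used in rescuespeech_prepare.py; it only illustrates the general technique with Python's standard unicodedata module.

```python
import unicodedata


def normalize_transcript(text, accented_letters=False):
    """Hypothetical helper, not taken from rescuespeech_prepare.py."""
    if accented_letters:
        # Keep accents, but store them in a single canonical (composed) form.
        text = unicodedata.normalize("NFC", text)
    else:
        # Decompose characters, then drop the combining marks (the accents).
        decomposed = unicodedata.normalize("NFKD", text)
        text = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Collapse runs of whitespace and trim both ends.
    return " ".join(text.split())


print(normalize_transcript("Über  die Brücke"))  # -> Uber die Brucke
```

Characters without a Unicode decomposition, such as the German "ß", pass through this sketch unchanged.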

Usage

Use this script when setting up a training pipeline for noise-robust ASR or speech enhancement experiments with the RescueSpeech dataset. It must be run before any training recipe that depends on RescueSpeech data.

Code Reference

Source Location

Repository: SpeechBrain
File: recipes/RescueSpeech/ASR/noise-robust/rescuespeech_prepare.py

Signature

def prepare_RescueSpeech(
    data_folder,
    save_folder,
    train_tsv_file=None,
    dev_tsv_file=None,
    test_tsv_file=None,
    accented_letters=False,
    skip_prep=False,
    sample_rate=16000,
    task="asr",
):

Import

from rescuespeech_prepare import prepare_RescueSpeech

I/O Contract

Inputs

Name	Type	Required	Description
data_folder	str	Yes	Path to the folder where the original RescueSpeech dataset is stored
save_folder	str	Yes	The directory where to store the generated CSV files
train_tsv_file	str	No	Path to the Train RescueSpeech .tsv file
dev_tsv_file	str	No	Path to the Dev RescueSpeech .tsv file
test_tsv_file	str	No	Path to the Test RescueSpeech .tsv file
accented_letters	bool	No	If True, keeps accented letters in the transcripts (default: False)
skip_prep	bool	No	If True, skips data preparation entirely (default: False)
sample_rate	int	No	The sample rate of the audio files (default: 16000)
task	str	No	The task type, either "asr" or "enhancement" (default: "asr")
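Because task accepts only two values and the TSV paths are optional, it can help to validate the arguments before starting a long preparation run. The following is a minimal sketch of such a pre-flight check; check_prepare_args is a hypothetical helper and is not part of SpeechBrain.

```python
import os

# The two task values listed in the inputs table.
VALID_TASKS = ("asr", "enhancement")


def check_prepare_args(task, tsv_files=()):
    """Hypothetical pre-flight check; returns a list of problems found."""
    problems = []
    if task not in VALID_TASKS:
        problems.append(f"unknown task: {task!r}")
    for path in tsv_files:
        # None means the caller left the optional argument unset.
        if path is not None and not os.path.isfile(path):
            problems.append(f"missing TSV file: {path}")
    return problems


print(check_prepare_args("asr", [None, None, None]))  # -> []
```

An empty list means the arguments passed these basic checks; it says nothing about whether the dataset itself is complete.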

Outputs

Name	Type	Description
train.csv	CSV file	Training set manifest with audio paths and transcriptions
dev.csv	CSV file	Development/validation set manifest
test.csv	CSV file	Test set manifest
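After preparation, a quick way to confirm what a manifest contains is to read it with Python's csv module and inspect the header. The sketch below uses an in-memory sample in place of a real file, and its column names are invented for illustration; the actual columns depend on the task and should be read from the generated file.

```python
import csv
import io


def read_manifest(stream):
    """Return (column names, rows) from an open CSV manifest."""
    reader = csv.DictReader(stream)
    rows = list(reader)
    return reader.fieldnames, rows


# Stand-in for open("/path/to/output/train.csv", newline="", encoding="utf-8").
# The column names and values here are made up, not the script's real schema.
sample = io.StringIO("ID,duration,wav\nutt1,2.5,/data/utt1.wav\n")
columns, rows = read_manifest(sample)
print(columns)    # header of the manifest
print(len(rows))  # number of utterances listed
```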

Usage Examples

from rescuespeech_prepare import prepare_RescueSpeech

# Prepare data for ASR task
prepare_RescueSpeech(
    data_folder="/path/to/RescueSpeech",
    save_folder="/path/to/output",
    train_tsv_file="/path/to/RescueSpeech/train.tsv",
    dev_tsv_file="/path/to/RescueSpeech/dev.tsv",
    test_tsv_file="/path/to/RescueSpeech/test.tsv",
    task="asr",
)

# Prepare data for speech enhancement task
prepare_RescueSpeech(
    data_folder="/path/to/RescueSpeech",
    save_folder="/path/to/output",
    task="enhancement",
)

Related Pages

Principle:Speechbrain_Speechbrain_Dataset_Specific_Data_Preparation