
Implementation:Speechbrain Speechbrain Prepare RescueSpeech

From Leeroopedia


Knowledge Sources
Domains: Speech_Enhancement, Data_Preparation
Last Updated: 2026-02-09 00:00 GMT

Overview

Concrete tool, provided by the SpeechBrain library, for preparing the RescueSpeech dataset for ASR and speech enhancement tasks.

Description

This script prepares CSV files for the RescueSpeech dataset, supporting both ASR (Automatic Speech Recognition) and speech enhancement tasks. It processes TSV metadata files and audio recordings to generate standardized CSV manifests with columns for clean and noisy speech paths. The script also normalizes transcript text, handles Unicode characters, and supports a configurable sample rate.
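The text normalization described above could look roughly like the following. This is a minimal sketch, not the script's actual code: the helper names `strip_accents` and `normalize_transcript` are hypothetical, and the specific rules (uppercasing, dropping punctuation) are assumptions, since this page does not show the rules the script applies. Only the `accented_letters` flag comes from the documented signature.

```python
import re
import unicodedata


def strip_accents(text: str) -> str:
    """Remove combining marks after NFD decomposition (hypothetical helper)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


def normalize_transcript(text: str, accented_letters: bool = False) -> str:
    """Assumed normalization: optional accent stripping, uppercase, no punctuation."""
    if not accented_letters:
        text = strip_accents(text)
    text = text.upper()
    # Keep word characters, whitespace, and apostrophes; turn the rest into spaces.
    text = re.sub(r"[^\w\s']", " ", text)
    # Collapse runs of whitespace into single spaces.
    return " ".join(text.split())
```

With these assumed rules, `normalize_transcript("Hällo,  Wörld!")` gives `"HALLO WORLD"`, while passing `accented_letters=True` keeps the umlauts and gives `"HÄLLO WÖRLD"`.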

Usage

Use this script when setting up a training pipeline for noise-robust ASR or speech enhancement experiments with the RescueSpeech dataset. It must be run before any training recipe that depends on RescueSpeech data.
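Because training recipes depend on the generated manifests, a recipe may want to check that they exist before starting. The sketch below is one hedged way to do that; `missing_manifests` is a hypothetical helper and not part of SpeechBrain, and the file names are the ones listed in the Outputs section of this page.

```python
from pathlib import Path

# Output file names as listed in the Outputs section of this page.
EXPECTED_MANIFESTS = ("train.csv", "dev.csv", "test.csv")


def missing_manifests(save_folder):
    """Return the expected manifest names not yet present in save_folder."""
    folder = Path(save_folder)
    return [name for name in EXPECTED_MANIFESTS if not (folder / name).is_file()]
```

An empty list means all three manifests are in place; a non-empty list names the files that preparation still needs to produce.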

Code Reference

Source Location

Signature

def prepare_RescueSpeech(
    data_folder,
    save_folder,
    train_tsv_file=None,
    dev_tsv_file=None,
    test_tsv_file=None,
    accented_letters=False,
    skip_prep=False,
    sample_rate=16000,
    task="asr",
):

Import

from rescuespeech_prepare import prepare_RescueSpeech

I/O Contract

Inputs

Name              Type  Required  Description
data_folder       str   Yes       Path to the folder where the original RescueSpeech dataset is stored
save_folder       str   Yes       Directory in which to store the generated CSV files
train_tsv_file    str   No        Path to the Train RescueSpeech .tsv file (default: None)
dev_tsv_file      str   No        Path to the Dev RescueSpeech .tsv file (default: None)
test_tsv_file     str   No        Path to the Test RescueSpeech .tsv file (default: None)
accented_letters  bool  No        If True, keeps accented letters in the transcripts (default: False)
skip_prep         bool  No        If True, skips data preparation entirely (default: False)
sample_rate       int   No        The sample rate of the audio files, in Hz (default: 16000)
task              str   No        The task type, either "asr" or "enhancement" (default: "asr")

Outputs

Name       Type      Description
train.csv  CSV file  Training set manifest with audio paths and transcriptions
dev.csv    CSV file  Development/validation set manifest
test.csv   CSV file  Test set manifest
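Once the manifests are written, a quick sanity check is to count rows and sum durations. The following is a minimal sketch using only the standard library. The `duration` column name is an assumption (it is common in SpeechBrain manifests but is not confirmed on this page), and `summarize_manifest` is a hypothetical helper.

```python
import csv


def summarize_manifest(csv_path, duration_column="duration"):
    """Return (row count, total duration in seconds) for a CSV manifest.

    Assumes the manifest has a header row and a numeric duration column.
    """
    n_rows = 0
    total_seconds = 0.0
    with open(csv_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            n_rows += 1
            total_seconds += float(row[duration_column])
    return n_rows, total_seconds
```

If the generated files use a different header for duration, pass it through `duration_column` rather than editing the function.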

Usage Examples

from rescuespeech_prepare import prepare_RescueSpeech

# Prepare data for ASR task
prepare_RescueSpeech(
    data_folder="/path/to/RescueSpeech",
    save_folder="/path/to/output",
    train_tsv_file="/path/to/RescueSpeech/train.tsv",
    dev_tsv_file="/path/to/RescueSpeech/dev.tsv",
    test_tsv_file="/path/to/RescueSpeech/test.tsv",
    task="asr",
)

# Prepare data for speech enhancement task
prepare_RescueSpeech(
    data_folder="/path/to/RescueSpeech",
    save_folder="/path/to/output",
    task="enhancement",
)
