Implementation:Speechbrain Speechbrain Prepare UrbanSound8k

Knowledge Sources	SpeechBrain
Domains	Sound_Classification, Data_Preparation
Last Updated	2026-02-09 00:00 GMT

Overview

Concrete tool, provided by the SpeechBrain library, for preparing the UrbanSound8K dataset for sound classification tasks.

Description

This script creates JSON data manifest files from the UrbanSound8K dataset for use in SpeechBrain sound classification recipes. It respects the dataset's predefined 10-fold cross-validation structure, allowing users to specify which folds to use for training, validation, and testing. The script processes the UrbanSound8K metadata CSV and audio files to generate standardized JSON manifests with audio paths, class labels, and fold information.
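As a rough illustration of the manifest-building step described above, the sketch below filters metadata rows by fold and emits one JSON entry per clip. The CSV column names (`slice_file_name`, `fold`, `classID`, `class`) follow the UrbanSound8K metadata file. The `build_manifest` helper, the sample rows, and the entry keys are assumptions made for illustration; the recipe's actual manifest may use different keys and carry additional fields.

```python
import csv
import io
import json
import os

# Two made-up rows in the UrbanSound8K metadata layout (not real dataset entries).
SAMPLE_CSV = """slice_file_name,fsID,start,end,salience,fold,classID,class
11111-3-0-0.wav,11111,0.0,0.5,1,5,3,dog_bark
22222-2-0-7.wav,22222,58.5,62.5,1,9,2,children_playing
"""


def build_manifest(csv_file, audio_data_folder, fold_nums):
    """Return manifest entries for the clips whose fold is in fold_nums."""
    wanted = set(fold_nums)
    manifest = {}
    for row in csv.DictReader(csv_file):
        fold = int(row["fold"])
        if fold not in wanted:
            continue
        clip_id = os.path.splitext(row["slice_file_name"])[0]
        manifest[clip_id] = {
            # Hypothetical keys; the recipe's real manifest may differ.
            "wav": os.path.join(
                audio_data_folder, f"fold{fold}", row["slice_file_name"]
            ),
            "class_id": int(row["classID"]),
            "class_name": row["class"],
            "fold": fold,
        }
    return manifest


valid_manifest = build_manifest(
    io.StringIO(SAMPLE_CSV), "/path/to/UrbanSound8k/audio", fold_nums=[9]
)
print(json.dumps(valid_manifest, indent=2))
```

Only the clip from fold 9 ends up in the resulting dictionary, which mirrors how each of the three manifests is restricted to its own folds.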

Usage

Use this script when preparing data for environmental sound classification experiments on UrbanSound8K. Follow the dataset authors' guidelines: always use the predefined 10-fold cross-validation splits and never reshuffle the data.

Code Reference

Source Location

Repository: SpeechBrain
File: recipes/UrbanSound8k/SoundClassification/urbansound8k_prepare.py

Signature

def prepare_urban_sound_8k(
    data_folder,
    audio_data_folder,
    save_json_train,
    save_json_valid,
    save_json_test,
    train_fold_nums=[1, 2, 3, 4, 5, 6, 7, 8],
    valid_fold_nums=[9],
    test_fold_nums=[10],
    skip_manifest_creation=False,
):

Import

from urbansound8k_prepare import prepare_urban_sound_8k

I/O Contract

Inputs

Name	Type	Required	Description
data_folder	str	Yes	Path to the folder where UrbanSound8K dataset metadata is stored
audio_data_folder	str	Yes	Path to the folder where UrbanSound8K audio files are stored
save_json_train	str	Yes	Path where the train data specification JSON file will be saved
save_json_valid	str	Yes	Path where the validation data specification JSON file will be saved
save_json_test	str	Yes	Path where the test data specification JSON file will be saved
train_fold_nums	list	No	List of integers from 1 to 10 selecting the folds used for training (default: [1, 2, 3, 4, 5, 6, 7, 8])
valid_fold_nums	list	No	List of integers from 1 to 10 selecting the folds used for validation (default: [9])
test_fold_nums	list	No	List of integers from 1 to 10 selecting the folds used for testing (default: [10])
skip_manifest_creation	bool	No	If True, skips manifest creation (default: False)
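Because the three fold lists are passed independently, it can be worth checking them before calling the preparation function. The helper below is a hypothetical pre-flight check, not part of the recipe: it assumes each fold should lie in the range 1 to 10 and belong to at most one split.

```python
def check_fold_args(train_fold_nums, valid_fold_nums, test_fold_nums):
    """Raise ValueError if a fold is out of range or used more than once."""
    splits = {
        "train": train_fold_nums,
        "valid": valid_fold_nums,
        "test": test_fold_nums,
    }
    seen = {}  # fold number -> name of the split that first claimed it
    for name, folds in splits.items():
        for fold in folds:
            if not 1 <= fold <= 10:
                raise ValueError(f"{name} fold {fold} is outside the range 1-10")
            if fold in seen:
                raise ValueError(
                    f"fold {fold} is listed in both {seen[fold]} and {name}"
                )
            seen[fold] = name


# The documented defaults pass silently.
check_fold_args([1, 2, 3, 4, 5, 6, 7, 8], [9], [10])
```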

Outputs

Name	Type	Description
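train.json	JSON file	Training manifest with audio paths and sound class labels, written to the path given by save_json_train
valid.json	JSON file	Validation manifest, written to the path given by save_json_valid
test.json	JSON file	Test manifest, written to the path given by save_json_test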
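A quick way to sanity-check a generated manifest is to load it and count the clips per class. The sketch below assumes the manifest is a JSON object keyed by clip ID, as is usual for SpeechBrain JSON manifests, and that each entry stores its label under a key named `class_name`. That key, the `class_counts` helper, and the sample entries are hypothetical; inspect a real manifest for the actual field names.

```python
import json
import os
import tempfile
from collections import Counter

# Stand-in manifest with hypothetical keys and made-up entries.
sample = {
    "clip_a": {"wav": "fold9/clip_a.wav", "class_name": "siren", "fold": 9},
    "clip_b": {"wav": "fold9/clip_b.wav", "class_name": "siren", "fold": 9},
    "clip_c": {"wav": "fold9/clip_c.wav", "class_name": "dog_bark", "fold": 9},
}


def class_counts(json_path, label_key="class_name"):
    """Count how many manifest entries carry each class label."""
    with open(json_path, encoding="utf-8") as f:
        manifest = json.load(f)
    return Counter(entry[label_key] for entry in manifest.values())


# Write the stand-in manifest to a temporary file, then read it back.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "valid.json")
    with open(path, "w", encoding="utf-8") as f:
        json.dump(sample, f)
    counts = class_counts(path)

print(counts)
```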

Usage Examples

from urbansound8k_prepare import prepare_urban_sound_8k

# One split of the predefined 10 folds: folds 1-8 for training, fold 9 for validation, fold 10 for testing
prepare_urban_sound_8k(
    data_folder="/path/to/UrbanSound8k",
    audio_data_folder="/path/to/UrbanSound8k/audio",
    save_json_train="output/train.json",
    save_json_valid="output/valid.json",
    save_json_test="output/test.json",
    train_fold_nums=[1, 2, 3, 4, 5, 6, 7, 8],
    valid_fold_nums=[9],
    test_fold_nums=[10],
)
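The call above covers a single split. To run the full 10-fold protocol, each fold takes a turn as the test set. The sketch below generates the fold arguments for all ten runs. The `fold_rotation` helper is hypothetical, and using the fold just before the test fold for validation (wrapping around from 1 to 10) is an assumption, not something the recipe or the dataset prescribes.

```python
def fold_rotation(test_fold, num_folds=10):
    """Return fold keyword arguments for one cross-validation run."""
    # Assumption: validate on the fold preceding the test fold, wrapping around.
    valid_fold = num_folds if test_fold == 1 else test_fold - 1
    train_folds = [
        fold
        for fold in range(1, num_folds + 1)
        if fold not in (test_fold, valid_fold)
    ]
    return {
        "train_fold_nums": train_folds,
        "valid_fold_nums": [valid_fold],
        "test_fold_nums": [test_fold],
    }


# One set of arguments per run, with test folds 1 through 10.
runs = [fold_rotation(test_fold) for test_fold in range(1, 11)]
print(runs[-1])
```

Each dictionary can be unpacked into `prepare_urban_sound_8k(..., **kwargs)` together with per-run output paths, so that the manifests of one run do not overwrite those of another. With test fold 10, the helper reproduces the default arguments shown in the signature.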

Related Pages

Principle:Speechbrain_Speechbrain_Dataset_Specific_Data_Preparation
