Implementation:Speechbrain Speechbrain Prepare GSC

From Leeroopedia


Knowledge Sources
Domains Keyword_Spotting, Data_Preparation
Last Updated 2026-02-09 00:00 GMT

Overview

Concrete tool, provided by the SpeechBrain library, for preparing the Google Speech Commands V2 dataset for keyword spotting.

Description

This script prepares CSV data manifest files for the Google Speech Commands V2 dataset, which contains short audio clips of spoken commands. It downloads the dataset if it is not already present in data_folder, splits the data into train/validation/test sets using the official hash-based assignment, supports a configurable list of wanted command words, and generates silence and unknown-word classes. The output CSV files contain audio paths and class labels suitable for keyword spotting and command recognition tasks.
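The hash-based split mentioned above can be sketched as follows. This is modeled on the `which_set` helper from the TensorFlow Speech Commands reference code, not copied from `prepare_GSC`; whether the SpeechBrain script matches it line for line is an assumption. The key idea is that the split is derived from a hash of the file name with the `_nohash_` suffix removed, so every clip from the same speaker lands in the same set and the assignment stays stable as files are added.

```python
import hashlib
import os
import re

# Upper bound used by the reference implementation (about 134M files per class).
MAX_NUM_WAVS_PER_CLASS = 2**27 - 1


def which_set(filename, validation_percentage, testing_percentage):
    """Return 'training', 'validation', or 'testing' for a wav file path."""
    base_name = os.path.basename(filename)
    # Drop everything from "_nohash_" onward so that all clips recorded by
    # the same speaker hash identically and end up in the same split.
    hash_name = re.sub(r"_nohash_.*$", "", base_name)
    digest = hashlib.sha1(hash_name.encode("utf-8")).hexdigest()
    # Map the hash to a pseudo-random percentage in the range [0, 100].
    percentage_hash = (int(digest, 16) % (MAX_NUM_WAVS_PER_CLASS + 1)) * (
        100.0 / MAX_NUM_WAVS_PER_CLASS
    )
    if percentage_hash < validation_percentage:
        return "validation"
    if percentage_hash < validation_percentage + testing_percentage:
        return "testing"
    return "training"


# Hypothetical file name, used only for illustration.
split = which_set("yes/0a7c2a8d_nohash_0.wav", 10, 10)
print(split)
```

Because the assignment depends only on the file name, rerunning the preparation with the same percentages reproduces the same splits.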

Usage

Use this when preparing the Google Speech Commands V2 dataset for keyword spotting or command recognition training with SpeechBrain recipes.

Code Reference

Source Location

Signature

def prepare_GSC(
    data_folder,
    save_folder,
    validation_percentage=10,
    testing_percentage=10,
    percentage_unknown=10,
    percentage_silence=10,
    words_wanted=[
        "yes", "no", "up", "down", "left",
        "right", "on", "off", "stop", "go",
    ],
    skip_prep=False,
):

Import

from recipes.Google_speech_commands.prepare_GSC import prepare_GSC

I/O Contract

Inputs

| Name | Type | Required | Description |
|---|---|---|---|
| data_folder | str | Yes | Path to the dataset; if not present, it will be downloaded here |
| save_folder | str | Yes | Folder where data manifest files will be stored |
| validation_percentage | int | No | Percentage of data to use for validation (default: 10) |
| testing_percentage | int | No | Percentage of data to use for testing (default: 10) |
| percentage_unknown | int | No | Percentage of unknown words to preserve relative to known words (default: 10) |
| percentage_silence | int | No | Percentage of silence samples to generate relative to known words (default: 10) |
| words_wanted | list | No | List of commands to use from the dataset (default: 10 standard commands) |
| skip_prep | bool | No | If True, skip data preparation (default: False) |
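As a rough illustration of how percentage_unknown and percentage_silence relate to the number of known-word samples, the arithmetic might look like the sketch below. The helper name `extra_class_counts` is hypothetical, and the use of integer division is an assumption; the actual script may round or sample differently.

```python
def extra_class_counts(num_known, percentage_unknown=10, percentage_silence=10):
    """Return (num_unknown, num_silence) target counts for one split.

    Both targets are expressed relative to the number of known-word samples.
    """
    # Integer division is an assumption made for this sketch.
    num_unknown = num_known * percentage_unknown // 100
    num_silence = num_known * percentage_silence // 100
    return num_unknown, num_silence


# With 30000 known-word clips and the default 10% settings:
print(extra_class_counts(30000))  # (3000, 3000)
```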

Outputs

| Name | Type | Description |
|---|---|---|
| train.csv | CSV | Training split manifest with audio paths and command labels |
| valid.csv | CSV | Validation split manifest |
| test.csv | CSV | Test split manifest |
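One way to sanity-check a generated manifest is to count how many rows carry each class label. The sketch below uses only the Python standard library. The helper name `count_labels` is hypothetical, and the label column name is deliberately left as a parameter because the exact CSV header is not listed here; inspect the first line of the file to find it.

```python
import csv
from collections import Counter


def count_labels(csv_path, label_column):
    """Count manifest rows per class label in the given column."""
    with open(csv_path, newline="") as f:
        reader = csv.DictReader(f)
        # Fail early with the real header if the column name is wrong.
        if label_column not in (reader.fieldnames or []):
            raise KeyError(
                f"{label_column!r} not found in header {reader.fieldnames}"
            )
        return Counter(row[label_column] for row in reader)
```

For example, `count_labels("/path/to/output/train.csv", "command")` would return a per-class count, assuming the label column is named `command`. Comparing the counts for the silence and unknown classes against the known words is a quick check that percentage_silence and percentage_unknown had the intended effect.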

Usage Examples

from recipes.Google_speech_commands.prepare_GSC import prepare_GSC

prepare_GSC(
    data_folder="/path/to/GSC",
    save_folder="/path/to/output",
    words_wanted=["yes", "no", "up", "down", "left", "right", "on", "off", "stop", "go"],
)
