Implementation:Speechbrain Speechbrain Prepare GSC
| Knowledge Sources | |
|---|---|
| Domains | Keyword_Spotting, Data_Preparation |
| Last Updated | 2026-02-09 00:00 GMT |
Overview
Concrete tool from the SpeechBrain repository for preparing the Google Speech Commands V2 dataset for keyword spotting.
Description
This script prepares CSV data manifest files for the Google Speech Commands V2 dataset, which contains short audio clips of spoken commands. It downloads the dataset automatically when it is missing, splits the data into train/validation/test sets using the official hashing-based assignment, supports a configurable list of wanted command words, and generates silence and unknown-word classes. The output CSV files contain audio paths and class labels suitable for keyword spotting and command recognition tasks.
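The official split assigns each clip by hashing its file name with the `_nohash_` suffix removed (what remains is the speaker ID), so every clip from one speaker lands in the same split and assignments stay stable as new clips are added. The sketch below follows the reference `which_set` function published with the dataset in TensorFlow's speech_commands example; it is not copied from `prepare_GSC.py`, and the recipe's own implementation may differ in details.

```python
import hashlib
import os
import re

# Constant from the TensorFlow Speech Commands reference code: an upper
# bound on clips per class, used to map the hash onto the range [0, 100].
MAX_NUM_WAVS_PER_CLASS = 2**27 - 1


def which_set(filename, validation_percentage=10, testing_percentage=10):
    """Assign a clip to 'training', 'validation', or 'testing'.

    Only the part of the file name before '_nohash_' (the speaker ID) is
    hashed, so all clips from one speaker go to the same split.
    """
    base_name = os.path.basename(filename)
    hash_name = re.sub(r"_nohash_.*$", "", base_name)
    hash_hex = hashlib.sha1(hash_name.encode("utf-8")).hexdigest()
    # Map the SHA-1 digest to a pseudo-random percentage in [0, 100]
    percentage_hash = (int(hash_hex, 16) % (MAX_NUM_WAVS_PER_CLASS + 1)) * (
        100.0 / MAX_NUM_WAVS_PER_CLASS
    )
    if percentage_hash < validation_percentage:
        return "validation"
    if percentage_hash < validation_percentage + testing_percentage:
        return "testing"
    return "training"


# Example: both clips share the speaker ID "0a7c2a8d", so they share a split.
# which_set("yes/0a7c2a8d_nohash_0.wav")
# which_set("no/0a7c2a8d_nohash_3.wav")
```

Hashing instead of random shuffling makes the split deterministic without storing any split lists, and it prevents the same speaker from appearing in both training and evaluation data.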
Usage
Use this when preparing the Google Speech Commands V2 dataset for keyword spotting or command recognition training with SpeechBrain recipes.
Code Reference
Source Location
- Repository: SpeechBrain
- File: recipes/Google-speech-commands/prepare_GSC.py
Signature
```python
def prepare_GSC(
    data_folder,
    save_folder,
    validation_percentage=10,
    testing_percentage=10,
    percentage_unknown=10,
    percentage_silence=10,
    words_wanted=[
        "yes", "no", "up", "down", "left",
        "right", "on", "off", "stop", "go",
    ],
    skip_prep=False,
):
    ...
```
Import
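The recipe directory name contains hyphens (`Google-speech-commands`), so it cannot be imported through a dotted package path such as `recipes.Google_speech_commands`. As SpeechBrain recipe scripts typically do, import the function from within the recipe directory (or with that directory on `sys.path`):

```python
from prepare_GSC import prepare_GSC
```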
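When changing into the recipe directory is inconvenient, the module can also be loaded straight from its file path with the standard `importlib` machinery. The helper below, `load_module_from_path`, is a hypothetical convenience function (not part of SpeechBrain); it assumes that `prepare_GSC.py`'s own dependencies, such as `speechbrain`, are installed.

```python
import importlib.util
import os
import sys


def load_module_from_path(path, module_name=None):
    """Import a module from an explicit file path (hypothetical helper).

    Useful when the containing directory, such as
    recipes/Google-speech-commands, is not a valid package name.
    """
    if module_name is None:
        module_name = os.path.splitext(os.path.basename(path))[0]
    spec = importlib.util.spec_from_file_location(module_name, path)
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot load a module from {path!r}")
    module = importlib.util.module_from_spec(spec)
    # Register the module before executing it, as the importlib docs recipe does
    sys.modules[module_name] = module
    spec.loader.exec_module(module)
    return module


# Example (the path is illustrative):
# gsc = load_module_from_path("speechbrain/recipes/Google-speech-commands/prepare_GSC.py")
# gsc.prepare_GSC(data_folder="/path/to/GSC", save_folder="/path/to/output")
```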
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| data_folder | str | Yes | Path to the dataset; if not present, it will be downloaded here |
| save_folder | str | Yes | Folder where data manifest files will be stored |
| validation_percentage | int | No | Percentage of data to use for validation (default: 10) |
| testing_percentage | int | No | Percentage of data to use for testing (default: 10) |
| percentage_unknown | int | No | Percentage of unknown words to preserve relative to known words (default: 10) |
| percentage_silence | int | No | Percentage of silence samples to generate relative to known words (default: 10) |
| words_wanted | list | No | Command words to keep as target classes; other words supply the unknown class (default: the 10 standard commands yes, no, up, down, left, right, on, off, stop, go) |
| skip_prep | bool | No | If True, skip data preparation (default: False) |
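The two percentage parameters size the extra classes relative to the number of known-word clips. The sketch below shows one way to apply them to a single split, assuming (as in the TensorFlow reference pipeline) that the counts are computed per split as a percentage of that split's known-word clips. `build_split` is a hypothetical helper, not the recipe's code, and the silence rows are placeholders: a real pipeline would create the audio, for example from the dataset's `_background_noise_` recordings.

```python
import math
import random


def build_split(clips, words_wanted, percentage_unknown=10,
                percentage_silence=10, seed=0):
    """Return (path, label) rows for one split (hypothetical helper).

    clips: list of (path, word) pairs already assigned to this split.
    """
    known = [(p, w) for p, w in clips if w in words_wanted]
    # Every word outside words_wanted is relabeled as "unknown"
    others = [(p, "unknown") for p, w in clips if w not in words_wanted]
    n_unknown = min(len(others), math.ceil(len(known) * percentage_unknown / 100))
    n_silence = math.ceil(len(known) * percentage_silence / 100)
    rng = random.Random(seed)  # fixed seed keeps the subsample reproducible
    unknown = rng.sample(others, n_unknown)
    # Placeholder silence rows: no audio path yet (assumption, see lead-in)
    silence = [(None, "silence")] * n_silence
    return known + unknown + silence
```

With 20 known-word clips and the default 10% settings, this keeps 2 unknown clips and adds 2 silence rows.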
Outputs
| Name | Type | Description |
|---|---|---|
| train.csv | CSV | Training split manifest with audio paths and command labels |
| valid.csv | CSV | Validation split manifest |
| test.csv | CSV | Test split manifest |
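After preparation, a quick per-class count of each manifest is a simple sanity check on class balance and on the silence and unknown percentages. The sketch below uses only the standard `csv` module; the label column name `command` is an assumption about the manifest layout, so check the header of the generated files and pass the actual name if it differs.

```python
import csv
from collections import Counter


def label_counts(csv_path, label_column="command"):
    """Count rows per class label in a manifest CSV.

    label_column="command" is an assumed column name, not a confirmed one.
    """
    with open(csv_path, newline="") as fh:
        reader = csv.DictReader(fh)
        if label_column not in (reader.fieldnames or []):
            raise KeyError(f"{label_column!r} not in columns {reader.fieldnames}")
        return Counter(row[label_column] for row in reader)


# Example:
# for split in ("train", "valid", "test"):
#     print(split, label_counts(os.path.join("/path/to/output", f"{split}.csv")))
```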
Usage Examples
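```python
# Run from recipes/Google-speech-commands/ (or add that directory to sys.path);
# the hyphenated directory name cannot be imported as a package.
from prepare_GSC import prepare_GSC

prepare_GSC(
    data_folder="/path/to/GSC",
    save_folder="/path/to/output",
    words_wanted=["yes", "no", "up", "down", "left", "right", "on", "off", "stop", "go"],
)
```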