Implementation: SpeechBrain Prepare CommonLanguage
| Knowledge Sources | |
|---|---|
| Domains | Language Identification, Data Preparation |
| Last Updated | 2026-02-09 00:00 GMT |
Overview
A concrete tool from the SpeechBrain library for preparing the CommonLanguage dataset for language identification.
Description
This script prepares CSV manifest files from the CommonLanguage dataset for spoken language identification (LID) tasks. It processes audio files across 45 languages (including Arabic, Basque, Catalan, Chinese variants, English, French, German, and many others from the Common Voice ecosystem), reads audio durations via torchaudio, and generates train/dev/test CSV files with utterance metadata. The dataset is sourced from Zenodo and uses a predefined list of supported languages.
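The per-utterance duration lookup can be sketched with the standard-library wave module; note this is a minimal illustration only, and `read_wav_duration` is a hypothetical helper name, since the actual recipe reads durations via torchaudio:

```python
import wave

def read_wav_duration(path):
    """Return the duration of a WAV file in seconds.

    Hypothetical helper for illustration; the SpeechBrain script itself
    uses torchaudio rather than the wave module.
    """
    with wave.open(path, "rb") as f:
        # duration = number of audio frames / sampling rate
        return f.getnframes() / float(f.getframerate())
```

The same quantity (frames divided by sample rate) is what ends up in the duration column of the generated CSV files.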
Usage
Use this when preparing the CommonLanguage dataset for language identification training with SpeechBrain recipes.
Code Reference
Source Location
- Repository: SpeechBrain
- File: recipes/CommonLanguage/common_language_prepare.py
Signature
def prepare_common_language(data_folder, save_folder, skip_prep=False):
Import
from common_language_prepare import prepare_common_language
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| data_folder | str | Yes | Path to the folder where the CommonLanguage dataset is stored, i.e., the folder containing the per-language data (e.g., /datasets/CommonLanguage) |
| save_folder | str | Yes | The directory where to store the output CSV files |
| skip_prep | bool | No | If True, skip data preparation entirely (default: False) |
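A caller might validate these inputs before invoking the preparation function; the sketch below is an assumption about reasonable pre-flight checks, not part of the script itself (`check_prepare_args` is a hypothetical helper):

```python
import os

def check_prepare_args(data_folder, save_folder):
    """Hypothetical pre-flight check before data preparation.

    Verifies that the dataset folder exists and creates the output
    folder if needed; not part of common_language_prepare.py.
    """
    if not os.path.isdir(data_folder):
        raise FileNotFoundError(
            f"CommonLanguage data folder not found: {data_folder}"
        )
    # The CSV files are written here, so make sure it exists.
    os.makedirs(save_folder, exist_ok=True)
```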
Outputs
| Name | Type | Description |
|---|---|---|
| train.csv | CSV File | Train split manifest with utterance IDs, file paths, durations, and language labels |
| dev.csv | CSV File | Development/validation split manifest |
| test.csv | CSV File | Test split manifest |
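Downstream code can consume these manifests with an ordinary CSV reader. The column names below (ID, duration, wav, language) are an assumption based on typical SpeechBrain manifests; the exact headers and paths in the generated files may differ:

```python
import csv
import io

# Hypothetical manifest excerpt; real column names and file paths in the
# generated train/dev/test CSVs may differ from this sketch.
sample = """ID,duration,wav,language
utt_0001,3.42,/datasets/CommonLanguage/english/clips/utt_0001.wav,English
utt_0002,2.87,/datasets/CommonLanguage/french/clips/utt_0002.wav,French
"""

def load_manifest(fileobj):
    """Parse a CSV manifest into a list of per-utterance row dicts."""
    return list(csv.DictReader(fileobj))

rows = load_manifest(io.StringIO(sample))
```

Each row then carries the utterance ID, its duration in seconds, the audio path, and the language label used as the LID target.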
Usage Examples
from common_language_prepare import prepare_common_language

prepare_common_language(
    data_folder="/datasets/CommonLanguage",
    save_folder="exp/CommonLanguage_exp",
    skip_prep=False,
)