Implementation:Speechbrain Speechbrain Prepare CommonLanguage

From Leeroopedia


Knowledge Sources
Domains Language Identification, Data Preparation
Last Updated 2026-02-09 00:00 GMT

Overview

Concrete tool, provided by the SpeechBrain library, for preparing the CommonLanguage dataset for language identification.

Description

This script prepares CSV manifest files from the CommonLanguage dataset for spoken language identification (LID) tasks. It processes audio files across 45 languages (including Arabic, Basque, Catalan, Chinese variants, English, French, German, and many others from the Common Voice ecosystem), reads audio durations via torchaudio, and generates train/dev/test CSV files with utterance metadata. The dataset is sourced from Zenodo and uses a predefined list of supported languages.
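The manifest-building step described above can be sketched as follows. This is a minimal illustration, not the SpeechBrain implementation: it assumes a hypothetical folder layout of `<data_folder>/<language>/<split>/.../*.wav`, uses the standard-library `wave` module as a stand-in for the torchaudio duration lookup, and the column names (`ID`, `wav`, `duration`, `language`), the `LANGUAGES` subset, and the helper names are all assumptions made for the example.

```python
import csv
import wave
from pathlib import Path

# Hypothetical subset for illustration; the real script has its own list of 45 languages.
LANGUAGES = ["Basque", "English", "French"]


def wav_duration_seconds(path):
    """Read the duration from the WAV header (stand-in for a torchaudio lookup)."""
    with wave.open(str(path), "rb") as handle:
        return handle.getnframes() / handle.getframerate()


def build_manifest(data_folder, split, csv_path, languages=LANGUAGES):
    """Write one CSV row per WAV file under <data_folder>/<language>/<split>/."""
    rows = []
    for language in languages:
        split_dir = Path(data_folder) / language / split
        if not split_dir.is_dir():
            continue  # language or split not present in this copy of the data
        for wav_path in sorted(split_dir.rglob("*.wav")):
            rows.append(
                {
                    "ID": wav_path.stem,  # assumed: file stem as utterance ID
                    "wav": str(wav_path),
                    "duration": f"{wav_duration_seconds(wav_path):.3f}",
                    "language": language,
                }
            )
    Path(csv_path).parent.mkdir(parents=True, exist_ok=True)
    with open(csv_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(
            handle, fieldnames=["ID", "wav", "duration", "language"]
        )
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```

Calling `build_manifest(data_folder, "train", "train.csv")` once per split would yield the three manifests; the actual script may name columns differently or record extra fields.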

Usage

Use this when preparing the CommonLanguage dataset for language identification training with SpeechBrain recipes.

Code Reference

Source Location

Signature

def prepare_common_language(data_folder, save_folder, skip_prep=False):

Import

from common_language_prepare import prepare_common_language

I/O Contract

Inputs

Name Type Required Description
data_folder str Yes Path to the folder where the CommonLanguage dataset is stored; it should contain the multi-language data (e.g. /datasets/CommonLanguage)
save_folder str Yes Directory in which to store the output CSV files
skip_prep bool No If True, skip data preparation entirely (default: False)

Outputs

Name Type Description
train.csv CSV File Train split manifest with utterance IDs, file paths, durations, and language labels
dev.csv CSV File Development/validation split manifest
test.csv CSV File Test split manifest
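As a quick sanity check on the generated manifests, one might tally utterances and total audio time per language. The sketch below uses only the standard library and assumes the CSV has `language` and `duration` columns (duration in seconds); the column names and the `summarize_manifest` helper are hypothetical and should be adjusted to match the headers actually written by the script.

```python
import csv
from collections import defaultdict


def summarize_manifest(csv_path):
    """Return {language: (utterance_count, total_seconds)} for one manifest."""
    counts = defaultdict(int)
    seconds = defaultdict(float)
    with open(csv_path, newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            # Column names are assumptions; adapt to the real CSV header.
            counts[row["language"]] += 1
            seconds[row["language"]] += float(row["duration"])
    return {lang: (counts[lang], seconds[lang]) for lang in counts}
```

Running this over train.csv, dev.csv, and test.csv in turn gives a rough view of how balanced each split is across languages.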

Usage Examples

from common_language_prepare import prepare_common_language

prepare_common_language(
    data_folder="/datasets/CommonLanguage",
    save_folder="exp/CommonLanguage_exp",
    skip_prep=False,
)
