Implementation:Speechbrain Speechbrain Prepare CommonVoice LM

Knowledge Sources	SpeechBrain
Domains	Language Modeling, Data Preparation
Last Updated	2026-02-09 00:00 GMT

Overview

Variant of the Common Voice data preparation script for the language model recipe.

Description

This file (recipes/CommonVoice/LM/common_voice_prepare.py) is a copy of the canonical Common Voice data preparation script. It provides the same prepare_common_voice function with identical parameters and behavior, placed within the LM recipe subdirectory for convenience. The canonical implementation is documented on the Implementation:Speechbrain_Speechbrain_Prepare_CommonVoice_Seq2Seq page.
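
Because this page describes the LM script as a copy of the canonical one, it can be useful to confirm that a given checkout still has byte-identical files. The following is a minimal sketch, not part of SpeechBrain: the helper names file_digest and is_same_script are hypothetical, and the path to the canonical script is an assumption the reader must supply, since this page does not state it.

```python
import hashlib
from pathlib import Path


def file_digest(path) -> str:
    """Return the SHA-256 hex digest of a file's raw bytes."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def is_same_script(copy_path, canonical_path) -> bool:
    """Return True when both files have byte-identical contents.

    Hypothetical helper for illustration; it compares contents only,
    so it also returns True when one path is a symlink to the other.
    """
    return file_digest(copy_path) == file_digest(canonical_path)
```

Called with recipes/CommonVoice/LM/common_voice_prepare.py as copy_path and the canonical script's location as canonical_path, a False result would suggest the two files have drifted apart and that the canonical page may no longer describe this copy exactly.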

Usage

Use this script when preparing the Mozilla Common Voice dataset for language model training. The canonical page covers the function signature, I/O contract, and usage examples.
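
Recipe scripts live next to the recipe rather than in an installed package, so one way to reach prepare_common_voice from other code is to load the script by file path. The sketch below makes that assumption explicit: load_prepare_fn is a hypothetical helper, not a SpeechBrain API, and it only relies on the standard library's importlib. Loading the real script will also require whatever that script itself imports to be installed.

```python
import importlib.util
from pathlib import Path


def load_prepare_fn(script_path, fn_name="prepare_common_voice"):
    """Load a Python script by path and return one function from it.

    Hypothetical helper for illustration. fn_name defaults to the
    function name given on this page.
    """
    script_path = Path(script_path)
    spec = importlib.util.spec_from_file_location(script_path.stem, script_path)
    if spec is None or spec.loader is None:
        raise ImportError(f"Cannot load a module from {script_path}")
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # runs the script's top-level code
    return getattr(module, fn_name)
```

A caller would pass recipes/CommonVoice/LM/common_voice_prepare.py as script_path and then invoke the returned function with the arguments documented on the canonical page; the parameters are deliberately not repeated here.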

Code Reference

Source Location

Repository: SpeechBrain
File: recipes/CommonVoice/LM/common_voice_prepare.py

Canonical Implementation

This is a duplicate of the seq2seq Common Voice preparation script. For full documentation including signature, I/O contract, and usage examples, see:

Implementation:Speechbrain_Speechbrain_Prepare_CommonVoice_Seq2Seq

Related Pages

Implementation:Speechbrain_Speechbrain_Prepare_CommonVoice_Seq2Seq -- Canonical implementation page
Implementation:Speechbrain_Speechbrain_Prepare_CommonVoice_Transducer -- Same script used for transducer recipe
Implementation:Speechbrain_Speechbrain_Prepare_CommonVoice_SSL -- Same script used for self-supervised learning recipe
Principle:Speechbrain_Speechbrain_Dataset_Specific_Data_Preparation
