Implementation:Speechbrain Speechbrain Prepare Voxceleb
| Property | Value |
|---|---|
| Implementation Name | Prepare Voxceleb |
| Type | API Doc |
| Repository | speechbrain/speechbrain |
| Source File | recipes/VoxCeleb/voxceleb_prepare.py:L36-163 |
| Import | from voxceleb_prepare import prepare_voxceleb |
| Related Principle | Principle:Speechbrain_Speechbrain_VoxCeleb_Data_Preparation |
API Signature
def prepare_voxceleb(
    data_folder,
    save_folder,
    verification_pairs_file,
    splits=["train", "dev", "test"],
    split_ratio=[90, 10],
    seg_dur=3.0,
    amp_th=5e-04,
    source=None,
    split_speaker=False,
    random_segment=False,
    skip_prep=False,
):
Description
Prepares the VoxCeleb1 or VoxCeleb2 dataset for speaker recognition training and evaluation. The function scans the raw audio directory structure, segments utterances into fixed-duration chunks, filters low-amplitude segments, splits the data into train/dev partitions, and writes the results as CSV files. It also generates enrollment and test CSV files from the verification pairs file for evaluation.
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| data_folder | str | required | Path to the folder where the original VoxCeleb dataset is stored. Supports comma-separated paths for VoxCeleb1+2 combined. |
| save_folder | str | required | The directory where output CSV files will be stored. |
| verification_pairs_file | str | required | Path to the txt file containing the verification split (format: label enrol_path test_path). |
| splits | list | ["train", "dev", "test"] | List of splits to prepare. Valid values: "train", "dev", "test". |
| split_ratio | list | [90, 10] | Percentage split between train and dev sets (e.g., [90, 10] means 90% train, 10% dev). |
| seg_dur | float | 3.0 | Duration of each audio segment (chunk) in seconds. |
| amp_th | float | 5e-04 | Amplitude threshold for filtering. Segments with average amplitude below this value are discarded. |
| source | str | None | Path to folder containing VoxCeleb source archives. If provided, test data is extracted from this location. |
| split_speaker | bool | False | If True, performs speaker-wise splitting (no speaker overlap between train and dev). If False, splits at the utterance level. |
| random_segment | bool | False | If True, stores full utterance boundaries and relies on the data pipeline to select random chunks at training time. If False, pre-computes fixed chunks. |
| skip_prep | bool | False | If True, skips preparation entirely (useful when CSV files already exist). |
Inputs
- VoxCeleb wav directory: The dataset must follow the standard structure: data_folder/wav/speaker_id/session_id/utterance.wav
- Verification pairs file: A text file listing verification trial pairs, one per line.
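To make the trial-file format concrete, here is a minimal sketch of parsing lines of the form label enrol_path test_path. The helper names and the example paths are hypothetical, and the sketch assumes each path begins with the speaker ID, as in the directory layout above; it is not the parsing code used by prepare_voxceleb.

```python
def parse_verification_pairs(lines):
    """Parse trial lines of the form 'label enrol_path test_path'."""
    trials = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        label, enrol_path, test_path = line.split()
        trials.append((int(label), enrol_path, test_path))
    return trials


def speakers_in_trials(trials):
    """Collect speaker IDs, assuming every path starts with 'speaker_id/'."""
    speakers = set()
    for _, enrol_path, test_path in trials:
        speakers.add(enrol_path.split("/")[0])
        speakers.add(test_path.split("/")[0])
    return speakers


# Made-up trial lines for illustration only.
example_lines = [
    "1 spkA/sess1/utt1.wav spkA/sess2/utt3.wav",
    "0 spkA/sess1/utt1.wav spkB/sess1/utt2.wav",
]
trials = parse_verification_pairs(example_lines)
trial_speakers = speakers_in_trials(trials)
```

The resulting speaker set is the kind of information needed to keep evaluation speakers out of the training data.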
Outputs
The function produces the following CSV files in the save_folder:
| File | Description |
|---|---|
| train.csv | Training set segments |
| dev.csv | Development (validation) set segments |
| test.csv | Test set utterances (from verification pairs) |
| enrol.csv | Enrollment utterances (from verification pairs) |
Each CSV has columns: ID, duration, wav, start, stop, spk_id
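As an illustration of how these columns can be consumed, the sketch below reads rows with the standard csv module and converts the numeric fields. The sample rows, IDs, and paths are invented for the example, and the assumption that start and stop are integer sample indices follows the segment description in this document rather than a check of the generated files.

```python
import csv
import io

# Hypothetical rows; only the header reflects the columns listed above.
sample_csv = """ID,duration,wav,start,stop,spk_id
seg_0001,3.0,/data/wav/spkA/sess1/utt1.wav,0,48000,spkA
seg_0002,3.0,/data/wav/spkA/sess1/utt1.wav,48000,96000,spkA
"""


def load_segments(handle):
    """Read segment rows and cast the numeric columns."""
    rows = []
    for row in csv.DictReader(handle):
        row["duration"] = float(row["duration"])
        row["start"] = int(row["start"])
        row["stop"] = int(row["stop"])
        rows.append(row)
    return rows


segments = load_segments(io.StringIO(sample_csv))
```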
Usage Example
from voxceleb_prepare import prepare_voxceleb

# Basic preparation for training with default 3-second segments
prepare_voxceleb(
    data_folder="/data/VoxCeleb1",
    save_folder="/output/voxceleb_csvs",
    verification_pairs_file="/data/VoxCeleb1/veri_test2.txt",
    splits=["train", "dev"],
    split_ratio=[90, 10],
    seg_dur=3.0,
    amp_th=5e-04,
)

# Preparation with speaker-level splitting and random segments
prepare_voxceleb(
    data_folder="/data/VoxCeleb1,/data/VoxCeleb2",
    save_folder="/output/voxceleb_csvs",
    verification_pairs_file="/data/VoxCeleb1/veri_test2.txt",
    splits=["train", "dev", "test"],
    split_ratio=[90, 10],
    seg_dur=3.0,
    split_speaker=True,
    random_segment=True,
)
Usage in Training Recipe
In the training recipe (train_speaker_embeddings.py), this function is called via run_on_main so that, under distributed data-parallel (DDP) training, preparation runs only on the main process:
from voxceleb_prepare import prepare_voxceleb
from speechbrain.utils.distributed import run_on_main

run_on_main(
    prepare_voxceleb,
    kwargs={
        "data_folder": hparams["data_folder"],
        "save_folder": hparams["save_folder"],
        "verification_pairs_file": veri_file_path,
        "splits": ["train", "dev"],
        "split_ratio": hparams["split_ratio"],
        "seg_dur": hparams["sentence_len"],
        "skip_prep": hparams["skip_prep"],
    },
)
Internal Processing Details
Segment Extraction
For each wav file, the function:
- Loads the audio signal to determine its duration
- Computes the number of non-overlapping chunks: num_chunks = int(duration / seg_dur)
- Creates a CSV entry for each chunk with computed start/stop sample indices
- Filters out chunks whose average amplitude falls below amp_th
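The steps above can be sketched in plain Python as follows. This is an illustrative approximation, not the code in voxceleb_prepare.py: the helper names are hypothetical, the signal is a plain list of floats rather than a loaded tensor, "average amplitude" is taken to mean the mean absolute sample value, and the 16 kHz sample rate is an assumption.

```python
def chunk_boundaries(duration, seg_dur=3.0, sample_rate=16000):
    """Return (start, stop) sample indices of non-overlapping chunks."""
    num_chunks = int(duration / seg_dur)  # any trailing remainder is dropped
    bounds = []
    for i in range(num_chunks):
        start = int(i * seg_dur * sample_rate)
        stop = int((i + 1) * seg_dur * sample_rate)
        bounds.append((start, stop))
    return bounds


def mean_abs_amplitude(samples):
    """Mean absolute value of a non-empty sequence of samples."""
    return sum(abs(s) for s in samples) / len(samples)


def keep_chunks(signal, seg_dur=3.0, sample_rate=16000, amp_th=5e-04):
    """Keep chunks whose mean absolute amplitude is at least amp_th."""
    duration = len(signal) / sample_rate
    kept = []
    for start, stop in chunk_boundaries(duration, seg_dur, sample_rate):
        if mean_abs_amplitude(signal[start:stop]) >= amp_th:
            kept.append((start, stop))
    return kept
```

For example, a 7.5-second utterance with seg_dur=3.0 yields two chunks, and the final 1.5 seconds are not used.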
Train/Dev Splitting
Speakers present in the verification pairs file are excluded from both train and dev. The remaining utterances are split according to split_ratio:
- Utterance-level (default): Utterances are randomly shuffled and split by count.
- Speaker-level (split_speaker=True): Speaker lists are shuffled and split, then all utterances from each speaker go to the assigned partition.
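The two splitting modes can be compared with a small sketch. The function name, the dictionary input, and the seed parameter are hypothetical, and the rounding of the split point (integer floor of the train percentage) is an assumption; the recipe's own code may differ in these details.

```python
import random


def split_train_dev(utts_by_speaker, test_speakers, split_ratio=(90, 10),
                    split_speaker=False, seed=0):
    """Split utterances into train/dev after removing evaluation speakers."""
    rng = random.Random(seed)
    # Speakers that appear in the verification pairs are excluded entirely.
    pool = {spk: utts for spk, utts in utts_by_speaker.items()
            if spk not in test_speakers}
    if split_speaker:
        # Speaker-level: shuffle speakers, then assign whole speakers.
        speakers = sorted(pool)
        rng.shuffle(speakers)
        n_train = len(speakers) * split_ratio[0] // 100
        train = [u for spk in speakers[:n_train] for u in pool[spk]]
        dev = [u for spk in speakers[n_train:] for u in pool[spk]]
    else:
        # Utterance-level: shuffle all utterances and split by count.
        utts = sorted(u for spk_utts in pool.values() for u in spk_utts)
        rng.shuffle(utts)
        n_train = len(utts) * split_ratio[0] // 100
        train, dev = utts[:n_train], utts[n_train:]
    return train, dev
```

With speaker-level splitting, no speaker contributes utterances to both partitions, which is what makes the dev set a test of generalization to unseen speakers.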
Skip Detection
The function saves a pickle file (opt_voxceleb_prepare.pkl) with the preparation configuration. On subsequent calls, if the configuration matches and all output CSV files exist, preparation is skipped automatically.
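A minimal sketch of this kind of skip check is shown below. The function names and the shape of the configuration dictionary are assumptions made for illustration; only the pickle file name comes from the description above.

```python
import os
import pickle
import tempfile

OPT_FILE = "opt_voxceleb_prepare.pkl"


def save_conf(save_folder, conf):
    """Store the preparation configuration next to the CSV files."""
    with open(os.path.join(save_folder, OPT_FILE), "wb") as f:
        pickle.dump(conf, f)


def should_skip(save_folder, conf, csv_names):
    """Skip only if every CSV exists and the saved configuration matches."""
    for name in csv_names:
        if not os.path.isfile(os.path.join(save_folder, name)):
            return False
    opt_path = os.path.join(save_folder, OPT_FILE)
    if not os.path.isfile(opt_path):
        return False
    with open(opt_path, "rb") as f:
        return pickle.load(f) == conf


# Small demonstration in a temporary directory.
with tempfile.TemporaryDirectory() as folder:
    conf = {"splits": ["train", "dev"], "seg_dur": 3.0}  # hypothetical config
    first_call = should_skip(folder, conf, ["train.csv", "dev.csv"])
    for name in ["train.csv", "dev.csv"]:
        open(os.path.join(folder, name), "w").close()
    save_conf(folder, conf)
    second_call = should_skip(folder, conf, ["train.csv", "dev.csv"])
    changed_call = should_skip(folder, {**conf, "seg_dur": 2.0},
                               ["train.csv", "dev.csv"])
```

Comparing the stored configuration guards against silently reusing CSV files that were generated with different settings, such as another seg_dur.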