Implementation:Speechbrain Speechbrain Prepare Voxceleb

Property	Value
Implementation Name	Prepare Voxceleb
Type	API Doc
Repository	speechbrain/speechbrain
Source File	`recipes/VoxCeleb/voxceleb_prepare.py:L36-163`
Import	`from voxceleb_prepare import prepare_voxceleb`
Related Principle	Principle:Speechbrain_Speechbrain_VoxCeleb_Data_Preparation

API Signature

def prepare_voxceleb(
    data_folder,
    save_folder,
    verification_pairs_file,
    splits=["train", "dev", "test"],
    split_ratio=[90, 10],
    seg_dur=3.0,
    amp_th=5e-04,
    source=None,
    split_speaker=False,
    random_segment=False,
    skip_prep=False,
):

Description

Prepares the VoxCeleb1 or VoxCeleb2 dataset for speaker recognition training and evaluation. The function scans the raw audio directory structure, excludes speakers that appear in the verification pairs file, splits the remaining utterances into train/dev partitions, segments utterances into fixed-duration chunks while discarding low-amplitude chunks, and writes the results as CSV files. When "test" is among the requested splits, it also generates enrollment and test CSV files from the verification pairs file for evaluation.

Parameters

Parameter	Type	Default	Description
data_folder	str	required	Path to the folder where the original VoxCeleb dataset is stored. To combine VoxCeleb1 and VoxCeleb2, pass both paths as a single comma-separated string.
save_folder	str	required	The directory where output CSV files will be stored.
verification_pairs_file	str	required	Path to the txt file containing the verification split (format: `label enrol_path test_path`).
splits	list	`["train", "dev", "test"]`	List of splits to prepare. Valid values: "train", "dev", "test".
split_ratio	list	`[90, 10]`	Percentage split between train and dev sets (e.g., [90, 10] means 90% train, 10% dev). Only the first value is used to compute the cut point; dev receives the remainder.
seg_dur	float	`3.0`	Duration of each audio segment (chunk) in seconds.
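amp_th	float	`5e-04`	Amplitude threshold for filtering. Chunks whose mean absolute amplitude is below this value are discarded. Applies only to pre-computed chunks (random_segment=False).
source	str	`None`	Path to the folder containing the VoxCeleb source archives. If provided, the VoxCeleb1 test audio is extracted from this location into data_folder when it is not already present.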
split_speaker	bool	`False`	If True, performs speaker-wise splitting (no speaker overlap between train and dev). If False, splits at the utterance level.
random_segment	bool	`False`	If True, stores one entry per full utterance and relies on the data pipeline to select a random chunk at training time. If False, pre-computes fixed, non-overlapping chunks of seg_dur seconds.
skip_prep	bool	`False`	If True, skips preparation entirely (useful when CSV files already exist).
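When random_segment is True, each row covers a whole utterance and the chunk is chosen later, at loading time. The sketch below shows one way a training-time pipeline could pick a random seg_dur window inside a row's start/stop range. The helper name random_chunk_bounds, the 16 kHz sample rate, and the fallback for utterances shorter than one chunk are illustrative assumptions, not SpeechBrain's API.

```python
import random

SAMPLE_RATE = 16000  # assumption: VoxCeleb audio sampled at 16 kHz


def random_chunk_bounds(start, stop, seg_dur, sample_rate=SAMPLE_RATE, rng=random):
    """Hypothetical helper: pick a random window of seg_dur seconds in [start, stop).

    Returns (chunk_start, chunk_stop) as sample indices.
    """
    seg_len = int(seg_dur * sample_rate)
    if stop - start <= seg_len:
        # Utterance no longer than one chunk: use the whole range
        return start, stop
    # randint is inclusive, so chunk_start + seg_len never exceeds stop
    chunk_start = rng.randint(start, stop - seg_len)
    return chunk_start, chunk_start + seg_len


# Example: a 10-second utterance stored as start=0, stop=160000
rng = random.Random(0)
print(random_chunk_bounds(0, 160000, seg_dur=3.0, rng=rng))
```

Because a new window is drawn each time the row is loaded, different epochs can see different parts of the same utterance.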

Inputs

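VoxCeleb wav directory: The dataset must follow the standard structure `data_folder/wav/speaker_id/session_id/utterance.wav`.
Verification pairs file: A text file listing one verification trial per line in the format `label enrol_path test_path`, where label is 1 for a same-speaker trial and 0 otherwise, and the paths are relative to `data_folder/wav`.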
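The verification pairs file drives both the exclusion of verification speakers and the creation of enrol.csv and test.csv. A minimal parsing sketch, assuming whitespace-separated lines of the form `label enrol_path test_path`; the function name read_verification_pairs is hypothetical and the example lines are illustrative, not taken from a real list.

```python
def read_verification_pairs(lines):
    """Hypothetical helper: return (enrol_ids, test_ids, speakers).

    IDs are the relative paths without the .wav extension.
    """
    enrol_ids, test_ids, speakers = set(), set(), set()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        _label, enrol_path, test_path = line.split()
        enrol_ids.add(enrol_path.rsplit(".", 1)[0])
        test_ids.add(test_path.rsplit(".", 1)[0])
        # The first path component is the speaker ID
        speakers.add(enrol_path.split("/")[0])
        speakers.add(test_path.split("/")[0])
    return sorted(enrol_ids), sorted(test_ids), speakers


# Illustrative lines: one same-speaker trial (1) and one different-speaker trial (0)
pairs = [
    "1 id00001/sessA/00001.wav id00001/sessB/00002.wav",
    "0 id00001/sessA/00001.wav id00002/sessC/00003.wav",
]
enrol, test, spks = read_verification_pairs(pairs)
print(enrol, test, sorted(spks))
```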

Outputs

The function produces the following CSV files in the save_folder:

File	Description
train.csv	Training set segments
dev.csv	Development (validation) set segments
test.csv	Test set utterances (from verification pairs)
enrol.csv	Enrollment utterances (from verification pairs)

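Each CSV has the columns ID, duration, wav, start, stop, spk_id. start and stop are sample indices into the wav file, and duration is the length of the whole source utterance in seconds (not the length of the chunk).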
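For illustration, the sketch below parses a CSV with these columns using Python's csv module and derives each chunk's length from start and stop. The sample row, its ID format, and the 16 kHz sample rate are assumptions made for the example, not output copied from a real run; the helper name chunk_lengths is hypothetical.

```python
import csv
import io

SAMPLE_RATE = 16000  # assumed sample rate of the VoxCeleb audio

# Hypothetical train.csv content: one fixed 3-second chunk of an 8.2 s utterance
csv_text = """ID,duration,wav,start,stop,spk_id
id00001--sessA--00001_3.0_6.0,8.2,/data/VoxCeleb1/wav/id00001/sessA/00001.wav,48000,96000,id00001
"""


def chunk_lengths(csv_file, sample_rate=SAMPLE_RATE):
    """Return (ID, spk_id, chunk_seconds, utterance_seconds) for each row."""
    rows = []
    for row in csv.DictReader(csv_file):
        start, stop = int(row["start"]), int(row["stop"])
        chunk_sec = (stop - start) / sample_rate  # length of this chunk
        utt_sec = float(row["duration"])  # length of the whole utterance
        rows.append((row["ID"], row["spk_id"], chunk_sec, utt_sec))
    return rows


rows = chunk_lengths(io.StringIO(csv_text))
print(rows)
```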

Usage Example

from voxceleb_prepare import prepare_voxceleb

# Basic preparation for training with default 3-second segments
prepare_voxceleb(
    data_folder="/data/VoxCeleb1",
    save_folder="/output/voxceleb_csvs",
    verification_pairs_file="/data/VoxCeleb1/veri_test2.txt",
    splits=["train", "dev"],
    split_ratio=[90, 10],
    seg_dur=3.0,
    amp_th=5e-04,
)

# Preparation with speaker-level splitting and random segments
prepare_voxceleb(
    data_folder="/data/VoxCeleb1,/data/VoxCeleb2",
    save_folder="/output/voxceleb_csvs",
    verification_pairs_file="/data/VoxCeleb1/veri_test2.txt",
    splits=["train", "dev", "test"],
    split_ratio=[90, 10],
    seg_dur=3.0,
    split_speaker=True,
    random_segment=True,
)

Usage in Training Recipe

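In the training recipe (train_speaker_embeddings.py), this function is called through run_on_main for DDP compatibility: only the main process runs the preparation and writes the CSV files, while the other processes wait for it to finish.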

from voxceleb_prepare import prepare_voxceleb
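from speechbrain.utils.distributed import run_on_main

# hparams: dict of hyperparameters loaded from the recipe's YAML file
# veri_file_path: local path to the verification pairs list, used here
# to exclude verification speakers from the train/dev splits
run_on_main(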
    prepare_voxceleb,
    kwargs={
        "data_folder": hparams["data_folder"],
        "save_folder": hparams["save_folder"],
        "verification_pairs_file": veri_file_path,
        "splits": ["train", "dev"],
        "split_ratio": hparams["split_ratio"],
        "seg_dur": hparams["sentence_len"],
        "skip_prep": hparams["skip_prep"],
    },
)

Internal Processing Details

Segment Extraction

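When random_segment is False, the function processes each wav file as follows:

Loads the audio signal to determine its duration
Computes the number of non-overlapping chunks: num_chunks = int(duration / seg_dur); any trailing remainder shorter than seg_dur is dropped
Creates a CSV entry for each chunk with computed start/stop sample indices
Discards chunks whose mean absolute amplitude falls below amp_th

When random_segment is True, each utterance instead yields a single entry spanning the whole file, and no amplitude filtering is applied.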
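A minimal, dependency-free sketch of the fixed-chunk logic described above. It assumes 16 kHz audio and an illustrative chunk ID of the form `<audio_id>_<start_sec>_<stop_sec>`; the helper names fixed_chunks and keep_loud_chunks are hypothetical, and a real implementation would operate on a loaded audio tensor rather than a Python list.

```python
SAMPLE_RATE = 16000  # assumption: VoxCeleb audio sampled at 16 kHz


def fixed_chunks(audio_id, num_samples, seg_dur, sample_rate=SAMPLE_RATE):
    """Return (chunk_id, start_sample, stop_sample) for each full chunk."""
    duration = num_samples / sample_rate
    num_chunks = int(duration / seg_dur)  # trailing remainder is dropped
    chunks = []
    for i in range(num_chunks):
        start_s = i * seg_dur
        stop_s = start_s + seg_dur
        chunk_id = f"{audio_id}_{start_s}_{stop_s}"
        chunks.append((chunk_id, int(start_s * sample_rate), int(stop_s * sample_rate)))
    return chunks


def keep_loud_chunks(signal, chunks, amp_th):
    """Drop chunks whose mean absolute amplitude is below amp_th."""
    kept = []
    for chunk_id, start, stop in chunks:
        segment = signal[start:stop]
        mean_abs = sum(abs(x) for x in segment) / len(segment)
        if mean_abs >= amp_th:
            kept.append((chunk_id, start, stop))
    return kept


# Example: 7.5 s of audio whose first 3 s are silent
signal = [0.0] * 48000 + [0.01] * 72000
chunks = fixed_chunks("id00001--sessA--00001", len(signal), seg_dur=3.0)
kept = keep_loud_chunks(signal, chunks, amp_th=5e-04)
print([c[0] for c in kept])
```

Here the 1.5 s remainder is dropped by the integer division, and the silent first chunk is removed by the amplitude filter.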

Train/Dev Splitting

Speakers present in the verification pairs file are excluded from both train and dev. The remaining utterances are split according to split_ratio:

Utterance-level (default): Utterances are randomly shuffled and split by count.
Speaker-level (split_speaker=True): Speaker lists are shuffled and split, then all utterances from each speaker go to the assigned partition.
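The two splitting modes can be sketched as follows, under stated assumptions: speaker IDs are read from the `.../wav/<spk_id>/...` path component, a seeded random.Random makes the shuffle reproducible, and the cut point is computed with integer arithmetic as `n * split_ratio[0] // 100`, so dev receives whatever remains. The function names speaker_of and split_train_dev are hypothetical.

```python
import random


def speaker_of(path):
    # Assumes the layout .../wav/<spk_id>/<session_id>/<utt>.wav
    return path.split("/wav/")[1].split("/")[0]


def split_train_dev(wav_paths, test_speakers, split_ratio, split_speaker=False, seed=0):
    """Hypothetical sketch: return (train, dev) lists of wav paths."""
    rng = random.Random(seed)  # seeded for reproducibility (an assumption)
    # Exclude every utterance whose speaker appears in the verification pairs
    kept = [p for p in wav_paths if speaker_of(p) not in test_speakers]

    if split_speaker:
        # Speaker-level: shuffle speakers, then assign all their utterances
        by_spk = {}
        for p in kept:
            by_spk.setdefault(speaker_of(p), []).append(p)
        speakers = sorted(by_spk)
        rng.shuffle(speakers)
        cut = len(speakers) * split_ratio[0] // 100
        train = [p for s in speakers[:cut] for p in by_spk[s]]
        dev = [p for s in speakers[cut:] for p in by_spk[s]]
    else:
        # Utterance-level: shuffle utterances and cut by count
        rng.shuffle(kept)
        cut = len(kept) * split_ratio[0] // 100
        train, dev = kept[:cut], kept[cut:]
    return train, dev


# Five speakers with two utterances each; id5 is a verification speaker
paths = [f"/data/wav/id{k}/sess/{u:05d}.wav" for k in range(1, 6) for u in range(2)]
train, dev = split_train_dev(paths, {"id5"}, [75, 25], split_speaker=True)
print(len(train), len(dev))
```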

Skip Detection

The function saves a pickle file (opt_voxceleb_prepare.pkl) in save_folder containing the preparation configuration. On subsequent calls, if the configuration matches and all output CSV files exist, preparation is skipped automatically.
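A minimal sketch of this skip check, assuming the configuration is a plain picklable dict of preparation arguments. The SPLIT_FILES mapping (including the assumption that "test" produces both test.csv and enrol.csv) and the function names should_skip and save_conf are illustrative.

```python
import os
import pickle

OPT_FILE = "opt_voxceleb_prepare.pkl"
# Hypothetical mapping from each split name to the CSV files it produces
SPLIT_FILES = {
    "train": ["train.csv"],
    "dev": ["dev.csv"],
    "test": ["test.csv", "enrol.csv"],
}


def should_skip(save_folder, splits, conf):
    """True only if every expected CSV exists and the saved config matches."""
    for split in splits:
        for name in SPLIT_FILES[split]:
            if not os.path.isfile(os.path.join(save_folder, name)):
                return False
    opt_path = os.path.join(save_folder, OPT_FILE)
    if not os.path.isfile(opt_path):
        return False
    with open(opt_path, "rb") as f:
        return pickle.load(f) == conf


def save_conf(save_folder, conf):
    """Record the configuration after the CSV files have been written."""
    with open(os.path.join(save_folder, OPT_FILE), "wb") as f:
        pickle.dump(conf, f)
```

Writing the pickle last means an interrupted run leaves no matching configuration behind, so the next call redoes the preparation.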
