Implementation:Speechbrain Speechbrain Prepare Voxceleb

From Leeroopedia


| Property | Value |
|---|---|
| Implementation Name | Prepare Voxceleb |
| Type | API Doc |
| Repository | speechbrain/speechbrain |
| Source File | recipes/VoxCeleb/voxceleb_prepare.py:L36-163 |
| Import | from voxceleb_prepare import prepare_voxceleb |
| Related Principle | Principle:Speechbrain_Speechbrain_VoxCeleb_Data_Preparation |

API Signature

def prepare_voxceleb(
    data_folder,
    save_folder,
    verification_pairs_file,
    splits=["train", "dev", "test"],
    split_ratio=[90, 10],
    seg_dur=3.0,
    amp_th=5e-04,
    source=None,
    split_speaker=False,
    random_segment=False,
    skip_prep=False,
):

Description

Prepares the VoxCeleb1 or VoxCeleb2 dataset for speaker recognition training and evaluation. The function scans the raw audio directory structure, splits the utterances into train and dev partitions, cuts each utterance into fixed-duration chunks, discards low-amplitude chunks, and writes the results as CSV files. When "test" is among the requested splits, it also generates enrollment and test CSV files from the verification pairs file for evaluation.

Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| data_folder | str | required | Path to the folder where the original VoxCeleb dataset is stored. Accepts comma-separated paths to combine VoxCeleb1 and VoxCeleb2. |
| save_folder | str | required | Directory where the output CSV files are written. |
| verification_pairs_file | str | required | Path to the txt file containing the verification split (format: label enrol_path test_path). |
| splits | list | ["train", "dev", "test"] | List of splits to prepare. Valid values: "train", "dev", "test". |
| split_ratio | list | [90, 10] | Percentage split between train and dev sets (e.g., [90, 10] means 90% train, 10% dev). |
| seg_dur | float | 3.0 | Duration of each audio segment (chunk) in seconds. |
| amp_th | float | 5e-04 | Amplitude threshold for filtering. Segments whose mean absolute amplitude is below this value are discarded. |
| source | str | None | Path to the folder containing the VoxCeleb source archives. If provided, test data is extracted from this location. |
| split_speaker | bool | False | If True, performs speaker-wise splitting (no speaker overlap between train and dev). If False, splits at the utterance level. |
| random_segment | bool | False | If True, stores full utterance boundaries and relies on the data pipeline to select random chunks at training time. If False, pre-computes fixed chunks. |
| skip_prep | bool | False | If True, skips preparation entirely (useful when the CSV files already exist). |
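
With random_segment=True, chunk selection is left to the data pipeline. The following is a minimal sketch of how a pipeline might draw one random chunk per utterance from the stored full-utterance boundaries. The function name pick_random_chunk and the 16 kHz sample rate are assumptions made for illustration and are not part of voxceleb_prepare.py.

```python
import random

SAMPLE_RATE = 16000  # assumed sample rate, for illustration only


def pick_random_chunk(num_samples, seg_len, rng=random):
    """Return (start, stop) sample indices of one random chunk of seg_len samples.

    Hypothetical helper: utterances shorter than seg_len are returned whole.
    """
    if num_samples <= seg_len:
        return 0, num_samples
    # randint is inclusive on both ends, so the chunk always fits.
    start = rng.randint(0, num_samples - seg_len)
    return start, start + seg_len


rng = random.Random(0)  # seeded only to make the example repeatable
seg_len = int(3.0 * SAMPLE_RATE)  # seg_dur = 3.0 seconds
start, stop = pick_random_chunk(5 * SAMPLE_RATE, seg_len, rng)
print(start, stop)
```

Drawing the chunk at load time means each epoch can see a different part of the same utterance, at the cost of giving up the amplitude filter that the pre-computed chunks go through.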

Inputs

  • VoxCeleb wav directory: The dataset must follow the standard structure: data_folder/wav/speaker_id/session_id/utterance.wav
  • Verification pairs file: A text file listing verification trial pairs, one per line
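
As a rough illustration of the verification pairs format (label enrol_path test_path), the sketch below parses trial lines and collects the speakers they mention. The helper names and the example paths are invented for this example, and it assumes that the first path component is the speaker ID.

```python
def parse_verification_pairs(lines):
    """Parse 'label enrol_path test_path' lines into (label, enrol, test) tuples."""
    trials = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # ignore blank lines
        label, enrol_path, test_path = fields
        trials.append((int(label), enrol_path, test_path))
    return trials


def speakers_in_trials(trials):
    """Collect speaker IDs, assuming paths look like speaker_id/session_id/utt.wav."""
    speakers = set()
    for _, enrol_path, test_path in trials:
        speakers.add(enrol_path.split("/")[0])
        speakers.add(test_path.split("/")[0])
    return speakers


# Made-up trial lines, not taken from a real VoxCeleb verification file.
example_lines = [
    "1 id00001/sessA/00001.wav id00001/sessB/00002.wav",
    "0 id00001/sessA/00001.wav id00002/sessC/00001.wav",
]
trials = parse_verification_pairs(example_lines)
test_speakers = speakers_in_trials(trials)
print(trials[0])
print(sorted(test_speakers))
```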

Outputs

The function writes the following CSV files to save_folder, depending on the requested splits:

| File | Description |
|---|---|
| train.csv | Training set segments |
| dev.csv | Development (validation) set segments |
| test.csv | Test set utterances (from verification pairs) |
| enrol.csv | Enrollment utterances (from verification pairs) |

Each CSV has the columns ID, duration, wav, start, stop, spk_id, where start and stop are sample indices into the wav file.
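
A short sketch of reading such a file with the standard csv module is given below. The two rows are invented to match the documented column layout, and the 16 kHz sample rate used to convert sample indices to seconds is an assumption.

```python
import csv
import io

SAMPLE_RATE = 16000  # assumed sample rate, for illustration only

# Invented rows in the documented column layout (not real dataset entries).
csv_text = (
    "ID,duration,wav,start,stop,spk_id\n"
    "utt1_chunk0,7.2,/data/wav/id00001/sessA/00001.wav,0,48000,id00001\n"
    "utt1_chunk1,7.2,/data/wav/id00001/sessA/00001.wav,48000,96000,id00001\n"
)


def read_segments(handle):
    """Read CSV rows into dicts, converting the numeric columns."""
    rows = []
    for row in csv.DictReader(handle):
        row["duration"] = float(row["duration"])
        row["start"] = int(row["start"])
        row["stop"] = int(row["stop"])
        rows.append(row)
    return rows


segments = read_segments(io.StringIO(csv_text))
# Length of each segment in seconds, derived from the sample indices.
chunk_seconds = [(s["stop"] - s["start"]) / SAMPLE_RATE for s in segments]
print(chunk_seconds)
```

To read a real file, pass an open file handle for train.csv or dev.csv instead of the in-memory string.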

Usage Example

from voxceleb_prepare import prepare_voxceleb

# Basic preparation for training with default 3-second segments
prepare_voxceleb(
    data_folder="/data/VoxCeleb1",
    save_folder="/output/voxceleb_csvs",
    verification_pairs_file="/data/VoxCeleb1/veri_test2.txt",
    splits=["train", "dev"],
    split_ratio=[90, 10],
    seg_dur=3.0,
    amp_th=5e-04,
)

# VoxCeleb1 and VoxCeleb2 combined, with speaker-level splitting and random segments
prepare_voxceleb(
    data_folder="/data/VoxCeleb1,/data/VoxCeleb2",
    save_folder="/output/voxceleb_csvs",
    verification_pairs_file="/data/VoxCeleb1/veri_test2.txt",
    splits=["train", "dev", "test"],
    split_ratio=[90, 10],
    seg_dur=3.0,
    split_speaker=True,
    random_segment=True,
)

Usage in Training Recipe

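In the training recipe (train_speaker_embeddings.py), this function is called via run_on_main, so that under DDP (distributed data parallel) training only the main process prepares the data while the other processes wait for it to finish: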

from voxceleb_prepare import prepare_voxceleb
from speechbrain.utils.distributed import run_on_main

# hparams (the loaded hyperparameter dict) and veri_file_path are defined
# earlier in the recipe; they are not created by this snippet.
run_on_main(
    prepare_voxceleb,
    kwargs={
        "data_folder": hparams["data_folder"],
        "save_folder": hparams["save_folder"],
        "verification_pairs_file": veri_file_path,
        "splits": ["train", "dev"],
        "split_ratio": hparams["split_ratio"],
        "seg_dur": hparams["sentence_len"],
        "skip_prep": hparams["skip_prep"],
    },
)

Internal Processing Details

Segment Extraction

For each wav file, when random_segment=False, the function:

  1. Loads the audio signal to determine its duration
  2. Computes the number of non-overlapping chunks: num_chunks = int(duration / seg_dur), so any trailing audio shorter than seg_dur is dropped
  3. Computes the start/stop sample indices of each chunk
  4. Skips chunks whose mean absolute amplitude falls below amp_th and creates a CSV entry for each remaining chunk
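
The chunking and amplitude filter can be sketched in a few lines. This is an illustrative version on plain Python lists rather than audio tensors, and the helper names are hypothetical and are not the functions used in voxceleb_prepare.py.

```python
def chunk_boundaries(num_samples, sample_rate, seg_dur):
    """Return (start, stop) sample indices of non-overlapping seg_dur chunks."""
    duration = num_samples / sample_rate
    num_chunks = int(duration / seg_dur)  # the trailing remainder is dropped
    seg_len = int(seg_dur * sample_rate)
    return [(i * seg_len, (i + 1) * seg_len) for i in range(num_chunks)]


def mean_abs(samples):
    """Mean absolute amplitude of a chunk."""
    return sum(abs(x) for x in samples) / len(samples)


def keep_loud_chunks(signal, sample_rate, seg_dur, amp_th):
    """Keep only chunks whose mean absolute amplitude reaches amp_th."""
    kept = []
    for start, stop in chunk_boundaries(len(signal), sample_rate, seg_dur):
        if mean_abs(signal[start:stop]) >= amp_th:
            kept.append((start, stop))
    return kept


# Toy signal: 10 samples at 4 samples per second, i.e. 2.5 seconds.
toy_signal = [0.5, -0.5, 0.5, -0.5, 0.0, 0.0, 0.0001, 0.0, 0.3, 0.3]
kept = keep_loud_chunks(toy_signal, sample_rate=4, seg_dur=1.0, amp_th=5e-04)
print(kept)
```

In the toy signal, the first one-second chunk is kept, the near-silent second chunk is filtered out, and the last half second is never considered because it is shorter than seg_dur.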

Train/Dev Splitting

Speakers present in the verification pairs file are excluded from both train and dev. The remaining utterances are split according to split_ratio:

  • Utterance-level (default): Utterances are randomly shuffled and split by count.
  • Speaker-level (split_speaker=True): Speaker lists are shuffled and split, then all utterances from each speaker go to the assigned partition.
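
The two splitting modes can be illustrated with the sketch below. It is a simplified stand-in under stated assumptions: the function name, the utterance-to-speaker mapping, the seed argument, and the choice to normalise the ratio by its sum are all invented for this example.

```python
import random


def split_train_dev(utt_to_spk, test_speakers, split_ratio=(90, 10),
                    split_speaker=False, seed=0):
    """Split utterances into train/dev, excluding verification speakers."""
    # Drop every utterance whose speaker appears in the verification trials.
    utts = sorted(u for u, s in utt_to_spk.items() if s not in test_speakers)
    rng = random.Random(seed)
    train_frac = split_ratio[0] / sum(split_ratio)

    if split_speaker:
        # Shuffle speakers, then send all utterances of a speaker to one side.
        speakers = sorted({utt_to_spk[u] for u in utts})
        rng.shuffle(speakers)
        train_spk = set(speakers[: int(train_frac * len(speakers))])
        train = [u for u in utts if utt_to_spk[u] in train_spk]
        dev = [u for u in utts if utt_to_spk[u] not in train_spk]
    else:
        # Shuffle utterances and cut by count.
        rng.shuffle(utts)
        n_train = int(train_frac * len(utts))
        train, dev = utts[:n_train], utts[n_train:]
    return train, dev


# Toy data: 10 speakers with 2 utterances each; spk9 is a verification speaker.
utt_to_spk = {f"spk{s}/utt{u}.wav": f"spk{s}" for s in range(10) for u in range(2)}
train_u, dev_u = split_train_dev(utt_to_spk, {"spk9"})
train_s, dev_s = split_train_dev(utt_to_spk, {"spk9"}, split_speaker=True)
print(len(train_u), len(dev_u), len(train_s), len(dev_s))
```

With utterance-level splitting, the same speaker can appear in both partitions, so dev accuracy measures generalisation to new recordings of known speakers. Speaker-level splitting keeps the speaker sets disjoint.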

Skip Detection

The function saves a pickle file (opt_voxceleb_prepare.pkl) with the preparation configuration. On subsequent calls, if the configuration matches and all output CSV files exist, preparation is skipped automatically.
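
The idea behind this check can be sketched as follows. This is a simplified illustration and not the function's actual code: the helper names are hypothetical, and the mapping from split names to "<split>.csv" files is an assumption made to keep the example short.

```python
import os
import pickle
import tempfile

OPT_FILE = "opt_voxceleb_prepare.pkl"


def should_skip(save_folder, splits, conf):
    """Return True if all CSVs exist and the saved configuration equals conf."""
    csv_ok = all(
        os.path.isfile(os.path.join(save_folder, f"{split}.csv")) for split in splits
    )
    opt_path = os.path.join(save_folder, OPT_FILE)
    if not csv_ok or not os.path.isfile(opt_path):
        return False
    with open(opt_path, "rb") as f:
        return pickle.load(f) == conf


def mark_done(save_folder, conf):
    """Store the configuration so that a later call can detect a finished run."""
    with open(os.path.join(save_folder, OPT_FILE), "wb") as f:
        pickle.dump(conf, f)


with tempfile.TemporaryDirectory() as tmp:
    conf = {"seg_dur": 3.0, "split_ratio": [90, 10]}
    splits = ["train", "dev"]
    first_run = should_skip(tmp, splits, conf)  # nothing prepared yet
    for split in splits:
        open(os.path.join(tmp, f"{split}.csv"), "w").close()  # empty stand-in CSVs
    mark_done(tmp, conf)
    second_run = should_skip(tmp, splits, conf)  # same configuration
    changed = should_skip(tmp, splits, {**conf, "seg_dur": 2.0})  # new seg_dur
print(first_run, second_run, changed)
```

Comparing the stored configuration means that changing a setting such as seg_dur triggers a fresh preparation, whereas skip_prep=True bypasses the function without any check.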
