Implementation:Speechbrain Speechbrain Prepare Libritts

From Leeroopedia


Property Value
Type API Doc
Repository speechbrain/speechbrain
Source File recipes/LibriTTS/libritts_prepare.py:L27-150
Import from libritts_prepare import prepare_libritts
Related Principle Principle:Speechbrain_Speechbrain_LibriTTS_Data_Preparation

API Signature

def prepare_libritts(
    data_folder,
    save_json_train,
    save_json_valid,
    save_json_test,
    sample_rate,
    split_ratio=[80, 10, 10],
    libritts_subsets=None,
    train_split=None,
    valid_split=None,
    test_split=None,
    seed=1234,
    model_name=None,
    skip_prep=False,
)

Description

Prepares JSON manifest files for the LibriTTS dataset, suitable for TTS model training. The function scans the LibriTTS directory structure for .wav files and their corresponding .normalized.txt transcription files, then writes train/valid/test JSON manifests with metadata including file paths, durations, speaker IDs, and text labels.

Parameters

Parameter Type Default Description
data_folder str required Path to the root folder where LibriTTS data is stored (e.g., /data/LibriTTS)
save_json_train str required Output path for the training manifest JSON file
save_json_valid str required Output path for the validation manifest JSON file
save_json_test str required Output path for the test manifest JSON file
sample_rate int required Target sample rate in Hz. Audio files whose sample rate does not match are resampled in-place
split_ratio list [80, 10, 10] Train/valid/test ratio when using libritts_subsets random split mode
libritts_subsets list or None None LibriTTS subset names to combine and randomly split (e.g., ["train-clean-100"])
train_split list or None None Explicit subset names for the training split (e.g., ["train-clean-100", "train-clean-360"])
valid_split list or None None Explicit subset names for the validation split (e.g., ["dev-clean"])
test_split list or None None Explicit subset names for the test split (e.g., ["test-clean"])
seed int 1234 Random seed for reproducible splitting
model_name str or None None Model name that controls preprocessing. "Tacotron2" computes phonemes, "HiFi-GAN" skips phoneme computation
skip_prep bool False If True, skip the entire preparation step

Returns

None. The function writes JSON files to the paths specified by save_json_train, save_json_valid, and save_json_test.

Output Format

Each JSON manifest is a dictionary mapping utterance IDs to metadata:

{
  "116_288045_000003_000002": {
    "uttid": "116_288045_000003_000002",
    "wav": "/data/LibriTTS/train-clean-100/116/288045/116_288045_000003_000002.wav",
    "duration": 3.45,
    "spk_id": "116",
    "label": "The normalized transcription text.",
    "segment": true
  }
}

Fields:

  • uttid: Unique utterance identifier derived from the filename
  • wav: Absolute path to the audio file
  • duration: Audio duration in seconds
  • spk_id: Speaker identifier extracted from the utterance ID (first component before underscore)
  • label: Normalized text transcription from the .normalized.txt file
  • segment: Boolean; true for training data (enables random segment cropping in vocoder training)
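
As a sketch of how a consumer might read such a manifest, the snippet below round-trips the hypothetical entry from above through JSON and runs two typical sanity checks (total duration and speaker inventory); the entry is illustrative, not real data:

```python
import json

# Hypothetical manifest with one entry, mirroring the format shown above.
manifest = {
    "116_288045_000003_000002": {
        "uttid": "116_288045_000003_000002",
        "wav": "/data/LibriTTS/train-clean-100/116/288045/116_288045_000003_000002.wav",
        "duration": 3.45,
        "spk_id": "116",
        "label": "The normalized transcription text.",
        "segment": True,
    }
}

# Round-trip through JSON to confirm the structure serializes cleanly.
restored = json.loads(json.dumps(manifest))

# Typical sanity checks: total audio duration (hours) and speaker inventory.
total_hours = sum(e["duration"] for e in restored.values()) / 3600
speakers = {e["spk_id"] for e in restored.values()}
```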

Usage Examples

Explicit Split Mode (Tacotron2 Training)

from libritts_prepare import prepare_libritts

prepare_libritts(
    data_folder="/data/LibriTTS",
    save_json_train="results/save/train.json",
    save_json_valid="results/save/valid.json",
    save_json_test="results/save/test.json",
    sample_rate=16000,
    train_split=["train-clean-100"],
    valid_split=["dev-clean"],
    test_split=["test-clean"],
    seed=1234,
    model_name="Tacotron2",
)

Random Split Mode (HiFi-GAN Training)

from libritts_prepare import prepare_libritts

prepare_libritts(
    data_folder="/data/LibriTTS",
    save_json_train="results/save/train.json",
    save_json_valid="results/save/valid.json",
    save_json_test="results/save/test.json",
    sample_rate=16000,
    split_ratio=[90, 10, 0],
    libritts_subsets=["train-clean-100", "train-clean-360"],
    model_name="HiFi-GAN",
)

Integration in Training Recipe

import speechbrain as sb
from libritts_prepare import prepare_libritts

# Called via distributed utility to run only on main process
sb.utils.distributed.run_on_main(
    prepare_libritts,
    kwargs={
        "data_folder": hparams["data_folder"],
        "save_json_train": hparams["train_json"],
        "save_json_valid": hparams["valid_json"],
        "save_json_test": hparams["test_json"],
        "sample_rate": hparams["sample_rate"],
        "train_split": hparams["train_split"],
        "valid_split": hparams["valid_split"],
        "test_split": hparams["test_split"],
        "seed": hparams["seed"],
        "model_name": hparams["model"].__class__.__name__,
    },
)

Internal Functions

prepare_split

def prepare_split(data_folder, split_list):

Collects all .wav files from the specified LibriTTS subsets. Iterates through each subset directory and uses get_all_files() to recursively find audio files.
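
A minimal sketch of this logic, using the standard library's glob in place of SpeechBrain's get_all_files() (prepare_split_sketch is an illustrative name, not part of the recipe):

```python
import glob
import os

def prepare_split_sketch(data_folder, split_list):
    # Recursively collect .wav files from each requested LibriTTS subset
    # directory, e.g. split_list=["train-clean-100", "train-clean-360"].
    wav_list = []
    for subset in split_list:
        pattern = os.path.join(data_folder, subset, "**", "*.wav")
        wav_list.extend(glob.glob(pattern, recursive=True))
    return wav_list
```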

create_json

def create_json(wav_list, json_file, sample_rate, model_name=None):

Processes each WAV file to build the JSON manifest. For each file:

  1. Loads the audio and computes duration
  2. Filters out utterances shorter than 1.0 second
  3. Reads the corresponding .normalized.txt file
  4. Resamples audio in-place if sample rate does not match target
  5. Extracts speaker ID from the utterance ID
  6. Optionally computes phoneme labels, depending on model_name (computed for "Tacotron2", skipped for "HiFi-GAN")
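
The metadata-derivation part of these steps can be sketched as follows; audio loading, resampling, and phoneme computation are omitted, and entry_from_wav is an illustrative name rather than the recipe's API:

```python
import os

def entry_from_wav(wav_path, duration):
    # duration is assumed to come from loading the audio (omitted here).
    if duration < 1.0:
        return None  # utterances shorter than 1.0 second are filtered out
    uttid = os.path.splitext(os.path.basename(wav_path))[0]
    spk_id = uttid.split("_")[0]  # first underscore-separated component
    txt_path = wav_path.replace(".wav", ".normalized.txt")
    return {
        "uttid": uttid,
        "wav": wav_path,
        "duration": duration,
        "spk_id": spk_id,
        "label_path": txt_path,  # read to obtain the "label" field
        "segment": True,
    }
```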

split_sets

def split_sets(wav_list, split_ratio):

Randomly shuffles the file list and partitions it according to the split ratio. Returns a dictionary with "train", "valid", and "test" keys.
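
A minimal sketch of this behavior, assuming the ratio entries are percentages summing to 100 as in the default [80, 10, 10] (split_sets_sketch is an illustrative name):

```python
import random

def split_sets_sketch(wav_list, split_ratio, seed=1234):
    # Shuffle deterministically, then carve off train and valid by
    # percentage; everything remaining becomes the test split.
    files = list(wav_list)
    random.Random(seed).shuffle(files)
    splits, start = {}, 0
    for name, ratio in zip(["train", "valid"], split_ratio[:2]):
        end = start + int(len(files) * ratio / 100)
        splits[name] = files[start:end]
        start = end
    splits["test"] = files[start:]
    return splits
```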

Idempotency

The function checks for existing output files at the start via the skip() helper. If all three JSON files already exist, preparation is skipped entirely. This prevents unnecessary reprocessing when resuming training.
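
The check can be sketched as follows (skip_check is an illustrative stand-in for the skip() helper's condition):

```python
import os

def skip_check(*json_paths):
    # Preparation is skipped only if every expected manifest already exists.
    return all(os.path.isfile(p) for p in json_paths)
```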
