Implementation:Speechbrain Speechbrain Prepare Libritts
| Property | Value |
|---|---|
| Type | API Doc |
| Repository | speechbrain/speechbrain |
| Source File | recipes/LibriTTS/libritts_prepare.py:L27-150 |
| Import | from libritts_prepare import prepare_libritts |
| Related Principle | Principle:Speechbrain_Speechbrain_LibriTTS_Data_Preparation |
API Signature
def prepare_libritts(
data_folder,
save_json_train,
save_json_valid,
save_json_test,
sample_rate,
split_ratio=[80, 10, 10],
libritts_subsets=None,
train_split=None,
valid_split=None,
test_split=None,
seed=1234,
model_name=None,
skip_prep=False,
)
Description
Prepares JSON manifest files for the LibriTTS dataset, suitable for TTS model training. Scans the LibriTTS directory structure for .wav files and their corresponding .normalized.txt transcription files, then creates train/valid/test JSON manifests with metadata including file paths, durations, speaker IDs, and text labels.
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| data_folder | str | required | Path to the root folder where LibriTTS data is stored (e.g., /data/LibriTTS) |
| save_json_train | str | required | Output path for the training manifest JSON file |
| save_json_valid | str | required | Output path for the validation manifest JSON file |
| save_json_test | str | required | Output path for the test manifest JSON file |
| sample_rate | int | required | Target sample rate in Hz. Audio files are resampled in-place if they do not match |
| split_ratio | list | [80, 10, 10] | Train/valid/test ratio used in the libritts_subsets random-split mode |
| libritts_subsets | list or None | None | LibriTTS subset names to combine and randomly split (e.g., ["train-clean-100"]) |
| train_split | list or None | None | Explicit subset names for the training split (e.g., ["train-clean-100", "train-clean-360"]) |
| valid_split | list or None | None | Explicit subset names for the validation split (e.g., ["dev-clean"]) |
| test_split | list or None | None | Explicit subset names for the test split (e.g., ["test-clean"]) |
| seed | int | 1234 | Random seed for reproducible splitting |
| model_name | str or None | None | Model name that controls preprocessing: "Tacotron2" computes phonemes, "HiFi-GAN" skips phoneme computation |
| skip_prep | bool | False | If True, skip the entire preparation step |
Returns
None. The function writes JSON files to the paths specified by save_json_train, save_json_valid, and save_json_test.
Output Format
Each JSON manifest is a dictionary mapping utterance IDs to metadata:
{
"116_288045_000003_000002": {
"uttid": "116_288045_000003_000002",
"wav": "/data/LibriTTS/train-clean-100/116/288045/116_288045_000003_000002.wav",
"duration": 3.45,
"spk_id": "116",
"label": "The normalized transcription text.",
"segment": true
}
}
Fields:
- uttid: Unique utterance identifier derived from the filename
- wav: Absolute path to the audio file
- duration: Audio duration in seconds
- spk_id: Speaker identifier extracted from the utterance ID (first component before the underscore)
- label: Normalized text transcription from the .normalized.txt file
- segment: Boolean; true for training data (enables random segment cropping in vocoder training)
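As a quick sanity check, a manifest in this format can be loaded with the standard json module. The snippet below uses the illustrative entry from above (not real data) and tallies total duration per speaker:

```python
import json

# Illustrative manifest entry matching the format above (not real data).
manifest_text = """
{
  "116_288045_000003_000002": {
    "uttid": "116_288045_000003_000002",
    "wav": "/data/LibriTTS/train-clean-100/116/288045/116_288045_000003_000002.wav",
    "duration": 3.45,
    "spk_id": "116",
    "label": "The normalized transcription text.",
    "segment": true
  }
}
"""
manifest = json.loads(manifest_text)

# Tally total audio duration per speaker ID.
per_speaker = {}
for meta in manifest.values():
    per_speaker[meta["spk_id"]] = per_speaker.get(meta["spk_id"], 0.0) + meta["duration"]
```

In a real pipeline the same loop would run over `json.load(open(save_json_train))` instead of an inline string.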
Usage Examples
Explicit Split Mode (Tacotron2 Training)
from libritts_prepare import prepare_libritts
prepare_libritts(
data_folder="/data/LibriTTS",
save_json_train="results/save/train.json",
save_json_valid="results/save/valid.json",
save_json_test="results/save/test.json",
sample_rate=16000,
train_split=["train-clean-100"],
valid_split=["dev-clean"],
test_split=["test-clean"],
seed=1234,
model_name="Tacotron2",
)
Random Split Mode (HiFi-GAN Training)
from libritts_prepare import prepare_libritts
prepare_libritts(
data_folder="/data/LibriTTS",
save_json_train="results/save/train.json",
save_json_valid="results/save/valid.json",
save_json_test="results/save/test.json",
sample_rate=16000,
split_ratio=[90, 10, 0],
libritts_subsets=["train-clean-100", "train-clean-360"],
model_name="HiFi-GAN",
)
Integration in Training Recipe
import speechbrain as sb
from libritts_prepare import prepare_libritts
# Called via distributed utility to run only on main process
sb.utils.distributed.run_on_main(
prepare_libritts,
kwargs={
"data_folder": hparams["data_folder"],
"save_json_train": hparams["train_json"],
"save_json_valid": hparams["valid_json"],
"save_json_test": hparams["test_json"],
"sample_rate": hparams["sample_rate"],
"train_split": hparams["train_split"],
"valid_split": hparams["valid_split"],
"test_split": hparams["test_split"],
"seed": hparams["seed"],
"model_name": hparams["model"].__class__.__name__,
},
)
Internal Functions
prepare_split
def prepare_split(data_folder, split_list):
Collects all .wav files from the specified LibriTTS subsets. Iterates through each subset directory and uses get_all_files() to recursively find audio files.
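A minimal sketch of this scanning step, assuming the standard LibriTTS subset/speaker/chapter layout (the actual recipe uses speechbrain.utils.data_utils.get_all_files rather than os.walk):

```python
import os

def prepare_split(data_folder, split_list):
    """Collect all .wav paths under the given LibriTTS subset folders."""
    wav_list = []
    for subset in split_list:
        subset_folder = os.path.join(data_folder, subset)
        # Walk the speaker/chapter hierarchy and keep only audio files.
        for root, _, files in os.walk(subset_folder):
            for name in sorted(files):
                if name.endswith(".wav"):
                    wav_list.append(os.path.join(root, name))
    return wav_list
```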
create_json
def create_json(wav_list, json_file, sample_rate, model_name=None):
Processes each WAV file to build the JSON manifest. For each file it:
- Loads the audio and computes its duration
- Filters out utterances shorter than 1.0 second
- Reads the corresponding .normalized.txt file
- Resamples the audio in-place if its sample rate does not match the target
- Extracts the speaker ID from the utterance ID
- Optionally computes phoneme labels (for models other than Tacotron2 and HiFi-GAN)
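The ID-extraction step can be sketched as below; parse_uttid is a hypothetical helper name, but the underscore convention is LibriTTS's own (speaker_chapter_paragraph_sentence):

```python
import os

def parse_uttid(wav_path):
    """Derive the utterance ID and speaker ID from a LibriTTS .wav path."""
    # The utterance ID is the filename without its extension.
    uttid = os.path.splitext(os.path.basename(wav_path))[0]
    # The speaker ID is the first underscore-separated component.
    spk_id = uttid.split("_")[0]
    return uttid, spk_id
```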
split_sets
def split_sets(wav_list, split_ratio):
Randomly shuffles the file list and partitions it according to the split ratio. Returns a dictionary with "train", "valid", and "test" keys.
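A sketch of this partitioning logic, assuming percentages that sum to 100 (the recipe's actual implementation may handle leftover items differently):

```python
import random

def split_sets(wav_list, split_ratio, seed=1234):
    """Shuffle wav_list and partition it into train/valid/test by percentage."""
    shuffled = list(wav_list)
    random.Random(seed).shuffle(shuffled)  # fixed seed => reproducible splits
    splits, start = {}, 0
    for name, ratio in zip(["train", "valid", "test"], split_ratio):
        count = int(len(shuffled) * ratio / 100)
        splits[name] = shuffled[start:start + count]
        start += count
    return splits
```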
Idempotency
The function checks for existing output files at the start via the skip() helper. If all three JSON files already exist, preparation is skipped entirely. This prevents unnecessary reprocessing when resuming training.
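The check can be restated as below; this is a hypothetical sketch of the skip() helper's behavior, not the recipe's exact code:

```python
import os

def skip(*json_files):
    """Return True when every expected output manifest already exists on disk."""
    return all(os.path.isfile(path) for path in json_files)
```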
See Also
- Principle:Speechbrain_Speechbrain_LibriTTS_Data_Preparation - Theoretical foundations of TTS data preparation
- Implementation:Speechbrain_Speechbrain_EncoderClassifier_Encode_Batch - Speaker embeddings computed after data preparation