Implementation:Speechbrain Speechbrain Prepare Libritts
| Property | Value |
|---|---|
| Type | API Doc |
| Repository | speechbrain/speechbrain |
| Source File | recipes/LibriTTS/libritts_prepare.py:L27-150 |
| Import | from libritts_prepare import prepare_libritts |
| Related Principle | Principle:Speechbrain_Speechbrain_LibriTTS_Data_Preparation |
API Signature
def prepare_libritts(
data_folder,
save_json_train,
save_json_valid,
save_json_test,
sample_rate,
split_ratio=[80, 10, 10],
libritts_subsets=None,
train_split=None,
valid_split=None,
test_split=None,
seed=1234,
model_name=None,
skip_prep=False,
)
Description
Prepares JSON manifest files for the LibriTTS dataset, suitable for TTS model training. Scans the LibriTTS directory structure for .wav files and their corresponding .normalized.txt transcription files, then creates train/valid/test JSON manifests with metadata including file paths, durations, speaker IDs, and text labels.
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| data_folder | str | required | Path to the root folder where LibriTTS data is stored (e.g., /data/LibriTTS) |
| save_json_train | str | required | Output path for the training manifest JSON file |
| save_json_valid | str | required | Output path for the validation manifest JSON file |
| save_json_test | str | required | Output path for the test manifest JSON file |
| sample_rate | int | required | Target sample rate in Hz. Audio files are resampled in-place if they do not match |
| split_ratio | list | [80, 10, 10] | Train/valid/test ratio used in the libritts_subsets random-split mode |
| libritts_subsets | list or None | None | LibriTTS subset names to combine and randomly split (e.g., ["train-clean-100"]) |
| train_split | list or None | None | Explicit subset names for the training split (e.g., ["train-clean-100", "train-clean-360"]) |
| valid_split | list or None | None | Explicit subset names for the validation split (e.g., ["dev-clean"]) |
| test_split | list or None | None | Explicit subset names for the test split (e.g., ["test-clean"]) |
| seed | int | 1234 | Random seed for reproducible splitting |
| model_name | str or None | None | Model name that controls preprocessing: "Tacotron2" computes phonemes, "HiFi-GAN" skips phoneme computation |
| skip_prep | bool | False | If True, skip the entire preparation step |
Returns
None. The function writes JSON files to the paths specified by save_json_train, save_json_valid, and save_json_test.
Output Format
Each JSON manifest is a dictionary mapping utterance IDs to metadata:
{
"116_288045_000003_000002": {
"uttid": "116_288045_000003_000002",
"wav": "/data/LibriTTS/train-clean-100/116/288045/116_288045_000003_000002.wav",
"duration": 3.45,
"spk_id": "116",
"label": "The normalized transcription text.",
"segment": true
}
}
Fields:
- uttid: Unique utterance identifier derived from the filename
- wav: Absolute path to the audio file
- duration: Audio duration in seconds
- spk_id: Speaker identifier extracted from the utterance ID (first component before the underscore)
- label: Normalized text transcription from the .normalized.txt file
- segment: Boolean; true for training data (enables random segment cropping in vocoder training)
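As a quick sanity check, a manifest in this format can be loaded with the standard json module. The snippet below uses the illustrative entry from above (not real data) and tallies total duration per speaker:

```python
import json

# Illustrative manifest entry matching the format above (not real data).
manifest_text = """
{
  "116_288045_000003_000002": {
    "uttid": "116_288045_000003_000002",
    "wav": "/data/LibriTTS/train-clean-100/116/288045/116_288045_000003_000002.wav",
    "duration": 3.45,
    "spk_id": "116",
    "label": "The normalized transcription text.",
    "segment": true
  }
}
"""
manifest = json.loads(manifest_text)

# Tally total audio duration per speaker ID.
per_speaker = {}
for meta in manifest.values():
    per_speaker[meta["spk_id"]] = per_speaker.get(meta["spk_id"], 0.0) + meta["duration"]
```

In a real pipeline the same loop would run over `json.load(open(save_json_train))` instead of an inline string.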
Usage Examples
Explicit Split Mode (Tacotron2 Training)
from libritts_prepare import prepare_libritts
prepare_libritts(
data_folder="/data/LibriTTS",
save_json_train="results/save/train.json",
save_json_valid="results/save/valid.json",
save_json_test="results/save/test.json",
sample_rate=16000,
train_split=["train-clean-100"],
valid_split=["dev-clean"],
test_split=["test-clean"],
seed=1234,
model_name="Tacotron2",
)
Random Split Mode (HiFi-GAN Training)
from libritts_prepare import prepare_libritts
prepare_libritts(
data_folder="/data/LibriTTS",
save_json_train="results/save/train.json",
save_json_valid="results/save/valid.json",
save_json_test="results/save/test.json",
sample_rate=16000,
split_ratio=[90, 10, 0],
libritts_subsets=["train-clean-100", "train-clean-360"],
model_name="HiFi-GAN",
)
Integration in Training Recipe
import speechbrain as sb
from libritts_prepare import prepare_libritts
# Called via distributed utility to run only on main process
sb.utils.distributed.run_on_main(
prepare_libritts,
kwargs={
"data_folder": hparams["data_folder"],
"save_json_train": hparams["train_json"],
"save_json_valid": hparams["valid_json"],
"save_json_test": hparams["test_json"],
"sample_rate": hparams["sample_rate"],
"train_split": hparams["train_split"],
"valid_split": hparams["valid_split"],
"test_split": hparams["test_split"],
"seed": hparams["seed"],
"model_name": hparams["model"].__class__.__name__,
},
)
Internal Functions
prepare_split
def prepare_split(data_folder, split_list):
Collects all .wav files from the specified LibriTTS subsets. Iterates through each subset directory and uses get_all_files() to recursively find audio files.
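A minimal sketch of this scanning step, assuming the standard LibriTTS subset/speaker/chapter layout (the actual recipe uses speechbrain.utils.data_utils.get_all_files rather than os.walk):

```python
import os

def prepare_split(data_folder, split_list):
    """Collect all .wav paths under the given LibriTTS subset folders."""
    wav_list = []
    for subset in split_list:
        subset_folder = os.path.join(data_folder, subset)
        # Walk the speaker/chapter hierarchy and keep only audio files.
        for root, _, files in os.walk(subset_folder):
            for name in sorted(files):
                if name.endswith(".wav"):
                    wav_list.append(os.path.join(root, name))
    return wav_list
```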
create_json
def create_json(wav_list, json_file, sample_rate, model_name=None):
Processes each WAV file to build the JSON manifest. For each file it:
- Loads the audio and computes its duration
- Filters out utterances shorter than 1.0 second
- Reads the corresponding .normalized.txt file
- Resamples the audio in-place if its sample rate does not match the target
- Extracts the speaker ID from the utterance ID
- Optionally computes phoneme labels (for models other than Tacotron2 and HiFi-GAN)
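The ID-extraction step can be sketched as below; parse_uttid is a hypothetical helper name, but the underscore convention is LibriTTS's own (speaker_chapter_paragraph_sentence):

```python
import os

def parse_uttid(wav_path):
    """Derive the utterance ID and speaker ID from a LibriTTS .wav path."""
    # The utterance ID is the filename without its extension.
    uttid = os.path.splitext(os.path.basename(wav_path))[0]
    # The speaker ID is the first underscore-separated component.
    spk_id = uttid.split("_")[0]
    return uttid, spk_id
```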
split_sets
def split_sets(wav_list, split_ratio):
Randomly shuffles the file list and partitions it according to the split ratio. Returns a dictionary with "train", "valid", and "test" keys.
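A sketch of this partitioning logic, assuming percentages that sum to 100 (the recipe's actual implementation may handle leftover items differently):

```python
import random

def split_sets(wav_list, split_ratio, seed=1234):
    """Shuffle wav_list and partition it into train/valid/test by percentage."""
    shuffled = list(wav_list)
    random.Random(seed).shuffle(shuffled)  # fixed seed => reproducible splits
    splits, start = {}, 0
    for name, ratio in zip(["train", "valid", "test"], split_ratio):
        count = int(len(shuffled) * ratio / 100)
        splits[name] = shuffled[start:start + count]
        start += count
    return splits
```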
Idempotency
The function checks for existing output files at the start via the skip() helper. If all three JSON files already exist, preparation is skipped entirely. This prevents unnecessary reprocessing when resuming training.
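The check can be restated as below; this is a hypothetical sketch of the skip() helper's behavior, not the recipe's exact code:

```python
import os

def skip(*json_files):
    """Return True when every expected output manifest already exists on disk."""
    return all(os.path.isfile(path) for path in json_files)
```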
See Also
- Principle:Speechbrain_Speechbrain_LibriTTS_Data_Preparation - Theoretical foundations of TTS data preparation
- Implementation:Speechbrain_Speechbrain_EncoderClassifier_Encode_Batch - Speaker embeddings computed after data preparation