Implementation:Speechbrain Speechbrain Prepare TIMIT
| Knowledge Sources | |
|---|---|
| Domains | ASR, Data_Preparation |
| Last Updated | 2026-02-09 00:00 GMT |
Overview
Concrete tool, provided by the SpeechBrain library, for preparing the TIMIT dataset (LDC93S1) for phoneme recognition and ASR tasks.
Description
This script prepares JSON manifest files for the TIMIT dataset, a standard benchmark for phoneme recognition. It walks the TIMIT directory structure to extract audio paths and phoneme-level transcriptions, and supports three phoneme set sizes (60, 48, or 39 phonemes). It handles both the upper-case and lower-case versions of the TIMIT corpus and produces train, validation, and test JSON manifests for audio sampled at 16 kHz.
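To make the phoneme-level extraction concrete, the sketch below parses the contents of a TIMIT `.phn` file, where each line holds a start sample, an end sample, and a phoneme label, and converts the end positions to seconds at 16 kHz. The names `parse_phn`, `fold`, and `FOLD_SUBSET` are hypothetical and are not taken from `timit_prepare.py`. The mapping is only a small illustrative subset of the commonly used 39-phoneme folding, not the full table the script applies.

```python
SAMPLE_RATE = 16000  # TIMIT audio is sampled at 16 kHz


def parse_phn(text, sample_rate=SAMPLE_RATE):
    """Parse .phn file contents into (phonemes, end times in seconds)."""
    phonemes, ends = [], []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        _start, end, phn = parts
        phonemes.append(phn)
        ends.append(int(end) / sample_rate)
    return phonemes, ends


# Illustrative subset only (assumption): a value of None means "drop the label".
FOLD_SUBSET = {"ao": "aa", "ix": "ih", "zh": "sh", "q": None}


def fold(phonemes, mapping=FOLD_SUBSET):
    """Map each phoneme through the folding table, dropping labels mapped to None."""
    folded = []
    for phn in phonemes:
        target = mapping.get(phn, phn)
        if target is not None:
            folded.append(target)
    return folded
```

Calling `parse_phn` on the text of one `.phn` file and then `fold` on the resulting labels gives a reduced-set transcription for that utterance.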
Usage
Use this script when preparing data for phoneme recognition or ASR experiments on the TIMIT corpus. It must run before any TIMIT-based training recipe in SpeechBrain, because those recipes read the JSON manifests it produces.
Code Reference
Source Location
- Repository: SpeechBrain
- File: recipes/TIMIT/timit_prepare.py
Signature
def prepare_timit(
    data_folder,
    save_json_train,
    save_json_valid,
    save_json_test,
    phn_set=39,
    uppercase=False,
    skip_prep=False,
):
Import
from timit_prepare import prepare_timit
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| data_folder | str | Yes | Path to the folder where the original TIMIT dataset is stored |
| save_json_train | str | Yes | The path where the training JSON file will be stored |
| save_json_valid | str | Yes | The path where the validation JSON file will be stored |
| save_json_test | str | Yes | The path where the test JSON file will be stored |
| phn_set | int | No | The phoneme set to use: 60, 48, or 39 phonemes (default: 39) |
| uppercase | bool | No | Set to True when TIMIT dataset is in upper-case version (default: False) |
| skip_prep | bool | No | If True, skips data preparation (default: False) |
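The input contract above can be checked before calling `prepare_timit`. The following is a minimal sketch of such a check. `check_timit_args` is a hypothetical helper written for illustration and is not part of SpeechBrain, and the validation that `timit_prepare.py` itself performs may differ.

```python
import os

VALID_PHN_SETS = (60, 48, 39)  # sizes listed in the Inputs table


def check_timit_args(data_folder, phn_set=39, uppercase=False):
    """Return a list of problems found in the arguments (empty if none)."""
    problems = []
    if not os.path.isdir(data_folder):
        problems.append(f"data_folder does not exist: {data_folder}")
    if phn_set not in VALID_PHN_SETS:
        problems.append(f"phn_set must be one of {VALID_PHN_SETS}, got {phn_set}")
    if not isinstance(uppercase, bool):
        problems.append("uppercase must be a bool")
    return problems
```

Returning a list rather than raising on the first problem lets a caller report every misconfigured argument at once.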
Outputs
| Name | Type | Description |
|---|---|---|
| train.json | JSON file | Training set manifest with audio paths and phoneme transcriptions, written to save_json_train |
| valid.json | JSON file | Validation set manifest, written to save_json_valid |
| test.json | JSON file | Test set manifest (core test set), written to save_json_test |
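Once the manifests exist, a quick sanity check is to load one and count its entries. The sketch below assumes each manifest is a JSON object keyed by utterance ID and that each entry carries a `duration` field in seconds. That layout and the helper name `summarize_manifest` are assumptions for illustration, so confirm the field names against a generated file.

```python
import json


def summarize_manifest(path):
    """Return (number of utterances, total duration in seconds) for a manifest."""
    with open(path, "r", encoding="utf-8") as f:
        manifest = json.load(f)
    # Assumed layout: {utterance_id: {"wav": ..., "duration": ..., "phn": ...}}
    total = sum(float(entry.get("duration", 0.0)) for entry in manifest.values())
    return len(manifest), total
```

For example, `summarize_manifest("output/train.json")` returns the utterance count and the total amount of audio described by the training manifest.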
Usage Examples
from timit_prepare import prepare_timit

prepare_timit(
    data_folder="/path/to/TIMIT",
    save_json_train="output/train.json",
    save_json_valid="output/valid.json",
    save_json_test="output/test.json",
    phn_set=39,
)