

Implementation:Speechbrain Speechbrain Prepare TIMIT

From Leeroopedia


Knowledge Sources
Domains: ASR, Data_Preparation
Last Updated: 2026-02-09 00:00 GMT

Overview

Concrete tool from the SpeechBrain library for preparing the TIMIT dataset (LDC93S1) for phoneme recognition and ASR tasks.

Description

This script builds JSON manifest files for TIMIT, a standard benchmark for phoneme recognition. It walks the TIMIT directory structure to collect audio paths and phoneme-level transcriptions, and it supports three phoneme set sizes (60, 48, or 39 phonemes). It handles both the upper-case and lower-case distributions of the corpus (selected with the uppercase flag) and writes train, validation, and test manifests for the corpus's 16 kHz audio.
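To make the transcription-extraction step concrete, here is a rough sketch. It assumes the standard TIMIT layout, in which each utterance's audio file sits next to a .PHN file listing one `start end label` segment per line, with offsets in samples at 16 kHz. The helper names read_phn, phoneme_sequence, and duration_seconds are hypothetical and are not SpeechBrain's internals.

```python
from pathlib import Path
from typing import List, Tuple

SAMPLE_RATE = 16_000  # TIMIT audio is sampled at 16 kHz

Segment = Tuple[int, int, str]


def read_phn(phn_path: Path) -> List[Segment]:
    """Parse a TIMIT .PHN file into (start_sample, end_sample, label) tuples."""
    segments: List[Segment] = []
    with open(phn_path, encoding="utf-8") as f:
        for line in f:
            parts = line.split()
            if len(parts) != 3:
                continue  # skip blank or malformed lines
            start, end, label = parts
            # Normalize label case so upper- and lower-case copies agree.
            segments.append((int(start), int(end), label.lower()))
    return segments


def phoneme_sequence(segments: List[Segment]) -> str:
    """Join the labels into a space-separated phoneme transcription."""
    return " ".join(label for _, _, label in segments)


def duration_seconds(segments: List[Segment]) -> float:
    """Approximate the utterance duration from the last segment's end sample."""
    return segments[-1][1] / SAMPLE_RATE if segments else 0.0
```

The duration here is only an approximation taken from the labels; reading the audio file itself gives the exact length.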

Usage

Use this script to prepare data for phoneme recognition or ASR experiments on the TIMIT corpus. The manifests must exist before any TIMIT-based SpeechBrain training recipe runs; the recipes' training scripts typically call prepare_timit themselves at startup.
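Since the manifests only need to be built once, a driver script can check whether they already exist and pass the result as skip_prep. The following is a minimal sketch; manifests_ready is a hypothetical helper, not part of SpeechBrain.

```python
import os
from typing import Iterable


def manifests_ready(paths: Iterable[str]) -> bool:
    """Return True only if every manifest path already exists as a regular file.

    Hypothetical helper: lets a driver script avoid rebuilding the
    train/valid/test manifests when they are already on disk.
    """
    return all(os.path.isfile(p) for p in paths)
```

For example, passing skip_prep=manifests_ready([...the three save_json paths...]) to prepare_timit skips preparation on later runs.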

Code Reference

Source Location

Signature

def prepare_timit(
    data_folder,
    save_json_train,
    save_json_valid,
    save_json_test,
    phn_set=39,
    uppercase=False,
    skip_prep=False,
):

Import

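from timit_prepare import prepare_timit

timit_prepare is a recipe-level script in SpeechBrain's TIMIT recipe folder, not a module of the installed speechbrain package. This import therefore works only when that folder is on the Python path, for example when running from the recipe directory.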

I/O Contract

Inputs

Name | Type | Required | Description
--- | --- | --- | ---
data_folder | str | Yes | Path to the folder where the original TIMIT dataset is stored
save_json_train | str | Yes | Path where the training JSON file will be written
save_json_valid | str | Yes | Path where the validation JSON file will be written
save_json_test | str | Yes | Path where the test JSON file will be written
phn_set | int | No | Phoneme set to use: 60, 48, or 39 phonemes (default: 39)
uppercase | bool | No | Set to True when the TIMIT copy uses the upper-case version of the corpus (default: False)
skip_prep | bool | No | If True, skips data preparation (default: False)
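The phn_set option controls how TIMIT's fine-grained labels are collapsed into a smaller set. The sketch below illustrates the 39-phoneme case using a partial mapping that follows the widely used Lee & Hon (1989) folding convention. FOLD_TO_39, DROPPED, and fold_phonemes are hypothetical names. This is not SpeechBrain's table: the actual script's mapping covers every label and also supports the 48-phoneme set.

```python
from typing import Dict, List

# Illustrative SUBSET of the common folding to 39 phonemes (Lee & Hon, 1989).
# Not SpeechBrain's full table; labels missing from it are kept unchanged.
FOLD_TO_39: Dict[str, str] = {
    "ao": "aa",
    "ax": "ah", "ax-h": "ah",
    "axr": "er",
    "hv": "hh",
    "ix": "ih",
    "el": "l",
    "em": "m",
    "en": "n", "nx": "n",
    "eng": "ng",
    "zh": "sh",
    "ux": "uw",
    # Closures, pauses, and utterance boundaries all become silence.
    "pcl": "sil", "tcl": "sil", "kcl": "sil",
    "bcl": "sil", "dcl": "sil", "gcl": "sil",
    "h#": "sil", "pau": "sil", "epi": "sil",
}

DROPPED = {"q"}  # the glottal stop is conventionally removed


def fold_phonemes(phonemes: List[str]) -> List[str]:
    """Map each label through FOLD_TO_39 and drop the removed labels."""
    return [FOLD_TO_39.get(p, p) for p in phonemes if p not in DROPPED]
```

Applying the folding at data-preparation time means downstream training code only ever sees the reduced label set.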

Outputs

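File (written to) | Type | Description
--- | --- | ---
train.json (save_json_train) | JSON file | Training set manifest with audio paths and phoneme-level transcriptions
valid.json (save_json_valid) | JSON file | Validation (development) set manifest
test.json (save_json_test) | JSON file | Test set manifest, built from the TIMIT core test set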
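A quick sanity check after preparation is to load each manifest and confirm every entry has the fields a training pipeline will read. This is a minimal sketch that assumes each manifest is a JSON object mapping utterance IDs to per-utterance dicts, with fields named "wav" and "phn". Both the layout and the field names are assumptions rather than a documented schema, and load_manifest is a hypothetical helper.

```python
import json
from typing import Dict, Iterable


def load_manifest(path: str, required: Iterable[str] = ("wav", "phn")) -> Dict[str, dict]:
    """Load a JSON manifest and verify each entry has the required fields.

    Assumed layout: {utterance_id: {field: value, ...}, ...}.
    """
    with open(path, encoding="utf-8") as f:
        manifest = json.load(f)
    if not isinstance(manifest, dict):
        raise ValueError(f"{path}: expected a JSON object at the top level")
    required = list(required)  # allow any iterable, consumed once
    for utt_id, entry in manifest.items():
        if not isinstance(entry, dict):
            raise ValueError(f"{path}: entry {utt_id!r} is not an object")
        missing = [key for key in required if key not in entry]
        if missing:
            raise KeyError(f"{path}: entry {utt_id!r} lacks {missing}")
    return manifest
```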

Usage Examples

from timit_prepare import prepare_timit

# Write train/valid/test manifests using the 39-phoneme set.
prepare_timit(
    data_folder="/path/to/TIMIT",  # root of the original TIMIT corpus
    save_json_train="output/train.json",
    save_json_valid="output/valid.json",
    save_json_test="output/test.json",
    phn_set=39,
)
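Setting the uppercase flag by hand is easy to get wrong. One option is a small heuristic that inspects the corpus root. This sketch assumes data_folder directly contains the split folders, named TRAIN/TEST in the upper-case distribution and train/test in the lower-case one. detect_uppercase is a hypothetical helper, not part of SpeechBrain.

```python
import os


def detect_uppercase(data_folder: str) -> bool:
    """Guess the value for prepare_timit's `uppercase` flag.

    Hypothetical heuristic: looks for upper-case TRAIN/TEST or
    lower-case train/test split folders directly under data_folder.
    """
    entries = set(os.listdir(data_folder))
    if {"TRAIN", "TEST"} <= entries:
        return True
    if {"train", "test"} <= entries:
        return False
    raise FileNotFoundError(
        f"No TRAIN/TEST or train/test folders found in {data_folder!r}"
    )
```

The result can then be passed straight through, e.g. uppercase=detect_uppercase("/path/to/TIMIT").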
