Principle:Huggingface Datasets AudioFolder Dataset Building
| Knowledge Sources | |
|---|---|
| Domains | Data_Engineering, NLP |
| Last Updated | 2026-02-14 18:00 GMT |
Overview
AudioFolder Dataset Building is the process of constructing HuggingFace datasets from directories of audio files, where the folder structure and optional metadata files define the dataset's labels and additional features.
Description
The AudioFolder builder is a FolderBasedBuilder that scans a directory tree for audio files and constructs a dataset with automatic label inference from directory names. When audio files are organized into subdirectories (e.g., train/speech/001.wav, train/music/002.mp3), the builder treats each subdirectory name as a class label, producing a dataset with an audio column and a label column. This convention-over-configuration approach makes it straightforward to load audio classification datasets without writing any custom loading scripts.
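The layout described above can be sketched as follows. This is a minimal illustration, not the builder's own code: the helper names `write_silent_wav`, `build_demo_tree`, and `load_audiofolder` are hypothetical, and both demo files are written as WAV because the standard library can only produce that format. The `datasets` import is deferred so that the tree-building part runs without the library installed.

```python
import tempfile
import wave
from pathlib import Path


def write_silent_wav(path: Path, seconds: float = 0.1, rate: int = 16_000) -> None:
    """Write a short mono 16-bit WAV file containing silence."""
    path.parent.mkdir(parents=True, exist_ok=True)
    with wave.open(str(path), "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * int(seconds * rate))


def build_demo_tree(root: Path) -> list[Path]:
    """Create train/<label>/<file>.wav, mirroring the layout described above."""
    files = [
        root / "train" / "speech" / "001.wav",
        root / "train" / "music" / "002.wav",
    ]
    for audio_path in files:
        write_silent_wav(audio_path)
    return files


def load_audiofolder(data_dir: Path):
    """Load the tree with the audiofolder builder (requires the datasets package)."""
    from datasets import load_dataset  # third-party dependency, imported lazily

    return load_dataset("audiofolder", data_dir=str(data_dir))


# Build the demo tree in a temporary directory (assumption: any writable path works).
demo_root = Path(tempfile.mkdtemp())
demo_files = build_demo_tree(demo_root)
```

With `datasets` installed, calling `load_audiofolder(demo_root)` on this tree would be expected to produce the audio and label columns described above, with `speech` and `music` as the class names.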
Beyond directory-based label inference, the AudioFolder builder supports explicit metadata files. Users can place a metadata.csv or metadata.jsonl file alongside the audio files to specify arbitrary per-example attributes such as transcriptions, speaker IDs, durations, or multi-label annotations. Each metadata row is linked to its audio file through a file_name column that holds the file's path relative to the metadata file. When a metadata file is present, the builder merges its columns with the audio file paths, overriding or supplementing the directory-inferred labels.
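As a rough illustration of such a sidecar file, the sketch below writes a metadata.csv with the standard library. The `file_name` column is the one the folder-based builders use to link rows to files; the `transcription` and `speaker_id` columns, their values, and the helper name `write_metadata_csv` are assumptions chosen for the example.

```python
import csv
import tempfile
from pathlib import Path


def write_metadata_csv(data_dir: Path, rows: list[dict]) -> Path:
    """Write metadata.csv next to the audio files; file_name links rows to files."""
    fieldnames = ["file_name", "transcription", "speaker_id"]
    path = data_dir / "metadata.csv"
    with path.open("w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    return path


# Hypothetical annotations for two files that would live in the same directory.
example_rows = [
    {"file_name": "001.wav", "transcription": "hello world", "speaker_id": "spk_a"},
    {"file_name": "002.wav", "transcription": "good morning", "speaker_id": "spk_b"},
]
metadata_path = write_metadata_csv(Path(tempfile.mkdtemp()), example_rows)
```

Each column other than `file_name` would then appear as an additional feature on the loaded examples.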
The builder supports a wide range of audio formats including WAV, MP3, FLAC, OGG, and other formats handled by libraries such as soundfile and librosa. It automatically assigns the Audio feature type to the audio column, enabling downstream lazy decoding, resampling, and format conversion through the HuggingFace Datasets audio processing pipeline.
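One way to use the Audio feature for resampling is to re-cast the audio column, sketched below. The wrapper name `resample_audio_column` and its defaults are hypothetical; `cast_column` and `Audio(sampling_rate=...)` are the `datasets` calls it relies on, and the import is deferred so the function can be defined without the library installed.

```python
def resample_audio_column(dataset, sampling_rate: int = 16_000, column: str = "audio"):
    """Return a dataset whose audio column decodes at the given sampling rate."""
    from datasets import Audio  # third-party dependency, imported lazily

    # cast_column only changes the feature definition; decoding and resampling
    # happen later, when an example's audio is actually accessed.
    return dataset.cast_column(column, Audio(sampling_rate=sampling_rate))
```

Because the cast is lazy, applying it to a large dataset should be cheap; the cost of decoding is paid per example at access time.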
Usage
Apply AudioFolder Dataset Building when:
- Loading a local collection of audio files organized into labeled subdirectories for audio classification tasks.
- Building audio datasets with per-file metadata provided via CSV or JSONL sidecar files.
- Working with audio formats such as WAV, MP3, FLAC, or OGG that need automatic Audio feature detection.
- Creating audio datasets without writing a custom dataset loading script by relying on folder structure conventions.
Theoretical Basis
The AudioFolder builder extends the FolderBasedBuilder base class, which implements the convention-over-configuration pattern for media datasets. The builder walks the directory tree, filters files by extension against a set of known audio extensions, and maps each file to an example dictionary. Label inference follows a simple rule: if audio files are nested one level below the split directory, the parent directory name becomes the label.
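The walk, filter, and label steps described above can be approximated in plain Python. This is a simplified sketch under stated assumptions, not the library's implementation: `AUDIO_EXTENSIONS` is an illustrative subset of the real extension list, and `scan_split` is a hypothetical name.

```python
import tempfile
from pathlib import Path

# Illustrative subset; the real builder knows many more audio extensions.
AUDIO_EXTENSIONS = {".wav", ".mp3", ".flac", ".ogg"}


def scan_split(split_dir: Path) -> list[dict]:
    """Walk a split directory and map each audio file to an example dict."""
    examples = []
    for path in sorted(split_dir.rglob("*")):
        # Keep only regular files whose extension is a known audio extension.
        if not path.is_file() or path.suffix.lower() not in AUDIO_EXTENSIONS:
            continue
        rel = path.relative_to(split_dir)
        # Label rule: a file exactly one level below the split directory
        # takes its parent directory's name as the label.
        label = rel.parts[0] if len(rel.parts) == 2 else None
        examples.append({"audio": str(path), "label": label})
    return examples


# Demo tree with empty placeholder files (contents are irrelevant to the scan).
split = Path(tempfile.mkdtemp()) / "train"
for rel_name in ["speech/001.wav", "music/002.mp3", "notes.txt"]:
    target = split / rel_name
    target.parent.mkdir(parents=True, exist_ok=True)
    target.touch()

scanned = scan_split(split)
```

In this demo, `notes.txt` is skipped by the extension filter and the two audio files receive the labels `music` and `speech`.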
Metadata integration follows a join-like operation: the builder reads the metadata file into a lookup table keyed by file path, then merges each audio file's entry with its corresponding metadata row. This design separates the storage layout (files on disk) from the annotation layer (metadata files), allowing the same audio files to be loaded with different annotations by swapping the metadata file.
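The join-like merge can be sketched with a dictionary lookup. The function name `merge_metadata` and the sample annotations are hypothetical. How the real builder treats audio files that have no metadata row is not covered here; this sketch simply leaves them without extra columns.

```python
def merge_metadata(audio_files: list[str], metadata_rows: list[dict]) -> list[dict]:
    """Join audio file paths with metadata rows keyed by file_name."""
    # Build the lookup table once so that each file is matched in constant time.
    lookup = {row["file_name"]: row for row in metadata_rows}
    examples = []
    for file_name in audio_files:
        row = lookup.get(file_name, {})
        # Carry over every metadata column except the join key itself.
        extra = {key: value for key, value in row.items() if key != "file_name"}
        examples.append({"audio": file_name, **extra})
    return examples


audio_files = ["001.wav", "002.wav"]

# Two alternative annotation layers for the same files on disk.
transcripts = [
    {"file_name": "001.wav", "transcription": "hello world"},
    {"file_name": "002.wav", "transcription": "good morning"},
]
speakers = [
    {"file_name": "001.wav", "speaker_id": "spk_a"},
    {"file_name": "002.wav", "speaker_id": "spk_b"},
]

with_transcripts = merge_metadata(audio_files, transcripts)
with_speakers = merge_metadata(audio_files, speakers)
```

Swapping `transcripts` for `speakers` changes the resulting columns without touching the audio files, which is the separation of storage layout and annotation layer described above.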