Implementation:Huggingface Datasets Audio
| Knowledge Sources | |
|---|---|
| Domains | Data_Engineering, NLP |
| Last Updated | 2026-02-14 18:00 GMT |
Overview
A feature type in the HuggingFace Datasets library for audio columns, with built-in decoding and resampling.
Description
Audio is a dataclass feature type for audio data. It accepts several input formats:

- file paths (str or pathlib.Path)
- dictionaries with "path" and/or "bytes" keys
- dictionaries with "array" and "sampling_rate" keys
- torchcodec.decoders.AudioDecoder objects

In Arrow, audio is stored as a struct with a bytes field (binary) and a path field (string). With decode=True (the default), accessing an example returns an AudioDecoder from which the sample array and sampling rate can be read. With decode=False, the stored {"bytes", "path"} dictionary is returned instead. Optional parameters set a target sampling rate for resampling, the number of output channels (mono or stereo), and which stream in the file to decode.
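To make the accepted input formats and the Arrow storage layout concrete, the stdlib-only sketch below maps each input kind onto a {"bytes", "path"} dictionary. The helper to_storage is hypothetical and is not the library's Audio.encode_example. It assumes mono float samples in [-1, 1] for the "array" case and serializes them as 16-bit PCM WAV with the wave module; the library's own encoding may differ, and AudioDecoder inputs are not covered.

```python
import io
import struct
import wave
from pathlib import Path


def to_storage(value):
    """Hypothetical helper: map an audio input to the {"bytes", "path"} layout."""
    if isinstance(value, (str, Path)):
        # File paths are stored by reference; no bytes are embedded.
        return {"bytes": None, "path": str(value)}
    if isinstance(value, dict) and "array" in value:
        # Assumption: mono float samples in [-1, 1], written as 16-bit PCM WAV.
        buffer = io.BytesIO()
        with wave.open(buffer, "wb") as wav:
            wav.setnchannels(1)
            wav.setsampwidth(2)
            wav.setframerate(value["sampling_rate"])
            wav.writeframes(b"".join(
                struct.pack("<h", round(max(-1.0, min(1.0, s)) * 32767))
                for s in value["array"]
            ))
        return {"bytes": buffer.getvalue(), "path": None}
    if isinstance(value, dict) and ("bytes" in value or "path" in value):
        # Already in storage form: keep both fields, filling gaps with None.
        return {"bytes": value.get("bytes"), "path": value.get("path")}
    raise ValueError(f"unsupported audio input: {type(value).__name__}")
```

For example, to_storage("clip.wav") keeps only the path, while an {"array", "sampling_rate"} dictionary becomes self-contained WAV bytes with path set to None.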
Usage
Use Audio as the feature type for any column that contains audio. Set sampling_rate and num_channels to have the samples resampled and converted to the requested channel layout when they are decoded.
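What sampling_rate and num_channels do to the decoded samples can be sketched in plain Python. The function convert is hypothetical: it takes channel-first samples (one list per channel, assuming mono or stereo input), downmixes by averaging or upmixes by duplication, and resamples by linear interpolation without an anti-aliasing filter. This is not how the library or torchcodec implements the conversion; it only illustrates the semantics, for instance that resampling from 8 kHz to 16 kHz doubles the number of samples.

```python
def convert(channels, orig_rate, target_rate=None, num_channels=None):
    """Hypothetical sketch of resampling and channel conversion.

    channels: list of per-channel sample lists (channel-first), mono or stereo.
    target_rate / num_channels: None keeps the native value.
    """
    if len(channels) not in (1, 2):
        raise ValueError("this sketch only handles mono or stereo input")
    target_rate = target_rate or orig_rate
    num_channels = num_channels or len(channels)
    if num_channels not in (1, 2):
        raise ValueError("num_channels must be None, 1 (mono), or 2 (stereo)")

    # Channel conversion: average stereo to mono, or duplicate mono to stereo.
    if num_channels == 1 and len(channels) == 2:
        channels = [[(left + right) / 2 for left, right in zip(*channels)]]
    elif num_channels == 2 and len(channels) == 1:
        channels = [list(channels[0]), list(channels[0])]

    if target_rate == orig_rate:
        return channels

    # Resampling: output length scales by target_rate / orig_rate.
    n_in = len(channels[0])
    n_out = round(n_in * target_rate / orig_rate)
    resampled = []
    for samples in channels:
        out = []
        for i in range(n_out):
            pos = i * orig_rate / target_rate  # fractional input position
            lo = min(int(pos), n_in - 1)
            hi = min(lo + 1, n_in - 1)
            frac = pos - lo
            # Linear interpolation between neighbouring input samples
            # (no anti-aliasing filter, so downsampling can alias).
            out.append(samples[lo] * (1 - frac) + samples[hi] * frac)
        resampled.append(out)
    return resampled
```

For example, convert([[0.0, 1.0, 2.0, 3.0]], 8000, 16000) returns one channel of eight samples, and passing num_channels=1 for a stereo input averages the two channels sample by sample.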
Code Reference
Source Location
- Repository: datasets
- File: src/datasets/features/audio.py (lines 24-320)
Signature
```python
from dataclasses import dataclass, field
from typing import Any, ClassVar, Optional

import pyarrow as pa


@dataclass
class Audio:
    sampling_rate: Optional[int] = None
    decode: bool = True
    num_channels: Optional[int] = None
    stream_index: Optional[int] = None
    id: Optional[str] = field(default=None, repr=False)
    # Automatically constructed
    dtype: ClassVar[str] = "dict"
    pa_type: ClassVar[Any] = pa.struct({"bytes": pa.binary(), "path": pa.string()})
    _type: str = field(default="Audio", init=False, repr=False)
```
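Because Audio is a plain dataclass, the field options in the signature determine its constructor and repr: id is accepted but hidden from the repr, _type is fixed (init=False), and dtype and pa_type are ClassVars rather than fields. The stdlib-only mirror below, AudioSpec (a hypothetical name, with pa_type omitted to avoid the pyarrow dependency), demonstrates those rules. It is not the library class and has none of its encoding or decoding methods.

```python
from dataclasses import dataclass, field, fields
from typing import ClassVar, Optional


@dataclass
class AudioSpec:
    # Hypothetical mirror of the Audio signature (pa_type omitted).
    sampling_rate: Optional[int] = None
    decode: bool = True
    num_channels: Optional[int] = None
    stream_index: Optional[int] = None
    id: Optional[str] = field(default=None, repr=False)  # hidden from repr
    dtype: ClassVar[str] = "dict"  # class attribute, not a dataclass field
    _type: str = field(default="Audio", init=False, repr=False)  # not settable


spec = AudioSpec(sampling_rate=16000, num_channels=1, id="speech")
print(spec)
# AudioSpec(sampling_rate=16000, decode=True, num_channels=1, stream_index=None)
print([f.name for f in fields(spec)])
# ['sampling_rate', 'decode', 'num_channels', 'stream_index', 'id', '_type']
```

Note that repr=False does not exclude a field from comparison, so two specs that differ only in id still compare unequal.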
Import
```python
from datasets import Audio
```
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| sampling_rate | int | No | Target sampling rate. Decoded audio is resampled to this rate; None keeps the file's native rate. |
| decode | bool | No | Whether to decode audio on access. Defaults to True; if False, the stored {"bytes", "path"} dictionary is returned. |
| num_channels | int | No | Number of output channels: None (keep the source's channel count), 1 (mono), or 2 (stereo). |
| stream_index | int | No | Index of the audio stream to decode from the file. None selects the "best" stream. |
| id | str | No | Optional feature identifier. |
Outputs
| Name | Type | Description |
|---|---|---|
| instance | Audio | An Audio feature type for use in Features schemas. |
Usage Examples
Basic Usage
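```python
from datasets import load_dataset, Audio

ds = load_dataset("PolyAI/minds14", name="en-US", split="train")
# Resample to 44100 Hz and convert to stereo when examples are decoded
ds = ds.cast_column("audio", Audio(sampling_rate=44100, num_channels=2))

# Access audio data: with decode=True this is a decoder object, not an array
audio = ds[0]["audio"]
# <datasets.features._torchcodec.AudioDecoder object>
# Decode all samples into a torchcodec AudioSamples object
samples = audio.get_all_samples()
print(samples.data.shape)   # (num_channels, num_samples), with num_channels == 2
print(samples.sample_rate)  # 44100
```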
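With decode=False, each example is returned as its stored {"bytes", "path"} dictionary rather than an AudioDecoder. When those bytes happen to be a WAV file, basic metadata can be read with Python's standard wave module, as in this sketch. The helper wav_info is hypothetical, the example dictionary is built in memory instead of being loaded from the Hub, and compressed formats such as MP3 or FLAC would need a real decoder.

```python
import io
import wave


def wav_info(stored):
    """Hypothetical helper: read WAV metadata from a {"bytes", "path"} dict."""
    # Prefer embedded bytes; fall back to the file path when bytes are absent.
    source = io.BytesIO(stored["bytes"]) if stored["bytes"] is not None else stored["path"]
    with wave.open(source, "rb") as wav:
        rate = wav.getframerate()
        frames = wav.getnframes()
        return {
            "sampling_rate": rate,
            "num_channels": wav.getnchannels(),
            "num_samples": frames,
            "duration_s": frames / rate,
        }


# Build one second of silent 16-bit stereo WAV to stand in for a stored example.
buffer = io.BytesIO()
with wave.open(buffer, "wb") as out:
    out.setnchannels(2)
    out.setsampwidth(2)
    out.setframerate(16000)
    out.writeframes(b"\x00\x00" * 2 * 16000)  # 16000 frames x 2 channels

example = {"bytes": buffer.getvalue(), "path": "clip.wav"}
print(wav_info(example))
# {'sampling_rate': 16000, 'num_channels': 2, 'num_samples': 16000, 'duration_s': 1.0}
```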