Implementation:Huggingface Datasets Audio

From Leeroopedia
Revision as of 12:58, 16 February 2026 by Admin (Auto-imported from implementations/Huggingface_Datasets_Audio.md)
Knowledge Sources
Domains Data_Engineering, NLP
Last Updated 2026-02-14 18:00 GMT

Overview

Concrete tool for handling audio data, with decoding and resampling support, provided by the Hugging Face Datasets library.

Description

Audio is a dataclass feature type for audio data. It accepts several input formats: a file path (str or pathlib.Path), a dictionary with "path"/"bytes" keys (useful for Parquet or WebDataset files that embed audio), a dictionary with "array"/"sampling_rate" keys, or a torchcodec.decoders.AudioDecoder object. In Arrow, audio data is stored as a struct with a bytes (binary) field and a path (string) field. With decode=True (the default), accessing an example returns a torchcodec-backed AudioDecoder from which the samples and sampling rate can be read; with decode=False, the raw {"path", "bytes"} dictionary is returned instead. Optional parameters set a target sampling rate for resampling, the number of output channels (mono or stereo), and which stream in the file to decode.
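All of the accepted input forms end up in the same two-field {"bytes", "path"} struct. The stdlib-only sketch below illustrates that mapping. `to_storage` is a hypothetical helper, not the library's `Audio.encode_example` (which also handles AudioDecoder inputs); the sketch assumes mono float samples in [-1.0, 1.0] and writes them as 16-bit PCM WAV with Python's `wave` module.

```python
import io
import sys
import wave
from array import array
from pathlib import Path
from typing import Union


def to_storage(value: Union[str, Path, dict]) -> dict:
    """Hypothetical helper: normalize an audio input into a {"bytes", "path"} dict."""
    if isinstance(value, str):
        # A file path is stored as-is; the file is only read at decode time.
        return {"bytes": None, "path": value}
    if isinstance(value, Path):
        return {"bytes": None, "path": str(value)}
    if isinstance(value, dict) and ("bytes" in value or "path" in value):
        # Already in storage form, e.g. audio embedded in Parquet/WebDataset.
        return {"bytes": value.get("bytes"), "path": value.get("path")}
    if isinstance(value, dict) and "array" in value:
        # Assumption: mono float samples in [-1.0, 1.0], encoded as 16-bit PCM WAV.
        pcm = array("h", (int(max(-1.0, min(1.0, s)) * 32767) for s in value["array"]))
        if sys.byteorder == "big":
            pcm.byteswap()  # WAV sample data is little-endian
        buffer = io.BytesIO()
        with wave.open(buffer, "wb") as wav:
            wav.setnchannels(1)
            wav.setsampwidth(2)  # 2 bytes per sample = 16-bit
            wav.setframerate(value["sampling_rate"])
            wav.writeframes(pcm.tobytes())
        return {"bytes": buffer.getvalue(), "path": None}
    raise ValueError(f"Unsupported audio input: {type(value).__name__}")
```

Paths stay as references while in-memory arrays are serialized to bytes, which is why the Arrow struct needs both fields.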

Usage

Use Audio as the feature type for any column containing audio data. Set sampling_rate and num_channels to have decoded audio automatically resampled and converted to the requested channel layout.
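To make the two decode-time options concrete, here is a stdlib-only sketch of what they do to the sample data: channel conversion (averaging channels down to mono, or duplicating mono into stereo) and sampling-rate conversion. The function names are hypothetical, the averaging down-mix is an assumption, and linear interpolation is a deliberately simple stand-in; the library itself delegates decoding and conversion to torchcodec (FFmpeg), so its output values will differ.

```python
from typing import List, Sequence


def downmix_to_mono(channels: Sequence[Sequence[float]]) -> List[float]:
    """Average a (num_channels, num_samples) layout into a single channel."""
    num_channels = len(channels)
    return [sum(frame) / num_channels for frame in zip(*channels)]


def upmix_to_stereo(mono: Sequence[float]) -> List[List[float]]:
    """Duplicate one channel into two, giving a (2, num_samples) layout."""
    return [list(mono), list(mono)]


def resample_linear(samples: Sequence[float], orig_rate: int, target_rate: int) -> List[float]:
    """Resample by linearly interpolating between neighbouring input samples."""
    if orig_rate == target_rate or not samples:
        return list(samples)
    num_out = round(len(samples) * target_rate / orig_rate)
    step = orig_rate / target_rate  # input samples advanced per output sample
    out = []
    for i in range(num_out):
        pos = i * step
        left = int(pos)
        right = min(left + 1, len(samples) - 1)  # clamp at the last sample
        frac = pos - left
        out.append(samples[left] * (1.0 - frac) + samples[right] * frac)
    return out
```

Linear interpolation without a low-pass filter aliases when downsampling, which is one reason production resamplers are more involved.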

Code Reference

Source Location

  • Repository: datasets
  • File: src/datasets/features/audio.py
  • Lines: 24-320

Signature

from dataclasses import dataclass, field
from typing import Any, ClassVar, Optional

import pyarrow as pa

@dataclass
class Audio:
    sampling_rate: Optional[int] = None
    decode: bool = True
    num_channels: Optional[int] = None
    stream_index: Optional[int] = None
    id: Optional[str] = field(default=None, repr=False)
    # Automatically constructed
    dtype: ClassVar[str] = "dict"
    pa_type: ClassVar[Any] = pa.struct({"bytes": pa.binary(), "path": pa.string()})
    _type: str = field(default="Audio", init=False, repr=False)

Import

from datasets import Audio

I/O Contract

Inputs

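| Name | Type | Required | Description |
|---|---|---|---|
| sampling_rate | Optional[int] | No | Target sampling rate for resampling during decoding. None keeps the file's native rate. |
| decode | bool | No | Whether to decode audio on access. Defaults to True; if False, the raw {"path", "bytes"} dictionary is returned. |
| num_channels | Optional[int] | No | Desired number of output channels: None (keep the source's channel count), 1 (mono), or 2 (stereo). |
| stream_index | Optional[int] | No | Index of the audio stream to decode from the file. None selects the "best" stream. |
| id | Optional[str] | No | Optional feature identifier. |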
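When decode=False, the datasets documentation states that accessing an example returns the underlying {"path": ..., "bytes": ...} dictionary instead of a decoder. One way to read such a record without torchcodec is sketched below, assuming the payload is uncompressed 16-bit PCM WAV (the stdlib wave module cannot read MP3 or FLAC). `read_wav_record` is a hypothetical helper, not a datasets API.

```python
import io
import struct
import wave
from typing import List, Tuple


def read_wav_record(record: dict) -> Tuple[List[List[float]], int]:
    """Hypothetical helper: decode a {"bytes", "path"} record holding 16-bit PCM WAV.

    Returns (channels, sampling_rate), with channels laid out as
    (num_channels, num_samples) and samples scaled to [-1.0, 1.0).
    """
    # Prefer embedded bytes; otherwise open the file at `path`.
    source = io.BytesIO(record["bytes"]) if record.get("bytes") else record["path"]
    with wave.open(source, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        num_channels = wav.getnchannels()
        rate = wav.getframerate()
        raw = wav.readframes(wav.getnframes())
    # Frames are interleaved little-endian int16 values: ch0, ch1, ch0, ch1, ...
    values = struct.unpack(f"<{len(raw) // 2}h", raw)
    channels = [[v / 32768 for v in values[c::num_channels]] for c in range(num_channels)]
    return channels, rate
```

In practice such a record would come from something like ds.cast_column("audio", Audio(decode=False)) followed by ds[0]["audio"].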

Outputs

| Name | Type | Description |
|---|---|---|
| instance | Audio | An Audio feature type for use in Features schemas. |

Usage Examples

Basic Usage

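from datasets import load_dataset, Audio

# Decoding audio requires torchcodec (which in turn needs FFmpeg)
ds = load_dataset("PolyAI/minds14", name="en-US", split="train")

# Resample to 44100 Hz stereo when examples are decoded
ds = ds.cast_column("audio", Audio(sampling_rate=44100, num_channels=2))

# Access audio data
audio = ds[0]["audio"]
# <datasets.features._torchcodec.AudioDecoder object>

# Decode the samples played in the first 10 seconds; data has shape (num_channels, num_samples)
samples = audio.get_samples_played_in_range(0, 10)
print(samples.data.shape, samples.sample_rate)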

Related Pages

Implements Principle

Requires Environment
