Implementation:Facebookresearch Audiocraft SoundDataset

From Leeroopedia
Revision as of 12:33, 16 February 2026 by Admin (talk | contribs) (Auto-imported from implementations/Facebookresearch_Audiocraft_SoundDataset.md)
Knowledge Sources
Domains: Audio_Data, Sound_Generation
Last Updated: 2026-02-14 01:00 GMT

Overview

SoundDataset is a concrete tool, provided by the AudioCraft library, for loading environmental sound datasets with text descriptions and augmenting them through audio mixing.

Description

SoundDataset extends InfoAudioDataset to provide a dataset for environmental and general sound generation. It supports audio mixing augmentation, in which pairs of audio samples are blended at a configurable signal-to-noise ratio (SNR) during training. Each item the dataset returns is a waveform together with a SoundInfo segment that carries the text description and an optional self-conditioning waveform.
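
The blending step described above can be illustrated with a small sketch. It is not AudioCraft's implementation. It only shows a common way to mix two signals at a target SNR: scale the second signal so that the power ratio matches the requested value in dB, then sum the two. The names mean_power and mix_at_snr are hypothetical, and plain Python lists stand in for waveform tensors.

```python
import math


def mean_power(samples):
    """Average squared amplitude of a sequence of samples."""
    return sum(v * v for v in samples) / len(samples)


def mix_at_snr(signal, noise, snr_db):
    """Add `noise` to `signal` so that the signal-to-noise power ratio is `snr_db` dB.

    Hypothetical helper for illustration; both inputs are equal-length lists of floats.
    """
    noise_power = mean_power(noise)
    if noise_power == 0.0:
        # Nothing to mix in: return an unchanged copy of the signal.
        return list(signal)
    # SNR(dB) = 10 * log10(P_signal / P_noise), so solve for the noise power we want.
    target_noise_power = mean_power(signal) / (10 ** (snr_db / 10))
    # Amplitude gain is the square root of the power ratio.
    gain = math.sqrt(target_noise_power / noise_power)
    return [s + gain * n for s, n in zip(signal, noise)]


# Two short synthetic tones at 16 kHz stand in for a pair of dataset samples.
signal = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(1600)]
noise = [0.3 * math.sin(2 * math.pi * 1000 * t / 16000) for t in range(1600)]
mixed = mix_at_snr(signal, noise, snr_db=5.0)
```

A positive SNR keeps the first sample dominant, while a negative SNR lets the second sample dominate, which is why a range that spans zero produces mixtures of both kinds.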

Usage

Import this class when you need to load environmental sound datasets for training AudioGen or similar sound generation models. It is the standard dataset class used by the AudioGen training pipeline and supports pairing-based mixing augmentation for improved generalization.

Code Reference

Source Location

Signature

class SoundDataset(InfoAudioDataset):
    def __init__(self, *args, **kwargs):
        """
        Reads pairing_list from cfg.datasource, sets up mix_p, mix_snr_low,
        mix_snr_high, mix_min_overlap for audio mixing augmentation.
        """

    def __getitem__(self, index: int):
        """Returns (wav, SoundInfo) with optional mixing augmentation."""

Import

from audiocraft.data.sound_dataset import SoundDataset, SoundInfo

I/O Contract

Inputs

| Name | Type | Required | Description |
|------|------|----------|-------------|
| index | int | Yes | Index into the dataset |
| cfg.datasource.pairing_list | str | No | Path to pairing list for mixing augmentation |
| cfg.datasource.mix_p | float | No | Probability of applying mix augmentation (default 0) |
| cfg.datasource.mix_snr_low | float | No | Lower bound of mixing SNR in dB (default -5) |
| cfg.datasource.mix_snr_high | float | No | Upper bound of mixing SNR in dB (default 5) |
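
How the mixing parameters could interact is sketched below. This is an illustration under stated assumptions rather than the library's code: it assumes mixing is applied with probability mix_p and that the SNR is drawn uniformly between the two bounds. The helper sample_mix_snr is hypothetical.

```python
import random


def sample_mix_snr(mix_p, snr_low, snr_high, rng):
    """Return an SNR in dB with probability `mix_p`, otherwise None (no mixing).

    Hypothetical helper; the uniform draw between the bounds is an assumption.
    """
    if rng.random() >= mix_p:
        return None
    return rng.uniform(snr_low, snr_high)


rng = random.Random(0)
# With the default mix_p of 0 the augmentation is never applied.
never = [sample_mix_snr(0.0, -5.0, 5.0, rng) for _ in range(100)]
# With mix_p of 1 every draw yields an SNR inside the configured range.
always = [sample_mix_snr(1.0, -5.0, 5.0, rng) for _ in range(100)]
```

Under this reading, leaving mix_p at its default of 0 disables mixing entirely, so the SNR bounds only matter once mix_p is raised.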

Outputs

| Name | Type | Description |
|------|------|-------------|
| wav | torch.Tensor | Audio waveform tensor [C, T] |
| info | SoundInfo | Metadata segment with description and self_wav fields |

Usage Examples

Basic Dataset Loading

from audiocraft.data.sound_dataset import SoundDataset

# SoundDataset is typically instantiated via Hydra config
# through the solver's build_dataloaders method.
# Direct usage (audio_meta_list is assumed to be an already prepared
# list of audio metadata entries describing the files to load):
dataset = SoundDataset(
    meta=audio_meta_list,
    segment_duration=10.0,
    sample_rate=16000,
    channels=1,  # request mono output so the shape printed below holds
)

wav, info = dataset[0]
print(info.description)  # Text description of the sound
print(wav.shape)          # [1, 160000] for mono 16 kHz, 10 s
