Implementation:Facebookresearch Audiocraft SoundDataset

Knowledge Sources	Facebookresearch_Audiocraft
Domains	Audio_Data, Sound_Generation
Last Updated	2026-02-14 01:00 GMT

Overview

Concrete tool, provided by the AudioCraft library, for loading environmental sound datasets with text descriptions and augmenting them through audio mixing.

Description

SoundDataset extends InfoAudioDataset to provide a dataset for environmental and general sound generation. It supports audio mixing augmentation, in which pairs of audio samples are blended at a signal-to-noise ratio (SNR) taken from a configurable range, so that training sees additional mixed examples. Each item is returned together with a SoundInfo segment that carries a text description and an optional self-conditioning waveform (self_wav).
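To make the idea of blending two samples at a given SNR concrete, here is a minimal pure-Python sketch. It is not AudioCraft's implementation: the helper names `power` and `mix_at_snr` are hypothetical, signals are plain lists of floats rather than tensors, and the SNR is assumed to be the ratio of mean power between the first signal and the scaled second signal.

```python
import math
from typing import List


def power(signal: List[float]) -> float:
    """Mean squared amplitude of a signal."""
    return sum(x * x for x in signal) / len(signal)


def mix_at_snr(target: List[float], interferer: List[float], snr_db: float) -> List[float]:
    """Add `interferer` to `target`, scaled so their power ratio equals `snr_db`.

    Hypothetical helper for illustration; not part of AudioCraft.
    """
    if len(target) != len(interferer):
        raise ValueError("signals must have the same length")
    p_target, p_interferer = power(target), power(interferer)
    if p_target == 0.0 or p_interferer == 0.0:
        # SNR is undefined when either signal is silent; return the target unchanged.
        return list(target)
    # Solve 10 * log10(p_target / (gain**2 * p_interferer)) = snr_db for gain.
    gain = math.sqrt(p_target / (p_interferer * 10.0 ** (snr_db / 10.0)))
    return [t + gain * n for t, n in zip(target, interferer)]
```

A lower SNR makes the second sample louder relative to the first, so an SNR range centred on 0 dB yields mixtures in which either source may dominate.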

Usage

Import this class when you need to load environmental sound datasets for training AudioGen or similar sound generation models. It is the standard dataset class used by the AudioGen training pipeline and supports pairing-based mixing augmentation for improved generalization.

Code Reference

Source Location

Repository: Facebookresearch_Audiocraft
File: audiocraft/data/sound_dataset.py
Lines: 1-330

Signature

class SoundDataset(InfoAudioDataset):
    def __init__(self, *args, **kwargs):
        """
        Reads pairing_list from cfg.datasource, sets up mix_p, mix_snr_low,
        mix_snr_high, mix_min_overlap for audio mixing augmentation.
        """

    def __getitem__(self, index: int):
        """Returns (wav, SoundInfo) with optional mixing augmentation."""

Import

from audiocraft.data.sound_dataset import SoundDataset, SoundInfo

I/O Contract

Inputs

Name	Type	Required	Description
index	int	Yes	Index into the dataset
cfg.datasource.pairing_list	str	No	Path to pairing list for mixing augmentation
cfg.datasource.mix_p	float	No	Probability of applying mix augmentation (default 0)
cfg.datasource.mix_snr_low	float	No	Lower bound of mixing SNR in dB (default -5)
cfg.datasource.mix_snr_high	float	No	Upper bound of mixing SNR in dB (default 5)
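
The way the mixing parameters in the table interact can be sketched as follows. This is an illustration only: `sample_mix_snr` is a hypothetical helper, and drawing the SNR uniformly between the two bounds is an assumption, not something this page confirms about the library.

```python
import random
from typing import Optional


def sample_mix_snr(
    rng: random.Random,
    mix_p: float = 0.0,
    mix_snr_low: float = -5.0,
    mix_snr_high: float = 5.0,
) -> Optional[float]:
    """Return an SNR in dB if this sample should be mixed, else None.

    Hypothetical helper; uniform sampling of the SNR is an assumption.
    """
    # rng.random() lies in [0, 1), so mix_p=0 never mixes and mix_p=1 always does.
    if rng.random() >= mix_p:
        return None
    return rng.uniform(mix_snr_low, mix_snr_high)
```

With the default mix_p of 0 the function always returns None, which matches the table: mixing is disabled unless a probability is configured.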

Outputs

Name	Type	Description
wav	torch.Tensor	Audio waveform tensor [C, T]
info	SoundInfo	Metadata segment with description and self_wav fields

Usage Examples

Basic Dataset Loading

from audiocraft.data.sound_dataset import SoundDataset

# SoundDataset is typically instantiated via Hydra config
# through the solver's build_dataloaders method.
# Direct usage (audio_meta_list is assumed to be a list of audio
# metadata entries prepared beforehand; it is not defined here):
dataset = SoundDataset(
    meta=audio_meta_list,
    segment_duration=10.0,
    sample_rate=16000,
    channels=1,  # request mono explicitly so the shape below holds
)

wav, info = dataset[0]
print(info.description)  # Text description of the sound
print(wav.shape)         # [1, 160000] for mono 16 kHz, 10 s
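
The shape noted in the last comment follows from the segment settings: the time dimension is the sample rate multiplied by the segment duration. A small sketch of that arithmetic, where `expected_wav_shape` is a hypothetical helper and truncation with `int()` is an assumption about how fractional frame counts are handled:

```python
from typing import Tuple


def expected_wav_shape(channels: int, sample_rate: int, segment_duration: float) -> Tuple[int, int]:
    """Shape [C, T] of a fixed-length segment, with T = sample_rate * segment_duration."""
    return channels, int(sample_rate * segment_duration)


print(expected_wav_shape(1, 16000, 10.0))  # (1, 160000)
```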

Related Pages

Principle:Facebookresearch_Audiocraft_Sound_Dataset_Augmented_Loading
