Implementation:Facebookresearch Audiocraft SoundDataset
| Knowledge Sources | |
|---|---|
| Domains | Audio_Data, Sound_Generation |
| Last Updated | 2026-02-14 01:00 GMT |
Overview
A concrete tool, provided by the AudioCraft library, for loading environmental sound datasets with text descriptions and augmenting them through audio mixing.
Description
SoundDataset extends InfoAudioDataset to provide a dataset for environmental and general sound generation. It supports audio mixing augmentation, in which pairs of audio samples are blended at a configurable signal-to-noise ratio (SNR) during training. Each item is returned with a SoundInfo segment containing a text description and an optional self-conditioning waveform.
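The SNR-controlled blend described above can be illustrated with a minimal, library-independent sketch. This is not AudioCraft's implementation: the helper names `rms` and `mix_at_snr` are hypothetical, and waveforms are plain Python lists of floats rather than tensors. The second waveform is scaled so that the ratio of the first waveform's RMS to the scaled waveform's RMS matches the requested SNR in dB.

```python
import math


def rms(samples):
    """Root-mean-square amplitude of a sequence of float samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def mix_at_snr(signal, noise, snr_db):
    """Blend `noise` into `signal` so the signal-to-noise ratio is `snr_db` dB.

    Hypothetical helper for illustration only; both inputs must be
    non-silent and have the same length.
    """
    if len(signal) != len(noise):
        raise ValueError("signal and noise must have the same length")
    signal_rms, noise_rms = rms(signal), rms(noise)
    if signal_rms == 0.0 or noise_rms == 0.0:
        raise ValueError("cannot set an SNR for a silent input")
    # SNR_dB = 20 * log10(signal_rms / (gain * noise_rms)); solve for gain.
    gain = signal_rms / (noise_rms * 10 ** (snr_db / 20))
    return [s + gain * n for s, n in zip(signal, noise)]


if __name__ == "__main__":
    # Two toy sinusoids standing in for a pair of audio clips.
    sig = [math.sin(2 * math.pi * 5 * t / 100) for t in range(100)]
    noi = [math.sin(2 * math.pi * 13 * t / 100) for t in range(100)]
    mixed = mix_at_snr(sig, noi, snr_db=5.0)
    print(len(mixed))
```

A lower SNR makes the second clip more prominent in the mix, while a higher SNR keeps the first clip dominant.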
Usage
Import this class when you need to load environmental sound datasets for training AudioGen or similar sound generation models. It is the standard dataset class used by the AudioGen training pipeline and supports pairing-based mixing augmentation for improved generalization.
Code Reference
Source Location
- Repository: Facebookresearch_Audiocraft
- File: audiocraft/data/sound_dataset.py
- Lines: 1-330
Signature
class SoundDataset(InfoAudioDataset):
    def __init__(self, *args, **kwargs):
        """
        Reads pairing_list from cfg.datasource, sets up mix_p, mix_snr_low,
        mix_snr_high, mix_min_overlap for audio mixing augmentation.
        """

    def __getitem__(self, index: int):
        """Returns (wav, SoundInfo) with optional mixing augmentation."""
Import
from audiocraft.data.sound_dataset import SoundDataset, SoundInfo
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| index | int | Yes | Index into the dataset |
| cfg.datasource.pairing_list | str | No | Path to pairing list for mixing augmentation |
| cfg.datasource.mix_p | float | No | Probability of applying mix augmentation (default 0) |
| cfg.datasource.mix_snr_low | float | No | Lower bound of mixing SNR in dB (default -5) |
| cfg.datasource.mix_snr_high | float | No | Upper bound of mixing SNR in dB (default 5) |
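How the mixing parameters in this table might interact can be sketched as follows. This is an illustrative assumption rather than AudioCraft's code: `MixConfig` and `sample_mix_snr` are hypothetical names, the defaults mirror the table, mixing is assumed to be decided per item with probability `mix_p`, and the SNR is assumed to be drawn uniformly between the two bounds.

```python
import random
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MixConfig:
    """Hypothetical container mirroring the mixing parameters in the table."""
    mix_p: float = 0.0
    mix_snr_low: float = -5.0
    mix_snr_high: float = 5.0

    def __post_init__(self):
        # Reject values that cannot describe a valid mixing setup.
        if not 0.0 <= self.mix_p <= 1.0:
            raise ValueError("mix_p must be a probability in [0, 1]")
        if self.mix_snr_low > self.mix_snr_high:
            raise ValueError("mix_snr_low must not exceed mix_snr_high")


def sample_mix_snr(cfg: MixConfig, rng: random.Random) -> Optional[float]:
    """Return an SNR in dB if this item should be mixed, otherwise None.

    Assumption: the decision is made per item with probability `mix_p`,
    and the SNR is uniform on [mix_snr_low, mix_snr_high].
    """
    if rng.random() >= cfg.mix_p:
        return None
    return rng.uniform(cfg.mix_snr_low, cfg.mix_snr_high)
```

With the default `mix_p` of 0, `sample_mix_snr` always returns `None`, so no mixing takes place unless the probability is raised explicitly.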
Outputs
| Name | Type | Description |
|---|---|---|
| wav | torch.Tensor | Audio waveform tensor of shape [C, T] (channels, samples) |
| info | SoundInfo | Metadata segment with description and self_wav fields |
Usage Examples
Basic Dataset Loading
from audiocraft.data.sound_dataset import SoundDataset

# SoundDataset is typically instantiated via Hydra config
# through the solver's build_dataloaders method.
# Direct usage (audio_meta_list is assumed to be a list of audio
# metadata entries prepared beforehand):
dataset = SoundDataset(
    meta=audio_meta_list,
    segment_duration=10.0,
    sample_rate=16000,
    channels=1,  # request mono explicitly so the shape below holds
)
wav, info = dataset[0]
print(info.description)  # Text description of the sound
print(wav.shape)         # [1, 160000] for mono 16kHz, 10s
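The shape in the final comment follows from the segment length in samples, `segment_duration * sample_rate`. A small sketch of that arithmetic (the helper name `expected_segment_shape` is hypothetical, and it assumes segments are padded or cropped to exactly the requested duration):

```python
def expected_segment_shape(segment_duration: float, sample_rate: int, channels: int = 1):
    """Return the (channels, samples) shape of a fixed-length audio segment."""
    num_samples = int(segment_duration * sample_rate)
    return (channels, num_samples)


print(expected_segment_shape(10.0, 16000))  # (1, 160000)
```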