Principle:Facebookresearch Audiocraft Sound Dataset Augmented Loading
| Knowledge Sources | |
|---|---|
| Domains | Audio_Data, Data_Augmentation |
| Last Updated | 2026-02-14 01:00 GMT |
Overview
A data loading strategy that augments environmental sound datasets by mixing pairs of audio samples at random signal-to-noise ratios to improve sound generation model robustness.
Description
Sound Dataset Augmented Loading is a data pipeline technique used in audio generation training. Rather than training only on isolated sound clips, this approach pairs audio samples and mixes them at a signal-to-noise ratio (SNR) drawn from a configurable range. The mixed audio is presented alongside its text description, helping the model learn to generate sounds that are robust to background noise and overlapping audio events. Pairing is controlled by a pre-computed pairing list that maps each sample to a mixing partner.
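The pairing list described above can be pictured as a plain mapping from each sample to its partner. The following is a minimal sketch under stated assumptions, not AudioCraft's implementation: the names `build_pairing_list` and `load_pair`, the dictionary-based dataset, and the random partner selection are all hypothetical and chosen only for illustration.

```python
import random
from typing import Dict, List, Tuple

# Hypothetical toy representation: clip id -> (waveform samples, text description).
Clip = Tuple[List[float], str]


def build_pairing_list(clip_ids: List[str], seed: int = 0) -> Dict[str, str]:
    """Pre-compute a mixing partner for every clip (assumes at least two clips)."""
    rng = random.Random(seed)  # fixed seed so the pairing is reproducible
    pairing = {}
    for clip_id in clip_ids:
        # A clip is never paired with itself.
        candidates = [other for other in clip_ids if other != clip_id]
        pairing[clip_id] = rng.choice(candidates)
    return pairing


def load_pair(dataset: Dict[str, Clip], pairing: Dict[str, str],
              clip_id: str) -> Tuple[Clip, Clip]:
    """Return the primary clip together with its pre-assigned partner."""
    return dataset[clip_id], dataset[pairing[clip_id]]
```

Computing the pairing once, ahead of training, keeps the choice of partner reproducible across epochs; how the real pairing list is produced is not specified here.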
Usage
Use this principle when training sound generation models (e.g., AudioGen) on environmental sound datasets where data augmentation through audio mixing can improve the diversity and robustness of generated outputs.
Theoretical Basis
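The core idea is SNR-controlled mixing of two audio signals: the secondary signal is rescaled so that the power ratio between the primary and the secondary matches a target SNR, and the two signals are then summed.
The SNR is sampled uniformly from [SNR_{low}, SNR_{high}] in decibels. A minimum overlap constraint ensures that the secondary signal overlaps the primary in time by at least a minimum amount. The mixing probability p controls how often augmentation is applied versus returning the original signal unchanged.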
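As an illustration of these three ingredients (uniform SNR sampling, minimum overlap, and mixing probability p), here is a minimal sketch using plain Python lists in place of waveform tensors. It assumes a common RMS-based formulation, y = x_1 + g * x_2 with g = rms(x_1) / (rms(x_2) * 10^(SNR/20)), which may differ in detail from what AudioCraft does (for example, in level normalization or clipping handling). The function names, the `min_overlap` fraction semantics, and the assumption that the secondary clip is at least as long as the primary are all hypothetical.

```python
import math
import random
from typing import List, Optional

EPS = 1e-8  # guards against division by zero on silent clips


def rms(signal: List[float]) -> float:
    """Root-mean-square level of a signal."""
    return math.sqrt(sum(s * s for s in signal) / max(len(signal), 1))


def snr_mix(primary: List[float], secondary: List[float],
            snr_db: float, offset: int = 0) -> List[float]:
    """Add `secondary` to `primary` starting at `offset`, scaled to `snr_db`."""
    # Keep only the part of the secondary that fits inside the primary.
    segment = secondary[: len(primary) - offset]
    # Gain that places the secondary `snr_db` decibels below the primary.
    gain = rms(primary) / ((rms(segment) + EPS) * 10 ** (snr_db / 20))
    mixed = list(primary)
    for i, sample in enumerate(segment):
        mixed[offset + i] += gain * sample
    return mixed


def maybe_mix(primary: List[float], secondary: List[float],
              snr_low: float, snr_high: float,
              min_overlap: float = 0.5, p: float = 0.5,
              rng: Optional[random.Random] = None) -> List[float]:
    """With probability `p`, mix at an SNR drawn uniformly from the given range."""
    rng = rng or random.Random()
    if rng.random() >= p:
        return list(primary)  # augmentation skipped: original signal returned
    snr_db = rng.uniform(snr_low, snr_high)
    # Latest start that still leaves `min_overlap` of the primary covered,
    # assuming the secondary is at least as long as the primary.
    max_offset = int((1.0 - min_overlap) * len(primary))
    offset = rng.randint(0, max_offset)
    return snr_mix(primary, secondary, snr_db, offset)
```

A higher SNR leaves the secondary sound quieter relative to the primary, so widening [SNR_{low}, SNR_{high}] upward yields milder augmentation, while setting p to 0 disables mixing entirely.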