Principle:Facebookresearch Audiocraft Sound Dataset Augmented Loading

Knowledge Sources	Facebookresearch_Audiocraft AudioGen
Domains	Audio_Data, Data_Augmentation
Last Updated	2026-02-14 01:00 GMT

Overview

A data loading strategy that augments environmental sound datasets by mixing pairs of audio samples at random signal-to-noise ratios (SNRs) to make sound generation models more robust.

Description

Sound Dataset Augmented Loading is a data pipeline technique used in audio generation training. Rather than training only on isolated sound clips, this approach pairs audio samples and mixes them at configurable SNR levels. The mixed audio is presented alongside its text description, which helps the model learn to generate sounds that are robust to background noise and overlapping audio events. The pairing is determined by a pre-computed pairing list that maps each sample to its mixing partner.
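The pairing step described above can be sketched as follows. This is a minimal, hypothetical illustration, not AudioCraft's `SoundDataset`. The names `Clip`, `build_pairing`, and `PairedSoundDataset` are invented for the example. So are the choice of a random cyclic pairing (every sample gets a partner other than itself) and the decision to keep only the primary clip's description. Audio is represented as plain Python lists for brevity, whereas a real pipeline would typically use tensors.

```python
import random
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Clip:
    """A single training example: raw audio samples plus its text description."""
    audio: List[float]
    description: str


def build_pairing(n: int, seed: int = 0) -> List[int]:
    """Return a hypothetical pre-computed pairing list: pairing[i] is the partner of sample i.

    The indices are shuffled, then each one is linked to the next index in the
    shuffled order (wrapping around), so no sample is paired with itself.
    """
    if n < 2:
        raise ValueError("need at least two samples to form pairs")
    order = list(range(n))
    random.Random(seed).shuffle(order)
    pairing = [0] * n
    for k, idx in enumerate(order):
        pairing[idx] = order[(k + 1) % n]
    return pairing


class PairedSoundDataset:
    """Hypothetical dataset that looks up each sample's mixing partner in a fixed pairing list."""

    def __init__(self, clips: Sequence[Clip], pairing: Sequence[int]) -> None:
        if len(pairing) != len(clips):
            raise ValueError("pairing must have one entry per clip")
        self.clips = list(clips)
        self.pairing = list(pairing)

    def __len__(self) -> int:
        return len(self.clips)

    def __getitem__(self, i: int) -> Tuple[List[float], List[float], str]:
        clip = self.clips[i]
        partner = self.clips[self.pairing[i]]
        # Return both signals; the SNR mixing step is applied downstream.
        # Assumption: the primary clip's description is kept as the text condition.
        return clip.audio, partner.audio, clip.description


clips = [Clip([0.1 * k] * 4, f"sound {k}") for k in range(5)]
dataset = PairedSoundDataset(clips, build_pairing(len(clips), seed=42))
primary, partner, text = dataset[0]
```

Fixing the pairing up front makes each sample's partner reproducible across epochs. A random draw per access would give more variety but less control.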

Usage

Use this principle when training sound generation models (e.g., AudioGen) on environmental sound datasets where data augmentation through audio mixing can improve the diversity and robustness of generated outputs.

Theoretical Basis

The core idea is SNR-controlled mixing of two audio signals:

$y_{\text{mixed}} = x_{\text{primary}} + 10^{-\mathrm{SNR}/20} \cdot x_{\text{secondary}}$
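The formula can be checked numerically with a small sketch. The helper names `snr_gain`, `rms`, and `mix_at_snr` are hypothetical. The toy signals are chosen to have equal RMS so that the measured SNR of the scaled secondary matches the requested value.

```python
import math
from typing import List, Sequence


def snr_gain(snr_db: float) -> float:
    """Scale factor 10^(-SNR/20) applied to the secondary signal."""
    return 10.0 ** (-snr_db / 20.0)


def rms(x: Sequence[float]) -> float:
    """Root-mean-square level of a signal."""
    return math.sqrt(sum(v * v for v in x) / len(x))


def mix_at_snr(primary: Sequence[float], secondary: Sequence[float], snr_db: float) -> List[float]:
    """y_mixed = x_primary + 10^(-SNR/20) * x_secondary, sample by sample."""
    if len(primary) != len(secondary):
        raise ValueError("signals must have equal length")
    g = snr_gain(snr_db)
    return [p + g * s for p, s in zip(primary, secondary)]


# Toy signals with equal RMS (both 1.0), so the gain yields exactly the target SNR.
primary = [1.0, -1.0, 1.0, -1.0]
secondary = [1.0, 1.0, -1.0, -1.0]
snr_db = 6.0
mixed = mix_at_snr(primary, secondary, snr_db)

g = snr_gain(snr_db)
achieved_db = 20.0 * math.log10(rms(primary) / rms([g * s for s in secondary]))
print(f"gain={g:.3f}, achieved SNR={achieved_db:.2f} dB")
```

At 6 dB the gain is about 0.501, so the secondary enters the mix at roughly half the primary's amplitude. At 0 dB the gain is 1 and both signals contribute equally.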

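Here SNR is sampled uniformly from $[\mathrm{SNR}_{\text{low}}, \mathrm{SNR}_{\text{high}}]$ in decibels. The gain $10^{-\mathrm{SNR}/20}$ produces exactly the target SNR only when both signals have been normalized to the same RMS level. Otherwise the effective SNR also depends on their relative loudness. A minimum overlap constraint ensures that the secondary signal temporally overlaps the primary by at least a minimum amount. The mixing probability $p$ controls how often augmentation is applied rather than returning the original signal.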
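The three controls in the paragraph above (uniform SNR sampling, minimum overlap, and mixing probability $p$) can be combined as in this sketch. It is a hypothetical illustration, not AudioCraft's implementation. The function names `place_with_overlap` and `maybe_mix` are invented for the example. So is the reading of "minimum overlap" as the fraction of the primary's length that the randomly shifted secondary must cover. The parameter values in the usage lines are arbitrary.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple


def place_with_overlap(primary_len: int, secondary: Sequence[float],
                       min_overlap: float, rng: random.Random) -> List[float]:
    """Shift `secondary` to a random offset in a zero buffer of `primary_len` samples
    so that it covers at least ceil(min_overlap * primary_len) of them (assumed definition)."""
    need = math.ceil(min_overlap * primary_len)
    seg = list(secondary[:primary_len])  # truncate if longer than the primary
    if len(seg) < need:
        raise ValueError("secondary is too short to satisfy min_overlap")
    # Any start in [need - len(seg), primary_len - need] leaves >= need overlapping samples;
    # a negative start lets the head of the secondary fall before the primary begins.
    start = rng.randint(need - len(seg), primary_len - need)
    out = [0.0] * primary_len
    for i, v in enumerate(seg):
        j = start + i
        if 0 <= j < primary_len:
            out[j] = v
    return out


def maybe_mix(primary: Sequence[float], secondary: Sequence[float], p: float,
              snr_low: float, snr_high: float, min_overlap: float,
              rng: random.Random) -> Tuple[List[float], Optional[float]]:
    """With probability p, mix secondary into primary at an SNR drawn uniformly from
    [snr_low, snr_high] dB; otherwise return an unchanged copy of the primary."""
    if rng.random() >= p:
        return list(primary), None
    snr_db = rng.uniform(snr_low, snr_high)
    aligned = place_with_overlap(len(primary), secondary, min_overlap, rng)
    gain = 10.0 ** (-snr_db / 20.0)
    return [x + gain * s for x, s in zip(primary, aligned)], snr_db


rng = random.Random(0)
primary = [0.5] * 16
secondary = [1.0] * 10
mixed, snr = maybe_mix(primary, secondary, p=1.0, snr_low=-5.0, snr_high=5.0,
                       min_overlap=0.5, rng=rng)
```

Setting `p=0.0` disables augmentation entirely and `p=1.0` mixes every sample. Intermediate values keep a share of clean examples in each epoch.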

Related Pages

Implementation:Facebookresearch_Audiocraft_SoundDataset
