Principle:Facebookresearch Audiocraft Multi Scale Discrimination
| Knowledge Sources | |
|---|---|
| Domains | Audio_Synthesis, GAN |
| Last Updated | 2026-02-14 01:00 GMT |
Overview
An adversarial discrimination technique that judges audio at multiple temporal resolutions, so that both fine-grained and coarse temporal patterns contribute to the adversarial signal.
Description
Multi-Scale Discrimination applies progressive downsampling to create different temporal views of the audio signal. Each scale is evaluated by its own 1D convolutional discriminator, capturing patterns from sample-level details to longer-range structure. The multi-scale design prevents the discriminator from focusing only on one temporal resolution.
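To make the notion of temporal views concrete, here is a minimal pure-Python sketch. The helper `avg_pool1d` and the toy signal are illustrative assumptions, not Audiocraft code. It suggests how repeated average pooling cancels sample-level alternation while keeping slower structure, which is why each scale exposes different patterns to its discriminator.

```python
def avg_pool1d(signal, kernel_size):
    """Non-overlapping average pooling (stride == kernel_size).

    Hypothetical helper for illustration; any trailing remainder is dropped.
    """
    n_out = len(signal) // kernel_size
    return [
        sum(signal[i * kernel_size:(i + 1) * kernel_size]) / kernel_size
        for i in range(n_out)
    ]


# Toy signal: a slow step (0 then 10) plus a fast +1/-1 alternation.
slow = [0.0] * 8 + [10.0] * 8
fast = [1.0 if i % 2 == 0 else -1.0 for i in range(16)]
x = [s + f for s, f in zip(slow, fast)]

# Progressive downsampling: each view is pooled from the previous one.
views = [x]
for _ in range(2):
    views.append(avg_pool1d(views[-1], kernel_size=2))

for scale, view in enumerate(views):
    print(f"scale {scale}: length {len(view)} -> {view}")
```

In this toy case the alternation is visible only at scale 0, while the step from 0 to 10 survives at every scale. A discriminator that saw only the coarsest view could not penalise errors in the fast component.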
Usage
Use this principle in adversarial training for audio synthesis or compression when you need a discriminator that captures temporal patterns at multiple time scales.
Theoretical Basis
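Given audio x and a downsampling factor s, the view at scale k (k = 0, ..., n_scales - 1) is obtained by average pooling x with kernel size and stride s**k; scale 0 is the unmodified signal. Each view is then scored by its own discriminator: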
Pseudo-code:
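# Abstract multi-scale discrimination (NOT actual implementation)
# x: input audio; s: downsampling factor; n_scales: number of scales
views = [x]  # scale 0 is the original signal
for scale in range(1, n_scales):
    # kernel and stride grow as s**scale, so each view is s times shorter than the previous one
    views.append(avg_pool1d(x, kernel_size=s**scale, stride=s**scale))
# one discriminator per scale; each scores only its own view
scores = [discriminator_i(view) for view, discriminator_i in zip(views, discriminators)]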
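The pseudo-code pools the original signal directly with kernel size s**scale, whereas the description speaks of progressive downsampling. For non-overlapping average pooling on a signal whose length is divisible by s**scale, the two give the same views. The sketch below checks this under those assumptions. All names (`avg_pool1d`, `direct_views`, `progressive_views`, `make_toy_discriminator`) are hypothetical, and the toy scoring function is only a stand-in for a learned 1D convolutional discriminator.

```python
def avg_pool1d(signal, kernel_size):
    """Non-overlapping average pooling (stride == kernel_size); illustrative helper."""
    n_out = len(signal) // kernel_size
    return [
        sum(signal[i * kernel_size:(i + 1) * kernel_size]) / kernel_size
        for i in range(n_out)
    ]


def direct_views(x, n_scales, s=2):
    """Pool the original signal with kernel s**scale, as in the pseudo-code."""
    return [x] + [avg_pool1d(x, s ** scale) for scale in range(1, n_scales)]


def progressive_views(x, n_scales, s=2):
    """Pool the previous view by s at every step."""
    views = [x]
    for _ in range(1, n_scales):
        views.append(avg_pool1d(views[-1], s))
    return views


def make_toy_discriminator(weight):
    """Stand-in for a learned network (assumption): weighted mean absolute
    change between neighbouring samples of the view it receives."""
    def discriminator(view):
        diffs = [abs(b - a) for a, b in zip(view, view[1:])]
        return weight * sum(diffs) / max(len(diffs), 1)
    return discriminator


x = [float(i % 4) for i in range(16)]  # repeating 0, 1, 2, 3 pattern
direct = direct_views(x, n_scales=3)
progressive = progressive_views(x, n_scales=3)

# One independent scorer per scale, paired with its own view.
discriminators = [make_toy_discriminator(w) for w in (1.0, 0.5, 0.25)]
scores = [d(view) for view, d in zip(direct, discriminators)]

print("views match:", direct == progressive)
print("view lengths:", [len(v) for v in direct])
print("scores per scale:", scores)
```

With overlapping kernels or padding, direct and progressive pooling generally differ, so the equivalence holds only for the non-overlapping case sketched here.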