Implementation:Facebookresearch Audiocraft MultiBandDiffusion
| Knowledge Sources | |
|---|---|
| Domains | Audio_Generation, Diffusion |
| Last Updated | 2026-02-14 01:00 GMT |
Overview
Concrete tool for converting discrete audio tokens to high-fidelity waveforms using multiple band-specific diffusion processes.
Description
MultiBandDiffusion orchestrates multiple DiffusionProcess instances (one per frequency band) alongside a shared codec model. Each diffusion process generates a waveform for its frequency band, and the band outputs are summed into a single waveform. An EQ matching step then aligns the spectral profile of the summed waveform with the codec decoder output. The class provides two static factory methods, get_mbd_musicgen and get_mbd_24khz, for loading pretrained models.
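The summation and EQ matching steps can be illustrated without any audio library. The following is a minimal sketch, not the library's implementation: it assumes each band is already available as a plain list of samples, and it assumes EQ matching is done by rescaling each band so that its standard deviation equals that of the corresponding reference band. The helper names `sum_bands` and `match_band_levels` are hypothetical and are not part of Audiocraft.

```python
from statistics import pstdev
from typing import List

Band = List[float]


def sum_bands(bands: List[Band]) -> Band:
    """Sum per-band signals sample by sample into one full-band signal."""
    return [sum(samples) for samples in zip(*bands)]


def match_band_levels(bands: List[Band], ref_bands: List[Band]) -> List[Band]:
    """Rescale each band so its standard deviation equals the reference band's.

    Assumption: matching is a single gain per band; bands are already split.
    """
    matched = []
    for band, ref in zip(bands, ref_bands):
        level = pstdev(band)
        # A zero-variance band has no level to match, so leave it unchanged.
        gain = pstdev(ref) / level if level > 0 else 1.0
        matched.append([gain * x for x in band])
    return matched


# Toy data: two bands from the diffusion side and two from the codec decoder.
diffusion_bands = [[1.0, -1.0, 1.0, -1.0], [2.0, -2.0, 2.0, -2.0]]
codec_bands = [[2.0, -2.0, 2.0, -2.0], [1.0, -1.0, 1.0, -1.0]]

equalized = sum_bands(match_band_levels(diffusion_bands, codec_bands))
```

In a real pipeline the bands would be tensors produced by a filter bank; the sketch only shows the arithmetic of matching band levels to a reference and summing the result.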
Usage
Import this class to improve the quality of codec-generated audio by replacing the codec decoder with diffusion-based decoding.
Code Reference
Source Location
- Repository: Facebookresearch_Audiocraft
- File: audiocraft/models/multibanddiffusion.py
- Lines: 1-191
Signature
class MultiBandDiffusion:
    def __init__(self, DPs: tp.List[DiffusionProcess], codec_model: CompressionModel): ...
    @staticmethod
    def get_mbd_musicgen(device=None): ...
    @staticmethod
    def get_mbd_24khz(bw=3.0, device=None, n_q=None): ...
    def tokens_to_wav(self, tokens: torch.Tensor, n_bands: int = 32) -> torch.Tensor: ...
    def regenerate(self, wav: torch.Tensor, sample_rate: int) -> torch.Tensor: ...
    def generate(self, emb: torch.Tensor, size=None, step_list=None) -> torch.Tensor: ...
Import
from audiocraft.models import MultiBandDiffusion
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| tokens | torch.Tensor | Yes, for tokens_to_wav | Discrete codec tokens [B, K, T] |
| wav | torch.Tensor | Yes, for regenerate | Audio waveform to regenerate |
| sample_rate | int | Yes, for regenerate | Sample rate of wav |
| emb | torch.Tensor | Yes, for generate | Codec latent embeddings |
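Each of these inputs belongs to a different entry point, so a caller passes only one of them at a time. The sketch below shows one way to make that explicit. `decode_with_mbd` is a hypothetical wrapper, not part of Audiocraft; it relies only on the three methods listed in the signature above and assumes `mbd` is any object exposing them.

```python
from typing import Any, Optional


def decode_with_mbd(mbd: Any,
                    tokens: Optional[Any] = None,
                    wav: Optional[Any] = None,
                    sample_rate: Optional[int] = None,
                    emb: Optional[Any] = None) -> Any:
    """Hypothetical helper: route exactly one input to the matching method."""
    provided = [name for name, value in
                (("tokens", tokens), ("wav", wav), ("emb", emb))
                if value is not None]
    if len(provided) != 1:
        raise ValueError(f"pass exactly one of tokens, wav, emb; got {provided}")
    if tokens is not None:
        # Discrete codec tokens [B, K, T]
        return mbd.tokens_to_wav(tokens)
    if wav is not None:
        if sample_rate is None:
            raise ValueError("regenerate needs the sample rate of wav")
        return mbd.regenerate(wav, sample_rate)
    # Codec latent embeddings
    return mbd.generate(emb)
```

With a loaded model this could be called as `decode_with_mbd(mbd, tokens=tokens)` or `decode_with_mbd(mbd, wav=audio, sample_rate=32000)`, where the sample rate shown is only an example value.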
Outputs
| Name | Type | Description |
|---|---|---|
| wav | torch.Tensor | Generated/regenerated high-fidelity audio [B, C, T] |
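To make the [B, C, T] output shape concrete, the sketch below estimates T from the number of latent frames. It assumes that each latent frame expands to sample_rate / frame_rate audio samples; that relationship, the helper name `expected_wav_shape`, and the 32 kHz / 50 Hz values are illustrative assumptions rather than details stated on this page.

```python
from typing import Tuple


def expected_wav_shape(batch: int, channels: int, latent_frames: int,
                       sample_rate: int, frame_rate: int) -> Tuple[int, int, int]:
    """Estimate the [B, C, T] output shape from the latent length.

    Assumption: each latent frame expands to sample_rate / frame_rate samples.
    """
    samples_per_frame = int(sample_rate / frame_rate)
    return (batch, channels, latent_frames * samples_per_frame)


# Illustrative values: 500 latent frames of mono audio at 32 kHz, 50 frames per second.
shape = expected_wav_shape(batch=1, channels=1, latent_frames=500,
                           sample_rate=32000, frame_rate=50)
```

Under these assumed values each frame covers 640 samples, so 500 frames correspond to 10 seconds of audio.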
Usage Examples
from audiocraft.models import MultiBandDiffusion, MusicGen
# Load MBD for MusicGen
mbd = MultiBandDiffusion.get_mbd_musicgen()
# Generate with MusicGen; with return_tokens=True the call returns (audio, tokens)
musicgen = MusicGen.get_pretrained('facebook/musicgen-small')
_, tokens = musicgen.generate(['ambient piano music'], return_tokens=True)
# Decode the tokens with MBD instead of the codec decoder
wav = mbd.tokens_to_wav(tokens)