Implementation:Facebookresearch Audiocraft MultiBandDiffusion

Knowledge Sources
Domains Audio_Generation, Diffusion
Last Updated 2026-02-14 01:00 GMT

Overview

Concrete tool that converts discrete audio tokens into high-fidelity waveforms by running a separate diffusion process for each frequency band.

Description

MultiBandDiffusion orchestrates multiple DiffusionProcess instances, one per frequency band, alongside a shared codec model. Each diffusion process generates the waveform for its own band, and the per-band outputs are summed into a full-band signal. In tokens_to_wav, an EQ matching step then aligns the spectral profile of the summed signal with the codec decoder output; n_bands sets the number of bands used for the matching. The static methods get_mbd_musicgen and get_mbd_24khz load pretrained models.
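The band summation and EQ matching described above can be illustrated with a small NumPy sketch. This is not the library's code: split_bands and eq_match are hypothetical helpers, the band split uses plain FFT masks on a 1-D signal, and the per-band gain (a ratio of standard deviations) is an assumption about what matching the spectral profile means here.

```python
import numpy as np


def split_bands(x, n_bands):
    """Split a 1-D signal into n_bands signals that sum back to x.

    Hypothetical helper: uses non-overlapping FFT masks, chosen only
    because it keeps the example short and exactly invertible.
    """
    spec = np.fft.rfft(x)
    # Band edges expressed as indices into the one-sided spectrum.
    edges = np.linspace(0, spec.shape[-1], n_bands + 1).astype(int)
    bands = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        masked = np.zeros_like(spec)
        masked[lo:hi] = spec[lo:hi]
        bands.append(np.fft.irfft(masked, n=x.shape[-1]))
    return bands


def eq_match(wav, ref, n_bands=32, eps=1e-8):
    """Rescale each band of wav so its level matches the same band of ref.

    Assumption: "level" is the per-band standard deviation.
    """
    out = np.zeros_like(ref)
    for band, band_ref in zip(split_bands(wav, n_bands), split_bands(ref, n_bands)):
        out += band * (band_ref.std() / (band.std() + eps))
    return out


# Toy data: a reference signal and a copy with a tilted spectrum.
rng = np.random.default_rng(0)
ref = rng.standard_normal(1024)
gains = np.linspace(0.2, 2.0, 8)
tilted = sum(g * b for g, b in zip(gains, split_bands(ref, 8)))
matched = eq_match(tilted, ref, n_bands=8)
```

In this toy case the tilt is applied on the same band edges that eq_match uses, so the matched signal recovers the reference almost exactly. With real audio the two signals differ in content, and only the per-band levels are aligned.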

Usage

Import this class when you want to improve the audio quality of codec-generated audio by replacing the codec decoder with diffusion-based decoding.

Code Reference

Source Location

Signature

class MultiBandDiffusion:
    def __init__(self, DPs: tp.List[DiffusionProcess], codec_model: CompressionModel): ...

    @staticmethod
    def get_mbd_musicgen(device=None): ...

    @staticmethod
    def get_mbd_24khz(bw=3.0, device=None, n_q=None): ...

    def tokens_to_wav(self, tokens: torch.Tensor, n_bands: int = 32) -> torch.Tensor: ...
    def regenerate(self, wav: torch.Tensor, sample_rate: int) -> torch.Tensor: ...
    def generate(self, emb: torch.Tensor, size=None, step_list=None) -> torch.Tensor: ...

Import

from audiocraft.models import MultiBandDiffusion

I/O Contract

Inputs

Name | Type | Required | Description
tokens | torch.Tensor | Yes | Discrete codec tokens of shape [B, K, T] (for tokens_to_wav)
wav | torch.Tensor | No | Audio waveform (for regenerate)
sample_rate | int | No | Sample rate of wav (for regenerate)
emb | torch.Tensor | No | Codec latent embeddings (for generate)

Outputs

Name | Type | Description
wav | torch.Tensor | Generated/regenerated high-fidelity audio [B, C, T]
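The shape contract above can be checked before and after decoding. The wrapper below is a minimal sketch: decode_tokens_checked is a hypothetical helper, not part of audiocraft, and it relies only on the tokens_to_wav signature and the [B, K, T] and [B, C, T] layouts listed on this page.

```python
def decode_tokens_checked(mbd, tokens, n_bands=32):
    """Call mbd.tokens_to_wav and verify the documented tensor layouts.

    Hypothetical helper: `mbd` is any object exposing tokens_to_wav,
    and `tokens` is any array-like with a .shape attribute.
    """
    if len(tokens.shape) != 3:
        raise ValueError(
            f"expected tokens shaped [B, K, T], got {tuple(tokens.shape)}"
        )
    wav = mbd.tokens_to_wav(tokens, n_bands=n_bands)
    # The output should be [B, C, T] with the same batch size as the input.
    if len(wav.shape) != 3 or wav.shape[0] != tokens.shape[0]:
        raise ValueError(
            f"expected audio shaped [B, C, T], got {tuple(wav.shape)}"
        )
    return wav
```

Failing early on a missing batch or codebook dimension tends to give a clearer error than one raised deep inside the codec model.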

Usage Examples

from audiocraft.models import MultiBandDiffusion, MusicGen

# Load MBD for MusicGen
mbd = MultiBandDiffusion.get_mbd_musicgen()

# Generate tokens with MusicGen, then decode with MBD
musicgen = MusicGen.get_pretrained('facebook/musicgen-small')
# With return_tokens=True, generate returns a (waveform, tokens) pair
_, tokens = musicgen.generate(['ambient piano music'], return_tokens=True)
wav = mbd.tokens_to_wav(tokens)
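The example above covers tokens_to_wav. For existing audio, regenerate takes a waveform and its sample rate instead of tokens. The sketch below wraps that call for a single clip. regenerate_clip is a hypothetical helper, and it assumes that regenerate expects batched audio shaped [B, C, T], which this page does not state explicitly for inputs.

```python
def regenerate_clip(mbd, wav, sample_rate):
    """Run mbd.regenerate on one clip, adding a batch dimension if needed.

    Hypothetical helper. Accepts [C, T] or [B, C, T] input and returns
    audio with the same number of dimensions as the input.
    """
    if wav.ndim not in (2, 3):
        raise ValueError(f"expected [C, T] or [B, C, T], got {wav.ndim} dims")
    # Assumption: regenerate works on batched audio, so add B=1 for [C, T].
    batched = wav if wav.ndim == 3 else wav[None]
    out = mbd.regenerate(batched, sample_rate=sample_rate)
    # Drop the batch dimension again if the caller passed a single clip.
    return out if wav.ndim == 3 else out[0]
```

With a loaded model this would be called as regenerate_clip(mbd, wav, sample_rate), where wav is a torch.Tensor on the same device as the model.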
