Principle:Facebookresearch Audiocraft Masked Parallel Token Generation

Knowledge Sources	MAGNeT Facebookresearch_Audiocraft
Domains	Audio_Generation, Masked_Modeling
Last Updated	2026-02-14 01:00 GMT

Overview

A non-autoregressive generation strategy that iteratively unmasks discrete audio tokens in parallel using a cosine schedule, enabling faster inference than sequential autoregressive decoding.

Description

Masked parallel token generation, the decoding scheme used by MAGNeT, replaces traditional left-to-right autoregressive decoding with a parallel iterative approach. Starting from a fully masked sequence, the model predicts all masked tokens simultaneously, then keeps the most confident predictions and re-masks the rest. This process repeats for a configurable number of steps, with the masking ratio following a cosine schedule that decays from 1 to 0. Each codebook level of the residual vector quantizer gets its own pass of iterative decoding, proceeding from the coarsest level to the finest, so the model captures both coarse and fine audio structure.
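The per-codebook structure can be sketched as an outer loop over quantizer levels. This is a minimal illustration, not Audiocraft code: it assumes levels are processed one after another, coarsest first, and that each level has its own step budget. The names `decode_all_codebooks`, `decode_level`, `fake_level`, the `MASK` sentinel, and the step counts are all hypothetical.

```python
from typing import Callable, List

import numpy as np

MASK = -1  # hypothetical sentinel id for a masked position


def decode_all_codebooks(
    num_codebooks: int,
    sequence_length: int,
    steps_per_codebook: List[int],
    decode_level: Callable[[np.ndarray, int, int], np.ndarray],
) -> np.ndarray:
    """Fill a [num_codebooks, sequence_length] token grid one level at a time."""
    assert len(steps_per_codebook) == num_codebooks
    grid = np.full((num_codebooks, sequence_length), MASK, dtype=np.int64)
    for level, num_steps in enumerate(steps_per_codebook):
        # decode_level receives the whole grid, so a finer level can
        # condition on the coarser levels that are already filled in.
        grid[level] = decode_level(grid, level, num_steps)
    return grid


def fake_level(grid: np.ndarray, level: int, num_steps: int) -> np.ndarray:
    """Toy stand-in for the iterative unmasking of a single level."""
    return np.full(grid.shape[1], level, dtype=np.int64)


# Illustrative sizes: 4 codebooks, 6 time steps, more steps for level 0
tokens = decode_all_codebooks(4, 6, [20, 10, 10, 10], fake_level)
```

Giving the coarsest level a larger step budget reflects the intuition that it carries most of the audible structure; the specific numbers here are only an example.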

Usage

Use this principle when designing or understanding non-autoregressive audio generation models that need faster inference than autoregressive approaches. It is the core generation algorithm behind MAGNeT models for text-to-music and text-to-sound generation.

Theoretical Basis

The masking schedule follows a cosine function of the normalized step:

$\gamma(t) = \cos\left(\frac{\pi t}{2T}\right)$

At decoding step $t \in [0, T]$, approximately a fraction $\gamma(t)$ of the tokens remains masked, so the sequence starts fully masked ($\gamma(0) = 1$) and ends fully revealed ($\gamma(T) = 0$). Tokens are scored by their prediction confidence, and the most confident fraction $1 - \gamma(t)$ is kept.
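As a worked illustration of the schedule, the sketch below tabulates how many tokens are still masked after each step. It assumes the step index is normalized by the total number of steps T and evaluated at t = 1..T; the function name `masked_counts` and the example sizes are illustrative, not taken from Audiocraft.

```python
import math
from typing import List


def masked_counts(sequence_length: int, num_steps: int) -> List[int]:
    """Number of tokens still masked after each decoding step t = 1..T."""
    counts = []
    for t in range(1, num_steps + 1):
        ratio = math.cos(math.pi * t / (2 * num_steps))
        counts.append(int(ratio * sequence_length))
    return counts


print(masked_counts(sequence_length=100, num_steps=5))
# → [95, 80, 58, 30, 0]
```

The counts fall slowly at first and quickly near the end: few tokens are committed while the context is mostly masked, and most are committed once the model has more revealed context to condition on.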

Pseudo-code:

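# Abstract decoding algorithm (NOT the actual implementation)
tokens = MASK * ones(sequence_length)
for step in range(num_steps):
    logits = model(tokens, conditions)
    probs = softmax(logits)
    predictions = sample(probs)
    # Confidence is the probability assigned to each sampled token
    confidence = probs[predictions]
    # Tokens revealed in earlier steps are kept and never re-masked
    is_masked = (tokens == MASK)
    tokens = where(is_masked, predictions, tokens)
    confidence = where(is_masked, confidence, INFINITY)
    # Cosine schedule; the ratio reaches 0 on the final step
    mask_ratio = cos(pi * (step + 1) / (2 * num_steps))
    num_to_mask = int(mask_ratio * sequence_length)
    least_confident = argsort(confidence)[:num_to_mask]
    tokens[least_confident] = MASK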
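
For readers who want something executable, here is a minimal NumPy sketch of the same loop for a single codebook. It is not the Audiocraft implementation: `toy_model` is a random stand-in for the transformer, `MASK = -1` is a hypothetical sentinel, and the sketch assumes that confidence is the probability of the sampled token, that already revealed tokens are never re-masked, and that the schedule is evaluated at `step + 1` so the final step leaves nothing masked.

```python
import numpy as np

MASK = -1  # hypothetical sentinel id for a masked position


def toy_model(tokens: np.ndarray, vocab_size: int, rng: np.random.Generator) -> np.ndarray:
    """Stand-in for the transformer: random logits of shape [length, vocab]."""
    return rng.normal(size=(tokens.shape[0], vocab_size))


def masked_parallel_decode(sequence_length: int, vocab_size: int,
                           num_steps: int, seed: int = 0) -> np.ndarray:
    rng = np.random.default_rng(seed)
    tokens = np.full(sequence_length, MASK, dtype=np.int64)
    for step in range(num_steps):
        logits = toy_model(tokens, vocab_size, rng)
        # Numerically stable softmax over the vocabulary axis
        probs = np.exp(logits - logits.max(axis=-1, keepdims=True))
        probs /= probs.sum(axis=-1, keepdims=True)
        # Sample one token per position from its categorical distribution
        predictions = np.array([rng.choice(vocab_size, p=p) for p in probs])
        confidence = probs[np.arange(sequence_length), predictions]
        # Keep tokens revealed earlier and protect them from re-masking
        is_masked = tokens == MASK
        tokens = np.where(is_masked, predictions, tokens)
        confidence = np.where(is_masked, confidence, np.inf)
        # Cosine schedule: fraction of positions to mask again
        mask_ratio = np.cos(np.pi * (step + 1) / (2 * num_steps))
        num_to_mask = int(mask_ratio * sequence_length)
        least_confident = np.argsort(confidence)[:num_to_mask]
        tokens[least_confident] = MASK
    return tokens


decoded = masked_parallel_decode(sequence_length=32, vocab_size=16, num_steps=8)
```

Because revealed positions carry infinite confidence, `argsort` places them last, so only tokens predicted in the current step can be selected for re-masking.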

Related Pages

Implementation:Facebookresearch_Audiocraft_MagnetLMModel
