Principle:Facebookresearch Audiocraft JASCO Generation Configuration

From Leeroopedia

Overview

JASCO Generation Configuration governs how multi-source classifier-free guidance (CFG) is parameterized for JASCO's flow matching generation process. Unlike MusicGen's single CFG coefficient, JASCO splits guidance into three terms, each computed with a different subset of conditions: all conditions together, text only, or no conditions. Two coefficients, cfg_coef_all and cfg_coef_txt, set how strongly the full conditioning (text plus chords, drums, and melody) and the text alone steer the generated music.

Theoretical Background

Standard Classifier-Free Guidance

In standard CFG (Ho & Salimans, 2022), the guided prediction is:

v_guided = v_unconditional + cfg_coef * (v_conditional - v_unconditional)

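This provides a single knob, cfg_coef, to control the trade-off between sample quality and adherence to the conditioning (higher cfg_coef) and diversity (lower cfg_coef). Setting cfg_coef = 1 recovers the plain conditional prediction; values above 1 extrapolate beyond it, away from the unconditional prediction.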
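
The formula above can be sketched in a few lines of plain Python. This is only an illustration: the function name cfg_combine and the toy two-element vectors are assumptions made for the example, and a real model would operate on tensors rather than lists.

```python
from typing import List


def cfg_combine(v_uncond: List[float], v_cond: List[float], cfg_coef: float) -> List[float]:
    """Standard CFG: start at the unconditional prediction and move
    cfg_coef times the (conditional - unconditional) difference."""
    return [u + cfg_coef * (c - u) for u, c in zip(v_uncond, v_cond)]


# Toy predictions (hypothetical values, chosen for easy arithmetic).
v_uncond = [0.0, 1.0]
v_cond = [1.0, 3.0]

no_guidance = cfg_combine(v_uncond, v_cond, 0.0)   # same as v_uncond
plain_cond = cfg_combine(v_uncond, v_cond, 1.0)    # same as v_cond
extrapolated = cfg_combine(v_uncond, v_cond, 3.0)  # pushed past v_cond
```

A coefficient of 0 ignores the conditioning, 1 reproduces the conditional prediction, and larger values push further in the direction the conditioning suggests.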

Multi-Source Classifier-Free Guidance

JASCO extends this to multiple conditioning sources. The guided vector field is a weighted sum of multiple terms, each computed with a different subset of conditions:

v_guided = w_all * v_all + w_txt * v_txt + w_null * v_null

Where:

  • v_all is the vector field with all conditions active (text + temporal)
  • v_txt is the vector field with text only (temporal conditions dropped)
  • v_null is the vector field with no conditions (fully unconditional)
  • w_null = 1 - w_all - w_txt, so the three weights sum to 1 (w_null is negative whenever w_all + w_txt exceeds 1)
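
A minimal sketch of this weighted sum, assuming w_all and w_txt are given directly by cfg_coef_all and cfg_coef_txt. The function name multi_source_cfg and the toy vectors are hypothetical and are not taken from the Audiocraft code; lists of floats stand in for the model's tensors.

```python
from typing import List


def multi_source_cfg(
    v_all: List[float],
    v_txt: List[float],
    v_null: List[float],
    cfg_coef_all: float = 5.0,
    cfg_coef_txt: float = 0.0,
) -> List[float]:
    """Combine three vector-field predictions with weights that sum to 1."""
    w_all = cfg_coef_all
    w_txt = cfg_coef_txt
    w_null = 1.0 - w_all - w_txt  # implicit weight of the unconditional term
    return [w_all * a + w_txt * t + w_null * n for a, t, n in zip(v_all, v_txt, v_null)]


# Toy predictions (hypothetical values).
v_all = [1.0, 2.0]
v_txt = [0.5, 1.0]
v_null = [0.0, 0.0]

guided_default = multi_source_cfg(v_all, v_txt, v_null)
guided_mixed = multi_source_cfg(v_all, v_txt, v_null, cfg_coef_all=3.0, cfg_coef_txt=2.0)

# When all three predictions agree, the result is that same prediction,
# because the weights sum to 1.
same = [1.0, 2.0]
guided_same = multi_source_cfg(same, same, same, cfg_coef_all=3.0, cfg_coef_txt=2.0)
```

Because the weights sum to 1, guidance only changes the output where the three predictions disagree, which is the property that makes the scheme a reweighting rather than a rescaling of the vector field.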

This decomposition allows independent control over:

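  • How much the generation follows the temporal structure (chords, drums, melody) -- via cfg_coef_all, the weight of the only term that sees the temporal conditions
  • How much the generation follows the text description specifically -- via cfg_coef_txt, the weight of the term that sees the text alone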

Key Concepts

| Parameter | Default | Role |
| --- | --- | --- |
| cfg_coef_all | 5.0 | Weight for the fully-conditioned term (all conditions: text + chords + drums + melody) |
| cfg_coef_txt | 0.0 | Weight for the text-only conditioned term (temporal conditions dropped) |
| Null weight | 1 - cfg_coef_all - cfg_coef_txt | Implicit weight for the unconditional term, computed automatically |

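When cfg_coef_txt = 0.0 (the default), the text-only term drops out and the guidance reduces to two terms: v_guided = cfg_coef_all * v_all + (1 - cfg_coef_all) * v_null, which rearranges to v_null + cfg_coef_all * (v_all - v_null). This is the standard CFG formula with cfg_coef = cfg_coef_all, applied to the flow matching vector field.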

CFG Term Classes

JASCO implements three distinct CFG term types:

| Term Class | Conditions Retained | Conditions Dropped | Purpose |
| --- | --- | --- | --- |
| AllCFGTerm | All (text + symbolic + wav) | None | Fully conditioned generation |
| TextCFGTerm | Text only | Symbolic (chords, melody) and wav (drums) | Text-only guidance |
| NullCFGTerm | None | All | Unconditional baseline |

Terms with negligible weight (absolute value below 1e-6) are removed to save computation.
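
The pruning rule can be illustrated with a short sketch. The helper active_terms and the string labels below are hypothetical and do not mirror the actual term classes; only the 1e-6 threshold and the weight formulas come from the text above.

```python
from typing import Dict

EPS = 1e-6  # threshold stated in the text


def active_terms(cfg_coef_all: float = 5.0, cfg_coef_txt: float = 0.0) -> Dict[str, float]:
    """Return the CFG terms whose weight is large enough to matter."""
    weights = {
        "all": cfg_coef_all,
        "text": cfg_coef_txt,
        "null": 1.0 - cfg_coef_all - cfg_coef_txt,
    }
    # Drop every term whose absolute weight is below the threshold.
    return {name: w for name, w in weights.items() if abs(w) >= EPS}


default_terms = active_terms()                  # text term is pruned
pure_conditional = active_terms(1.0, 0.0)       # text and null terms are pruned
three_terms = active_terms(3.0, 2.0)            # nothing is pruned
```

At the defaults only the fully conditioned and unconditional terms survive, so the text-only prediction presumably never needs to be computed.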

Design Rationale

  • Compositional control: The multi-source decomposition lets users adjust temporal adherence and semantic adherence independently, rather than trading one for the other.
  • Backward compatibility: With cfg_coef_txt=0.0, the system reduces to standard two-term CFG, making multi-source CFG a strict generalization of the standard scheme.
  • Extensibility via kwargs: Additional generation parameters can be passed through **kwargs and are forwarded to the underlying FlowMatchingModel.generate() call, supporting future extensions.
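
The kwargs pass-through described in the last point can be sketched as follows. Everything here is a hypothetical stand-in: GenerationConfigSketch is not the Audiocraft JASCO class, fake_backend only imitates the role of FlowMatchingModel.generate(), and extra_option is an invented parameter name used to show forwarding.

```python
from typing import Any, Callable, Dict


class GenerationConfigSketch:
    """Hypothetical stand-in that stores CFG coefficients plus extra kwargs."""

    def __init__(self) -> None:
        self.generation_params: Dict[str, Any] = {}

    def set_generation_params(
        self, cfg_coef_all: float = 5.0, cfg_coef_txt: float = 0.0, **kwargs: Any
    ) -> None:
        # Named coefficients first, then any additional options unchanged.
        self.generation_params = {
            "cfg_coef_all": cfg_coef_all,
            "cfg_coef_txt": cfg_coef_txt,
            **kwargs,
        }

    def generate(self, backend: Callable[..., Dict[str, Any]]) -> Dict[str, Any]:
        # Forward every stored parameter to the underlying generate call.
        return backend(**self.generation_params)


def fake_backend(**params: Any) -> Dict[str, Any]:
    """Stand-in for the underlying model call; it just echoes what it received."""
    return params


config = GenerationConfigSketch()
config.set_generation_params(cfg_coef_txt=1.5, extra_option="value")
forwarded = config.generate(fake_backend)
```

Under this pattern a new generation option needs no change to the configuration method's signature, as long as the underlying generate call accepts it.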
