Implementation:Facebookresearch Audiocraft Hydra YAML Composition

From Leeroopedia

Overview

This is a Pattern Doc describing the Hydra/OmegaConf YAML composition system used to configure MusicGen training. Rather than documenting a single API, this page describes the configuration files, their composition order, key parameter groups, and how they are consumed by the training entry point.

Source Locations

| File | Purpose |
| --- | --- |
| audiocraft/train.py (line 130) | Entry point with @hydra_main decorator |
| config/config.yaml | Root configuration with global defaults |
| config/solver/musicgen/default.yaml | MusicGen solver defaults |
| config/solver/musicgen/musicgen_base_32khz.yaml | Base MusicGen 32 kHz experiment config |
| config/model/lm/musicgen_lm.yaml | Transformer LM architecture config |
| config/conditioner/text2music.yaml | Text-to-music conditioning config |

Entry Point

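import flashy
from dora import hydra_main

@hydra_main(config_path='../config', config_name='config', version_base='1.1')
def main(cfg):
    # init_seed_and_system and get_solver are helpers defined elsewhere in train.py.
    init_seed_and_system(cfg)
    # log_name is not defined in this excerpt; a per-rank file name is assumed here.
    log_name = 'solver.log.{rank}'
    flashy.setup_logging(level=str(cfg.logging.level).upper(), log_name=log_name)
    # Initialize distributed training before building the solver.
    flashy.distrib.init()
    solver = get_solver(cfg)
    return solver.run()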

Key Configuration Groups

solver (musicgen)

Defined in config/solver/musicgen/default.yaml:

solver: musicgen
sample_rate: ???    # must be provided by specific config
channels: ???       # must be provided by specific config
compression_model_checkpoint: ???

dataset:
  batch_size: 128
  num_workers: 10
  segment_duration: 30
  min_segment_ratio: 0.8

optim:
  epochs: 200
  updates_per_epoch: 2000
  lr: 1e-4
  optimizer: adamw
  max_norm: 1.0
  adam:
    betas: [0.9, 0.95]
    weight_decay: 0.1

checkpoint:
  save_last: true
  save_every: 50
  keep_last: 10
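
The `???` marker is OmegaConf's placeholder for a mandatory value: the key exists, but a later config in the composition must supply it before it can be read. The stdlib-only sketch below illustrates the idea on a plain nested dict that mirrors part of the defaults above. It is not OmegaConf's own code, and `find_missing` is a hypothetical helper introduced only for this example.

```python
# Stdlib-only illustration; `find_missing` is a hypothetical helper,
# not part of OmegaConf or AudioCraft.
MISSING = "???"  # the mandatory-value marker used in the YAML above


def find_missing(cfg, prefix=""):
    """Return the dotted paths whose value is still the '???' marker."""
    missing = []
    for key, value in cfg.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            missing.extend(find_missing(value, path))
        elif value == MISSING:
            missing.append(path)
    return missing


# A subset of config/solver/musicgen/default.yaml as a plain dict.
solver_defaults = {
    "solver": "musicgen",
    "sample_rate": MISSING,
    "channels": MISSING,
    "compression_model_checkpoint": MISSING,
    "dataset": {"batch_size": 128, "segment_duration": 30},
}

# Values supplied by an experiment config such as musicgen_base_32khz.
experiment = {
    "sample_rate": 32000,
    "channels": 1,
    "compression_model_checkpoint": "//pretrained/facebook/encodec_32khz",
}

missing_before = find_missing(solver_defaults)
missing_after = find_missing({**solver_defaults, **experiment})
print("unset before experiment config:", missing_before)
print("unset after experiment config:", missing_after)
```

Running a check like this before training starts would surface an incomplete experiment config early, rather than at the first access of the unset key.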

Specific experiment override (musicgen_base_32khz)

Defined in config/solver/musicgen/musicgen_base_32khz.yaml:

defaults:
  - musicgen/default
  - /model: lm/musicgen_lm
  - override /dset: audio/default
  - _self_

autocast: true
autocast_dtype: float16
compression_model_checkpoint: //pretrained/facebook/encodec_32khz
channels: 1
sample_rate: 32000

dataset:
  batch_size: 192  # 32 GPUs

optim:
  epochs: 500
  optimizer: dadam
  lr: 1
  ema:
    use: true
    updates: 10
    device: cuda

schedule:
  lr_scheduler: cosine
  cosine:
    warmup: 4000
    lr_min_ratio: 0.0
    cycle_length: 1.0
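
To make the `schedule` block concrete, here is a minimal sketch of a cosine schedule with linear warmup. It assumes the common formulation: a linear ramp over `warmup` steps, then a half-cosine decay toward `lr_min_ratio` times the base rate. The function name `cosine_lr` and the exact formula are assumptions for illustration, and AudioCraft's own scheduler may differ in detail. The total step count assumes `updates_per_epoch: 2000` is inherited from the solver defaults.

```python
import math


def cosine_lr(step, base_lr, total_steps, warmup=4000,
              lr_min_ratio=0.0, cycle_length=1.0):
    """Hypothetical warmup + cosine decay; returns the LR at `step`."""
    if step < warmup:
        # Linear ramp from 0 up to base_lr over the warmup steps.
        return base_lr * step / warmup
    if step > total_steps:
        # Past the end of training, stay at the floor value.
        return base_lr * lr_min_ratio
    progress = step / total_steps
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress / cycle_length))
    return base_lr * (lr_min_ratio + (1.0 - lr_min_ratio) * cosine)


total_steps = 500 * 2000  # optim.epochs * optim.updates_per_epoch
base_lr = 1.0             # optim.lr in the dadam experiment config

for step in (0, 2000, 4000, total_steps // 2, total_steps):
    print(step, cosine_lr(step, base_lr, total_steps))
```

With `lr_min_ratio: 0.0` and `cycle_length: 1.0`, this formulation decays the rate to zero exactly at the final step.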

model/lm (musicgen_lm)

Defined in config/model/lm/musicgen_lm.yaml:

lm_model: transformer_lm

codebooks_pattern:
  modeling: delay
  delay:
    delays: [0, 1, 2, 3]

transformer_lm:
  n_q: 4
  card: 2048
  memory_efficient: true
  norm_first: true
  weight_init: gaussian
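
The `delay` codebook pattern offsets each of the `n_q` codebook streams by a fixed number of steps, so the token of codebook k at time t is emitted at sequence step t + delays[k]. A minimal sketch of that layout follows, assuming a simple list-of-lists representation. `build_delay_layout` is a hypothetical helper, and `None` stands in for whatever special padding token the model actually uses.

```python
def build_delay_layout(num_timesteps, delays):
    """Return layout[k][s]: the source timestep of codebook k at step s, or None."""
    num_steps = num_timesteps + max(delays)
    layout = []
    for delay in delays:
        row = []
        for step in range(num_steps):
            t = step - delay
            # Positions outside the original sequence hold no real token.
            row.append(t if 0 <= t < num_timesteps else None)
        layout.append(row)
    return layout


# delays: [0, 1, 2, 3] as in codebooks_pattern.delay above, on a toy 3-step clip.
layout = build_delay_layout(num_timesteps=3, delays=[0, 1, 2, 3])
for k, row in enumerate(layout):
    print(f"codebook {k}:", row)
```

The sequence grows by `max(delays)` steps, which is the price paid for letting a single transformer step predict all four codebooks in parallel.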

Key Parameters Reference

| Parameter Path | Description | Typical Value |
| --- | --- | --- |
| optim.optimizer | Optimizer type | adamw, dadam, adam |
| optim.lr | Learning rate | 1e-4 (adamw), 1 (dadam) |
| optim.epochs | Total training epochs | 200-500 |
| optim.updates_per_epoch | Updates per epoch | 2000 |
| optim.max_norm | Gradient clipping max norm | 1.0 |
| schedule.lr_scheduler | LR schedule type | cosine, null |
| schedule.cosine.warmup | Warmup steps for cosine schedule | 4000 |
| generate.lm.use_sampling | Use sampling during generation | true/false |
| generate.lm.top_k | Top-k sampling parameter | 250 |
| generate.lm.top_p | Top-p (nucleus) sampling | 0.0 |
| transformer_lm.n_q | Number of codebooks | 4 |
| transformer_lm.card | Codebook cardinality | 2048 |
| dataset.batch_size | Total batch size (across all GPUs) | 192 |
| dataset.segment_duration | Audio segment length in seconds | 30 |
| sample_rate | Audio sample rate (Hz) | 32000 |
| compression_model_checkpoint | Path to pretrained tokenizer | //pretrained/facebook/encodec_32khz |

Launching Experiments

Standard launch via Dora

dora run solver=musicgen/musicgen_base_32khz model/lm/model_scale=small conditioner=text2music

With command-line overrides

dora run solver=musicgen/musicgen_base_32khz optim.lr=5e-5 dataset.batch_size=64
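
Each `key.path=value` argument addresses one leaf of the composed config by its dotted path. The sketch below mimics that behaviour on a plain dict using only the standard library. It is not Hydra's override parser, which supports a much richer grammar, and `apply_override` is a hypothetical helper.

```python
import ast


def apply_override(cfg, override):
    """Set cfg[a][b]...[z] = value for an 'a.b.z=value' string (in place)."""
    path, raw = override.split("=", 1)
    try:
        value = ast.literal_eval(raw)   # numbers, lists, quoted strings
    except (ValueError, SyntaxError):
        value = raw                     # fall back to a bare string
    *parents, leaf = path.split(".")
    node = cfg
    for key in parents:
        # Create intermediate dicts on demand.
        node = node.setdefault(key, {})
    node[leaf] = value
    return cfg


# Values as set by the musicgen_base_32khz experiment config.
cfg = {"optim": {"lr": 1, "optimizer": "dadam"},
       "dataset": {"batch_size": 192}}

for arg in ["optim.lr=5e-5", "dataset.batch_size=64"]:
    apply_override(cfg, arg)

print(cfg)
```

Only the addressed leaves change; sibling keys such as `optim.optimizer` keep the values set by the YAML files.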

Execute only a specific stage

dora run solver=musicgen/musicgen_base_32khz execute_only=generate continue_from=//sig/abc123

Composition Order

The final DictConfig is built by merging in this order (later entries override earlier ones):

  1. config/config.yaml (root defaults)
  2. config/solver/default.yaml (base solver)
  3. config/solver/musicgen/default.yaml (MusicGen solver)
  4. config/model/lm/musicgen_lm.yaml (model architecture)
  5. config/solver/musicgen/musicgen_base_32khz.yaml (specific experiment; its defaults list ends with _self_, so its own values are merged after the groups it pulls in)
  6. config/conditioner/text2music.yaml (conditioning)
  7. Command-line overrides
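
The "later overrides earlier" rule amounts to a recursive dictionary merge applied layer by layer. The sketch below shows this with three abbreviated layers taken from the values on this page. It uses only the standard library and is not how Hydra or OmegaConf implement merging; `deep_merge` is a hypothetical helper.

```python
from functools import reduce


def deep_merge(base, update):
    """Recursively merge `update` into a copy of `base`; `update` wins."""
    merged = dict(base)
    for key, value in update.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


layers = [
    # config/solver/musicgen/default.yaml (abbreviated)
    {"optim": {"epochs": 200, "updates_per_epoch": 2000,
               "lr": 1e-4, "optimizer": "adamw"}},
    # config/solver/musicgen/musicgen_base_32khz.yaml (abbreviated)
    {"optim": {"epochs": 500, "lr": 1, "optimizer": "dadam"}},
    # command-line override: optim.lr=5e-5
    {"optim": {"lr": 5e-5}},
]

final = reduce(deep_merge, layers, {})
print(final["optim"])
```

Keys that no later layer touches, such as `optim.updates_per_epoch`, survive from the earliest layer that defined them.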

Dependencies

  • hydra (via dora.hydra_main) -- configuration framework
  • omegaconf -- structured config with interpolation
  • dora -- experiment management, signatures, grid search
  • flashy -- distributed training utilities, logging
