Implementation:Facebookresearch Audiocraft Hydra YAML Composition
Overview
This is a Pattern Doc describing the Hydra/OmegaConf YAML composition system used to configure MusicGen training. Rather than documenting a single API, this page describes the configuration files, their composition order, key parameter groups, and how they are consumed by the training entry point.
Source Locations
| File | Purpose |
|---|---|
| `audiocraft/train.py` (line 130) | Entry point with the `@hydra_main` decorator |
| `config/config.yaml` | Root configuration with global defaults |
| `config/solver/musicgen/default.yaml` | MusicGen solver defaults |
| `config/solver/musicgen/musicgen_base_32khz.yaml` | Base MusicGen 32 kHz experiment config |
| `config/model/lm/musicgen_lm.yaml` | Transformer LM architecture config |
| `config/conditioner/text2music.yaml` | Text-to-music conditioning config |
Entry Point
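```python
import flashy
from dora import hydra_main


@hydra_main(config_path='../config', config_name='config', version_base='1.1')
def main(cfg):
    init_seed_and_system(cfg)  # helper defined in audiocraft/train.py
    # Per-rank log file; named after the stage when execute_only is set.
    log_name = '%s.log.{rank}' % cfg.execute_only if cfg.execute_only else 'solver.log.{rank}'
    flashy.setup_logging(level=str(cfg.logging.level).upper(), log_name=log_name)
    flashy.distrib.init()  # initialize distributed training
    solver = get_solver(cfg)  # also defined in train.py; builds the solver named by cfg.solver
    # (branches for execute_only and related flags are omitted from this excerpt)
    return solver.run()
```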
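The entry point hands the composed config to `get_solver`, which selects a solver class from the top-level `solver` key (set to `musicgen` in the solver defaults). The following is a minimal sketch of that lookup, assuming it amounts to a name-to-class mapping. `SOLVER_REGISTRY`, `get_solver_sketch`, and the stub classes are hypothetical stand-ins, and a plain `dict` replaces the `DictConfig`.

```python
from typing import Any, Dict


# Hypothetical stubs standing in for real solver classes; only the lookup pattern matters.
class CompressionSolver:
    def __init__(self, cfg: Dict[str, Any]) -> None:
        self.cfg = cfg


class MusicGenSolver:
    def __init__(self, cfg: Dict[str, Any]) -> None:
        self.cfg = cfg


# Hypothetical registry keyed by the value of the top-level `solver` key.
SOLVER_REGISTRY = {
    'compression': CompressionSolver,
    'musicgen': MusicGenSolver,
}


def get_solver_sketch(cfg: Dict[str, Any]):
    """Pick the solver class named by cfg['solver'] and build it with the full config."""
    try:
        klass = SOLVER_REGISTRY[cfg['solver']]
    except KeyError as exc:
        raise ValueError(f"unknown solver: {cfg.get('solver')!r}") from exc
    return klass(cfg)


solver = get_solver_sketch({'solver': 'musicgen', 'sample_rate': 32000})
print(type(solver).__name__)  # MusicGenSolver
```

Because the whole config is passed to the solver, every group composed by Hydra (dataset, optim, model, conditioner) becomes available to it from a single object.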
Key Configuration Groups
solver (musicgen)
Defined in config/solver/musicgen/default.yaml:
```yaml
solver: musicgen
sample_rate: ???  # must be provided by specific config
channels: ???  # must be provided by specific config
compression_model_checkpoint: ???
dataset:
  batch_size: 128
  num_workers: 10
  segment_duration: 30
  min_segment_ratio: 0.8
optim:
  epochs: 200
  updates_per_epoch: 2000
  lr: 1e-4
  optimizer: adamw
  max_norm: 1.0
  adam:
    betas: [0.9, 0.95]
    weight_decay: 0.1
checkpoint:
  save_last: true
  save_every: 50
  keep_last: 10
```
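In OmegaConf, `???` marks a mandatory value: reading it before another config fills it in raises `omegaconf.errors.MissingMandatoryValue`. The pure-Python sketch below lists which keys of the solver defaults still need a value from the experiment config. The helper `find_missing` is hypothetical and not part of OmegaConf or audiocraft, and the dict is an abbreviated excerpt of the YAML above.

```python
from typing import Any, Dict, List

MISSING = '???'  # the literal OmegaConf uses for a mandatory, not-yet-set value


def find_missing(cfg: Dict[str, Any], prefix: str = '') -> List[str]:
    """Return the dotted path of every key whose value is still the '???' marker."""
    missing: List[str] = []
    for key, value in cfg.items():
        path = f'{prefix}{key}'
        if isinstance(value, dict):
            missing.extend(find_missing(value, prefix=f'{path}.'))
        elif value == MISSING:
            missing.append(path)
    return missing


# Abbreviated excerpt of config/solver/musicgen/default.yaml as a plain dict.
solver_defaults = {
    'solver': 'musicgen',
    'sample_rate': MISSING,
    'channels': MISSING,
    'compression_model_checkpoint': MISSING,
    'dataset': {'batch_size': 128, 'num_workers': 10},
}
print(find_missing(solver_defaults))
# ['sample_rate', 'channels', 'compression_model_checkpoint']
```

Leaving these keys as `???` instead of guessing defaults means a run fails fast if an experiment config forgets to specify the audio format or tokenizer.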
Specific experiment override (musicgen_base_32khz)
Defined in config/solver/musicgen/musicgen_base_32khz.yaml:
```yaml
defaults:
  - musicgen/default
  - /model: lm/musicgen_lm
  - override /dset: audio/default
  - _self_

autocast: true
autocast_dtype: float16
compression_model_checkpoint: //pretrained/facebook/encodec_32khz
channels: 1
sample_rate: 32000
dataset:
  batch_size: 192  # 32 GPUs
optim:
  epochs: 500
  optimizer: dadam
  lr: 1
  ema:
    use: true
    updates: 10
    device: cuda
schedule:
  lr_scheduler: cosine
  cosine:
    warmup: 4000
    lr_min_ratio: 0.0
    cycle_length: 1.0
```
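With `optimizer: dadam` (D-Adaptation Adam), `lr` multiplies the step size the optimizer estimates on its own, which is why this config uses `lr: 1` rather than `1e-4`. The sketch below shows how the `schedule.cosine` values could shape that multiplier over training, assuming the standard linear-warmup-then-cosine-decay form. `cosine_lr` is a hypothetical helper, not audiocraft's scheduler class. The total step count combines `epochs: 500` from this file with `updates_per_epoch: 2000` inherited from the solver defaults.

```python
import math


def cosine_lr(step: int, base_lr: float, warmup: int, total_steps: int,
              lr_min_ratio: float = 0.0, cycle_length: float = 1.0) -> float:
    """Linear warmup to base_lr, then cosine decay toward lr_min_ratio * base_lr."""
    if step < warmup:
        return base_lr * step / warmup
    if step <= total_steps:
        progress = (step - warmup) / (total_steps - warmup)
        ratio = lr_min_ratio + 0.5 * (1 - lr_min_ratio) * (
            1 + math.cos(math.pi * progress / cycle_length))
        return base_lr * ratio
    return base_lr * lr_min_ratio  # past the end of the schedule


# 500 epochs x 2000 updates per epoch; warmup 4000; lr 1 (dadam multiplier).
total = 500 * 2000
for step in (0, 2000, 4000, total // 2, total):
    print(step, round(cosine_lr(step, base_lr=1.0, warmup=4000, total_steps=total), 4))
```

With `lr_min_ratio: 0.0` the multiplier reaches zero at the final update, and `cycle_length: 1.0` means exactly one half-cosine spans the post-warmup phase.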
model/lm (musicgen_lm)
Defined in config/model/lm/musicgen_lm.yaml:
```yaml
lm_model: transformer_lm
codebooks_pattern:
  modeling: delay
  delay:
    delays: [0, 1, 2, 3]
transformer_lm:
  n_q: 4
  card: 2048
  memory_efficient: true
  norm_first: true
  weight_init: gaussian
```
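`modeling: delay` with `delays: [0, 1, 2, 3]` offsets codebook k by k steps, so at decoding step s the model emits codebook k of frame s - k. All `n_q = 4` codebooks are then predicted in parallel, while each codebook of a frame still comes after the coarser codebooks of that frame. The list-based sketch below shows only that layout. `apply_delays` and `PAD = -1` are hypothetical, and audiocraft's own pattern provider also handles special tokens and other options.

```python
from typing import List

PAD = -1  # hypothetical placeholder id meaning "no token at this step"


def apply_delays(codes: List[List[int]], delays: List[int], pad: int = PAD) -> List[List[int]]:
    """Shift codebook k right by delays[k] steps so all rows share length T + max(delays)."""
    assert len(codes) == len(delays), 'one delay per codebook'
    out_len = len(codes[0]) + max(delays)
    out = []
    for row, d in zip(codes, delays):
        shifted = [pad] * d + list(row)
        shifted += [pad] * (out_len - len(shifted))
        out.append(shifted)
    return out


# n_q = 4 codebooks, 3 frames each; ids are arbitrary values below card = 2048.
codes = [[10, 11, 12], [20, 21, 22], [30, 31, 32], [40, 41, 42]]
for row in apply_delays(codes, delays=[0, 1, 2, 3]):
    print(row)
# [10, 11, 12, -1, -1, -1]
# [-1, 20, 21, 22, -1, -1]
# [-1, -1, 30, 31, 32, -1]
# [-1, -1, -1, 40, 41, 42]
```

The cost of this layout is `max(delays)` extra steps per sequence, far fewer than flattening all four codebooks into one stream.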
Key Parameters Reference
| Parameter Path | Description | Typical Value |
|---|---|---|
| `optim.optimizer` | Optimizer type | `adamw`, `dadam`, `adam` |
| `optim.lr` | Learning rate | `1e-4` (adamw), `1` (dadam) |
| `optim.epochs` | Total training epochs | 200-500 |
| `optim.updates_per_epoch` | Optimizer updates per epoch | 2000 |
| `optim.max_norm` | Gradient clipping max norm | 1.0 |
| `schedule.lr_scheduler` | LR schedule type | `cosine`, `null` |
| `schedule.cosine.warmup` | Warmup steps for the cosine schedule | 4000 |
| `generate.lm.use_sampling` | Sample during generation instead of taking the argmax | `true` / `false` |
| `generate.lm.top_k` | Top-k sampling parameter | 250 |
| `generate.lm.top_p` | Top-p (nucleus) sampling threshold; 0.0 disables it | 0.0 |
| `transformer_lm.n_q` | Number of codebooks | 4 |
| `transformer_lm.card` | Codebook cardinality (vocabulary size per codebook) | 2048 |
| `dataset.batch_size` | Total batch size (across all GPUs) | 192 |
| `dataset.segment_duration` | Audio segment length in seconds | 30 |
| `sample_rate` | Audio sample rate in Hz | 32000 |
| `compression_model_checkpoint` | Path to the pretrained audio tokenizer | `//pretrained/facebook/encodec_32khz` |
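Since `dataset.batch_size` is the total across all GPUs, the per-GPU batch and the overall training volume follow from simple arithmetic on the values in the table. The following is a sketch of that arithmetic, assuming the total is split evenly across GPUs. `training_budget` is hypothetical, and its divisibility check is its own. The result counts audio seen during training, repetitions included, not the size of the dataset.

```python
from typing import Dict


def training_budget(batch_size: int, world_size: int, epochs: int,
                    updates_per_epoch: int, segment_duration: float) -> Dict[str, float]:
    """Derive per-GPU batch size and total training volume from config values."""
    if batch_size % world_size != 0:
        raise ValueError('dataset.batch_size must be divisible by the number of GPUs')
    total_updates = epochs * updates_per_epoch
    total_segments = total_updates * batch_size
    return {
        'per_gpu_batch': batch_size // world_size,
        'total_updates': total_updates,
        'total_segments': total_segments,
        'audio_hours_seen': total_segments * segment_duration / 3600,
    }


# musicgen_base_32khz: 192 segments over 32 GPUs, 500 x 2000 updates, 30 s segments.
print(training_budget(batch_size=192, world_size=32, epochs=500,
                      updates_per_epoch=2000, segment_duration=30))
```

Overriding `dataset.batch_size` on the command line therefore changes the per-GPU batch as well, so the new value must still divide evenly by the number of GPUs.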
Launching Experiments
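Standard launch via Dora
```shell
dora run solver=musicgen/musicgen_base_32khz model/lm/model_scale=small conditioner=text2music
```
With command-line overrides
```shell
dora run solver=musicgen/musicgen_base_32khz optim.lr=5e-5 dataset.batch_size=64
```
Execute only a specific stage
```shell
dora run solver=musicgen/musicgen_base_32khz execute_only=generate continue_from=//sig/abc123
```
`execute_only` restores a checkpoint and runs a single stage (here `generate`); in this mode the entry point requires `continue_from` (or `execute_inplace=true`). `//sig/abc123` stands for the signature of an earlier experiment.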
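Value overrides such as `optim.lr=5e-5 dataset.batch_size=64` address nested keys with dotted paths. Hydra parses these with a much richer grammar; the sketch below mimics only the simplest case. `parse_value` and `apply_dotlist` are hypothetical helpers. Like Hydra, the sketch rejects keys that are not already in the config, which Hydra would accept only with a `+` prefix. Group selections such as `solver=...` are a different mechanism and are not handled here.

```python
import ast
import copy
from typing import Any, Dict, List

_LITERALS = {'true': True, 'false': False, 'null': None}


def parse_value(text: str) -> Any:
    """Turn a CLI string into bool/None/number when possible; otherwise keep the string."""
    if text in _LITERALS:
        return _LITERALS[text]
    try:
        return ast.literal_eval(text)
    except (ValueError, SyntaxError):
        return text


def apply_dotlist(cfg: Dict[str, Any], overrides: List[str]) -> Dict[str, Any]:
    """Return a copy of cfg with each 'a.b.c=value' override written into the nested dict."""
    result = copy.deepcopy(cfg)
    for item in overrides:
        dotted_key, sep, raw = item.partition('=')
        if not sep:
            raise ValueError(f'expected key=value, got {item!r}')
        *parents, leaf = dotted_key.split('.')
        node = result
        for key in parents:
            node = node[key]  # unknown intermediate keys raise KeyError
        if leaf not in node:
            raise KeyError(f'{dotted_key} is not in the config')
        node[leaf] = parse_value(raw)
    return result


base = {'optim': {'lr': 1, 'epochs': 500}, 'dataset': {'batch_size': 192, 'num_workers': 10}}
print(apply_dotlist(base, ['optim.lr=5e-5', 'dataset.batch_size=64']))
```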
Composition Order
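The final `DictConfig` is built by merging configs in roughly this order, with later entries overriding earlier ones:

1. `config/config.yaml` (root defaults)
2. `config/solver/default.yaml` (base solver)
3. `config/solver/musicgen/default.yaml` (MusicGen solver)
4. `config/model/lm/musicgen_lm.yaml` (model architecture) and `config/conditioner/text2music.yaml` (conditioning), pulled in through the defaults lists
5. `config/solver/musicgen/musicgen_base_32khz.yaml` (specific experiment); because `_self_` is last in its `defaults` list, its own keys are applied after the groups it pulls in
6. Command-line overrides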
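OmegaConf performs the actual merging: later configs win, and nested nodes are merged key by key instead of being replaced wholesale. The pure-Python `deep_merge` and `compose` helpers below are hypothetical and reproduce only that core rule. They ignore interpolation, `???` handling, and Hydra's defaults-list resolution. The dicts are abbreviated excerpts of the YAML files shown earlier, layered in the composition order above.

```python
import copy
from typing import Any, Dict


def deep_merge(base: Dict[str, Any], override: Dict[str, Any]) -> Dict[str, Any]:
    """Merge override into a copy of base: nested dicts merge key by key, other values replace."""
    result = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = deep_merge(result[key], value)
        else:
            result[key] = copy.deepcopy(value)
    return result


def compose(*layers: Dict[str, Any]) -> Dict[str, Any]:
    """Fold the layers left to right, so later layers win."""
    cfg: Dict[str, Any] = {}
    for layer in layers:
        cfg = deep_merge(cfg, layer)
    return cfg


musicgen_default = {
    'solver': 'musicgen', 'sample_rate': '???',
    'dataset': {'batch_size': 128, 'num_workers': 10},
    'optim': {'epochs': 200, 'updates_per_epoch': 2000, 'lr': 1e-4, 'optimizer': 'adamw'},
}
musicgen_base_32khz = {
    'sample_rate': 32000,
    'dataset': {'batch_size': 192},
    'optim': {'epochs': 500, 'optimizer': 'dadam', 'lr': 1},
}
cli = {'optim': {'lr': 5e-5}}  # e.g. optim.lr=5e-5 on the command line

cfg = compose(musicgen_default, musicgen_base_32khz, cli)
print(cfg['optim'])
# {'epochs': 500, 'updates_per_epoch': 2000, 'lr': 5e-05, 'optimizer': 'dadam'}
```

Key-by-key merging is what lets the experiment file override `optim.epochs` while still inheriting `optim.updates_per_epoch` from the defaults.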
Dependencies
- `hydra` (via `dora.hydra_main`): configuration framework
- `omegaconf`: structured config with interpolation
- `dora`: experiment management, signatures, grid search
- `flashy`: distributed training utilities, logging