Principle:Facebookresearch Audiocraft Hydra Training Configuration
Overview
Hydra Training Configuration is the composable, hierarchical configuration management methodology used throughout AudioCraft for defining and launching MusicGen training experiments. Rather than relying on a single monolithic configuration file or command-line argument parsing, AudioCraft employs the Hydra framework (with OmegaConf for structured configs) combined with Meta's Dora experiment management library. This pattern allows researchers to define experiments as compositions of independently maintainable YAML fragments, override individual parameters from the command line, and automatically track unique experiment signatures for reproducibility.
Theoretical Foundations
Composable Configuration
Modern ML experiments require hundreds of parameters spanning model architecture, optimization, data loading, evaluation, and infrastructure. Managing these as flat key-value pairs quickly becomes intractable. Composable hierarchical configuration addresses this by:
- Config groups -- Parameters are organized into logical groups (solver, model, conditioner, dataset, optimizer, schedule). Each group has its own YAML file(s) in a structured directory.
- Defaults lists -- Each config file declares which other configs it inherits from via a defaults list. This creates a composition chain where specific configs override general ones.
- Interpolation -- OmegaConf supports variable interpolation (e.g., ${sample_rate}) so parameters can reference other parameters without duplication.
- Command-line overrides -- Any parameter can be overridden at launch time without modifying config files, enabling rapid experimentation.
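The three mechanisms above (layered merging, interpolation, and command-line overrides) can be illustrated with a small pure-Python sketch. This is not how Hydra or OmegaConf are implemented; the helper names `deep_merge`, `apply_override`, and `resolve`, as well as the parameter values, are assumptions chosen for illustration only.

```python
import copy
import re


def deep_merge(base, override):
    """Return a new dict in which values from `override` win over `base`, recursively."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


def apply_override(cfg, dotted):
    """Apply one 'a.b.c=value' override in place, parsing numbers when possible."""
    path, raw = dotted.split("=", 1)
    for parse in (int, float):
        try:
            value = parse(raw)
            break
        except ValueError:
            continue
    else:
        value = raw  # neither int nor float: keep the string
    node = cfg
    *parents, leaf = path.split(".")
    for key in parents:
        node = node.setdefault(key, {})
    node[leaf] = value


_REFERENCE = re.compile(r"^\$\{([\w.]+)\}$")


def resolve(cfg, root=None):
    """Replace whole-string '${a.b}' references with the referenced value.

    Simplification: chained references (a reference to another reference)
    are not followed.
    """
    root = cfg if root is None else root
    out = {}
    for key, value in cfg.items():
        if isinstance(value, dict):
            out[key] = resolve(value, root)
        elif isinstance(value, str) and (match := _REFERENCE.match(value)):
            target = root
            for part in match.group(1).split("."):
                target = target[part]
            out[key] = target
        else:
            out[key] = value
    return out


# Hypothetical values: a general layer and a more specific layer.
general = {"sample_rate": 16000,
           "dataset": {"batch_size": 64, "sample_rate": "${sample_rate}"}}
specific = {"sample_rate": 32000, "dataset": {"batch_size": 192}}

cfg = deep_merge(general, specific)          # specific overrides general
apply_override(cfg, "dataset.batch_size=8")  # launch-time override wins last
resolved = resolve(cfg)                      # interpolation sees the final values
```

Because the reference is resolved only after merging and overriding, `dataset.sample_rate` follows whatever value `sample_rate` ends up with, which is the point of interpolation: the value is stated once.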
Hydra Framework
Hydra is Facebook's open-source framework for managing complex configurations. Key features used by AudioCraft:
- Config search path -- Hydra searches for YAML files in the config/ directory tree.
- @package directive -- The # @package __global__ directive allows solver-level configs to inject parameters at the global scope rather than under a nested key.
- Config composition -- Multiple YAML files are merged in order, with later files overriding earlier ones. The _self_ entry in defaults controls where the current file's values sit in the merge order.
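The effect of the _self_ position can be sketched in a few lines of plain Python. This is a simplified model under stated assumptions, not Hydra's composition code: configs are flat dictionaries, every defaults list names _self_ explicitly, and the `compose` function and the `library` layout are hypothetical.

```python
def compose(name, library):
    """Compose config `name` by merging the entries of its defaults list in order.

    `library` maps a config name to a dict with two hypothetical keys:
    "defaults" (a list of config names, where "_self_" stands for the
    file's own values) and "values" (the file's own flat parameters).
    """
    entry = library[name]
    result = {}
    for item in entry["defaults"]:
        if item == "_self_":
            layer = entry["values"]
        else:
            layer = compose(item, library)
        result.update(layer)  # later layers override earlier ones
    return result


# Hypothetical configs: the two children differ only in where _self_ sits.
library = {
    "base": {"defaults": ["_self_"], "values": {"lr": 1e-3, "epochs": 100}},
    "child_last": {"defaults": ["base", "_self_"], "values": {"lr": 1e-4}},
    "child_first": {"defaults": ["_self_", "base"], "values": {"lr": 1e-4}},
}

self_last = compose("child_last", library)    # child's own lr overrides base
self_first = compose("child_first", library)  # base overrides the child's lr
```

With _self_ listed last, the file's own values override everything it inherits; with _self_ listed first, the inherited configs override the file.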
Dora Experiment Management
Dora extends Hydra with:
- Unique signatures -- Each unique configuration (after excluding certain keys) gets a deterministic hash signature. This ensures that re-running the same config resumes from the existing checkpoint.
- Grid searches -- Dora supports defining grids of experiments as Python files that enumerate config overrides.
- History replay -- When resuming, Dora replays logged metrics so TensorBoard/wandb visualizations remain continuous.
- Git save -- For grid runs, Dora saves the exact git commit, ensuring full reproducibility.
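The idea of a deterministic signature that ignores excluded keys can be sketched as follows. This is an illustration of the concept only and is not Dora's actual algorithm; the `signature` function, the use of SHA-1 over sorted JSON, the 8-character truncation, and the exclusion patterns are all assumptions.

```python
import fnmatch
import hashlib
import json


def signature(overrides, exclude):
    """Hash a flat dict of dotted-key overrides, skipping keys matching `exclude`."""
    kept = {
        key: value
        for key, value in overrides.items()
        if not any(fnmatch.fnmatch(key, pattern) for pattern in exclude)
    }
    # Sorting keys makes the hash independent of the order overrides were given in.
    payload = json.dumps(kept, sort_keys=True)
    return hashlib.sha1(payload.encode("utf-8")).hexdigest()[:8]


# Hypothetical exclusion patterns for infrastructure-only settings.
EXCLUDE = ["device", "logging.*", "dataset.num_workers"]

sig_gpu = signature({"optim.lr": 1e-4, "device": "cuda"}, EXCLUDE)
sig_cpu = signature({"device": "cpu", "optim.lr": 1e-4}, EXCLUDE)
sig_other_lr = signature({"optim.lr": 3e-4, "device": "cuda"}, EXCLUDE)
```

In this sketch, changing only an excluded key leaves the signature unchanged, so the run would map to the same experiment folder, while changing a tracked parameter such as the learning rate produces a different signature.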
Key Principles
- Separation of concerns -- Solver config (training loop behavior), model config (architecture), conditioner config (conditioning strategy), and dataset config (data loading) are in separate files and can be independently swapped.
- Progressive specialization -- The config chain goes from general defaults to specific experiments: config.yaml -> solver/default -> solver/musicgen/default -> solver/musicgen/musicgen_base_32khz. Each level only specifies what differs from the parent.
- Signature stability -- Certain parameters (device, logging, num_workers) are excluded from the Dora signature so that infrastructure changes do not create new experiment folders.
- Single entry point -- All training is launched through audiocraft/train.py via dora run, ensuring consistent initialization of seeds, distributed training, and logging.
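As a small illustration of the single entry point, the helper below assembles a dora run command line from a solver name and a set of overrides. The helper `dora_run_command` is hypothetical, and the `dataset.batch_size=8` override is only an example value; the sketch builds the command without executing it.

```python
import shlex


def dora_run_command(solver, overrides=None):
    """Build the argument list for launching a solver config through `dora run`."""
    args = ["dora", "run", f"solver={solver}"]
    for key, value in (overrides or {}).items():
        args.append(f"{key}={value}")  # Hydra-style key=value override
    return args


# The solver name comes from the config chain above; the override is illustrative.
cmd = dora_run_command("musicgen/musicgen_base_32khz", {"dataset.batch_size": 8})
command_line = shlex.join(cmd)
```

Keeping the launch as an argument list makes it easy to pass to `subprocess.run` later, while `shlex.join` gives a copy-pasteable shell string for logs.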
Configuration Hierarchy for MusicGen
The configuration inheritance chain for a standard MusicGen training run:
- config/config.yaml -- Global defaults (device, seed, logging, SLURM, Dora)
- config/solver/default.yaml -- Base solver defaults (FSDP, profiler, EMA)
- config/solver/musicgen/default.yaml -- MusicGen solver defaults (dataset, metrics, generate, evaluate, checkpoint, optim)
- config/solver/musicgen/musicgen_base_32khz.yaml -- Specific experiment config (compression checkpoint, sample rate, batch size, optimizer, schedule)
- config/model/lm/musicgen_lm.yaml -- Transformer LM architecture (codebooks pattern, n_q, card, attention config)
- config/conditioner/text2music.yaml -- Conditioning configuration (T5 text encoder)
Role in the MusicGen Training Pipeline
Configuration is resolved before any training code executes:
- Dora/Hydra composes all YAML files and command-line overrides into a single DictConfig.
- train.py:main() receives the fully composed config.
- The config is used to instantiate the solver, model, dataloaders, optimizer, and all other components.
- The config is stored alongside checkpoints for full reproducibility.
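The flow above can be sketched with plain dictionaries: a composed config selects which solver to build, and a copy of the config is written next to the run's outputs. Everything here is a hypothetical stand-in rather than AudioCraft's code: the `MusicGenSolverSketch` class, the `SOLVERS` registry, the `build_solver` and `save_config` helpers, and the `config.json` file name are all assumptions.

```python
import json
import tempfile
from pathlib import Path


class MusicGenSolverSketch:
    """Hypothetical stand-in for a solver that is built from the composed config."""

    def __init__(self, cfg):
        self.cfg = cfg


# Hypothetical registry mapping the `solver` config value to a class.
SOLVERS = {"musicgen": MusicGenSolverSketch}


def build_solver(cfg):
    """Instantiate the solver named by the config, passing the full config along."""
    return SOLVERS[cfg["solver"]](cfg)


def save_config(cfg, folder):
    """Write the config next to the run's outputs so the run can be reproduced."""
    path = Path(folder) / "config.json"
    path.write_text(json.dumps(cfg, indent=2, sort_keys=True))
    return path


cfg = {"solver": "musicgen", "sample_rate": 32000}  # illustrative values
solver = build_solver(cfg)

with tempfile.TemporaryDirectory() as folder:
    saved = save_config(cfg, folder)
    restored = json.loads(saved.read_text())
```

The design point is that no component reads configuration from anywhere else: everything is derived from the single composed config, so the stored copy is sufficient to describe the run.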