Principle:Facebookresearch Audiocraft Language Model Export

Overview

Language Model Export is the process of converting a full training checkpoint -- containing optimizer states, EMA weights, training metadata, and other training-only artifacts -- into a lightweight deployment-ready format containing only the parameters needed for inference. This transformation is essential for distributing trained MusicGen or AudioGen language models, as training checkpoints can be several times larger than the model weights alone.

Theoretical Background

During training, AudioCraft's solver framework (built on Dora and Flashy) saves comprehensive checkpoints that include:

Model state dict (best_state) -- the best-performing model weights, nested under a model key
FSDP best state (fsdp_best_state) -- an alternative best state when using Fully Sharded Data Parallel training, also nested under a model key
Optimizer state -- full optimizer parameter groups and momentum buffers
EMA state -- exponential moving average of model parameters
Training configuration (xp.cfg) -- the Hydra/OmegaConf configuration used for the experiment
Epoch and metrics history -- training progress tracking information

For deployment, only the model weights and the configuration (to reconstruct the model architecture) are needed. The export process strips away all training-specific data and serializes the result in a standardized format that includes an AudioCraft version tag for compatibility tracking.
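The stripping step described above can be sketched on a plain dictionary. This is a minimal illustration, not AudioCraft's own code: `build_export_package` and its `cfg_to_yaml` parameter are hypothetical names, the toy checkpoint holds lists instead of tensors, and only the non-FSDP `best_state` is read. In a real pipeline the checkpoint would presumably be loaded with `torch.load`, the configuration serialized with `OmegaConf.to_yaml()`, and the result written with `torch.save`. Storing the weights directly under `best_state` (without the nested model key) is an assumption of this sketch.

```python
from typing import Any, Callable, Dict


def build_export_package(
    pkg: Dict[str, Any],
    version: str,
    cfg_to_yaml: Callable[[Any], str],
) -> Dict[str, Any]:
    """Keep only what inference needs from a training checkpoint dict.

    Hypothetical helper: `cfg_to_yaml` stands in for a YAML serializer
    such as OmegaConf.to_yaml.
    """
    # Simplification: read the non-FSDP best state only.
    best_state = pkg["best_state"]["model"]
    return {
        "best_state": best_state,              # model weights (assumed un-nested layout)
        "xp.cfg": cfg_to_yaml(pkg["xp.cfg"]),  # configuration as a string
        "version": version,                    # library version tag
        "exported": True,                      # marks the package as exported
    }


# Toy checkpoint with stand-in values instead of real tensors.
checkpoint = {
    "best_state": {"model": {"linear.weight": [0.1, 0.2]}},
    "fsdp_best_state": {},
    "optimizer": {"state": {0: {"exp_avg": [0.0, 0.0]}}},
    "ema": {"linear.weight": [0.1, 0.2]},
    "xp.cfg": {"lm_model": "transformer_lm"},
    "epoch": 42,
}

# `repr` is used here only as a placeholder serializer.
exported = build_export_package(checkpoint, version="0.0.0", cfg_to_yaml=repr)
print(sorted(exported))
```

Optimizer state, EMA state, and epoch information never reach the returned dictionary, which is what makes the exported file smaller than the checkpoint.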

Key Concepts

Concept	Description
best_state	The model state dictionary corresponding to the best validation performance during training, stored under `pkg['best_state']['model']`
fsdp_best_state	An alternative best state key used when the model was trained with FSDP; takes priority over `best_state` when present and non-empty
xp.cfg	The experiment configuration serialized as a YAML string via `OmegaConf.to_yaml()`, preserving all architecture and dataset parameters
exported flag	A boolean `exported: True` marker that downstream loaders use to distinguish exported checkpoints from training checkpoints
version tag	The AudioCraft library version string embedded in the export for compatibility verification
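As an illustration of how the fields in the table could be used on the loading side, the sketch below checks whether a dictionary looks like an exported package. `is_exported_package` and `REQUIRED_KEYS` are hypothetical names introduced for this example and are not part of AudioCraft; the check relies only on the fields listed above.

```python
from typing import Any, Dict

# Keys an exported package is expected to carry, per the table above.
REQUIRED_KEYS = ("best_state", "xp.cfg", "version", "exported")


def is_exported_package(pkg: Dict[str, Any]) -> bool:
    """Return True if `pkg` looks like an exported (inference-only) package."""
    if not all(key in pkg for key in REQUIRED_KEYS):
        return False
    # The flag must be set and the configuration must already be a YAML string.
    return pkg["exported"] is True and isinstance(pkg["xp.cfg"], str)


exported_pkg = {
    "best_state": {"linear.weight": [0.1, 0.2]},
    "xp.cfg": "lm_model: transformer_lm\n",
    "version": "0.0.0",
    "exported": True,
}
training_pkg = {
    "best_state": {"model": {"linear.weight": [0.1, 0.2]}},
    "optimizer": {},
    "xp.cfg": {"lm_model": "transformer_lm"},
}

print(is_exported_package(exported_pkg))
print(is_exported_package(training_pkg))
```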

FSDP Handling

When models are trained using PyTorch's Fully Sharded Data Parallel (FSDP), the best model state is stored under a different key (fsdp_best_state) because FSDP shards model parameters across ranks. The export function checks this key first and, if it is present and non-empty, uses it as the source of truth for the model weights; otherwise it falls back to best_state. This lets models trained with or without FSDP be exported through the same pipeline.
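The key selection described above can be sketched as follows. `select_best_state` is a hypothetical helper name, and the checkpoints are toy dictionaries with lists in place of tensors; the only behavior taken from the text is that a non-empty fsdp_best_state wins and that both states nest the weights under a model key.

```python
from typing import Any, Dict


def select_best_state(pkg: Dict[str, Any]) -> Dict[str, Any]:
    """Pick the model weights, preferring the FSDP best state when non-empty."""
    fsdp_state = pkg.get("fsdp_best_state")
    if fsdp_state:  # present and non-empty
        return fsdp_state["model"]
    best_state = pkg.get("best_state")
    if not best_state:
        raise ValueError("checkpoint has neither 'fsdp_best_state' nor 'best_state'")
    return best_state["model"]


fsdp_checkpoint = {
    "fsdp_best_state": {"model": {"w": [1.0]}},
    "best_state": {"model": {"w": [0.0]}},
}
plain_checkpoint = {
    "fsdp_best_state": {},
    "best_state": {"model": {"w": [0.5]}},
}

print(select_best_state(fsdp_checkpoint))
print(select_best_state(plain_checkpoint))
```

Raising an error when both keys are empty is a choice made for this sketch, so that an export never silently produces a package without weights.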

Design Rationale

Size reduction: Training checkpoints can be 3-4x larger than the model weights alone: Adam keeps 2 state tensors per parameter (first and second moment estimates), and the EMA state, when enabled, adds another full copy of the parameters.
Portability: Exported models are self-contained with their configuration, enabling reconstruction on any machine without access to the original training setup.
Version tracking: The embedded version string helps diagnose compatibility issues when loading models across AudioCraft versions.
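The size-reduction argument can be made concrete with a rough estimate. The sketch below is back-of-the-envelope arithmetic under stated assumptions: every stored tensor uses the same data type, the checkpoint holds one copy of the weights plus the optimizer and EMA state, and metadata is negligible. `estimate_sizes_gb` and the 1-billion-parameter model are hypothetical and chosen only for illustration.

```python
from typing import Tuple


def estimate_sizes_gb(
    num_params: int,
    bytes_per_value: int = 4,             # assumes 32-bit floats
    optimizer_states_per_param: int = 2,  # e.g. Adam's two moment estimates
    has_ema: bool = True,
) -> Tuple[float, float]:
    """Return (exported weights size, training checkpoint size) in GB."""
    weights_bytes = num_params * bytes_per_value
    # One copy for the weights, plus optimizer state, plus an optional EMA copy.
    copies = 1 + optimizer_states_per_param + int(has_ema)
    return weights_bytes / 1e9, weights_bytes * copies / 1e9


weights_gb, checkpoint_gb = estimate_sizes_gb(1_000_000_000)
print(f"weights: {weights_gb:.1f} GB, checkpoint: {checkpoint_gb:.1f} GB")
print(f"ratio: {checkpoint_gb / weights_gb:.0f}x")
```

With Adam and EMA the estimate gives a ratio of 4x, and 3x without EMA, which is consistent with the range quoted above.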
