Principle:Facebookresearch Audiocraft Compression Model Export
| Knowledge Sources | |
|---|---|
| Domains | |
| Last Updated | 2026-02-13 00:00 GMT |
Overview
This principle covers exporting a trained audio compression model from its full training checkpoint to a lightweight, inference-ready format. The export process strips optimizer states, training metadata, and other solver artifacts, retaining only the model's best weights, the configuration, and version metadata. The result is a compact file suitable for distribution and for loading via the pretrained model API.
Description
During training, the CompressionSolver periodically saves full checkpoints that include the model state, optimizer state, EMA state, scheduler state, training configuration, and various bookkeeping data. These checkpoints can be hundreds of megabytes or more and are tied to the internal Dora experiment management system.
The export step transforms this training checkpoint into a minimal package containing:
- best_state -- the model weights from the best training state (or the last one, since best_metric_name is None for compression)
- xp.cfg -- the full Hydra/OmegaConf configuration serialized as YAML, enabling the model architecture to be reconstructed at load time
- version -- the Audiocraft library version used for training
- exported -- a boolean flag set to True, distinguishing exported checkpoints from training checkpoints
This exported checkpoint can then be loaded via models.CompressionModel.get_pretrained() or used as the audio tokenizer for downstream MusicGen/AudioGen training.
Usage
The export is performed after training completes as a post-processing step. It is specific to the compression workflow -- language model exports use a separate export_lm() function that handles FSDP state differently.
The typical workflow is:
- Train an EnCodec model using CompressionSolver
- Export the checkpoint using export_encodec()
- Load the exported model for inference or as a tokenizer for MusicGen
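The last two steps of this workflow can be sketched as a small helper. This is a minimal sketch, not a definitive recipe: export_and_load is a hypothetical name, both paths are supplied by the caller, and it assumes an Audiocraft installation in which export_encodec() lives in audiocraft.utils.export and takes the checkpoint path followed by the output file.

```python
from pathlib import Path
from typing import Union


def export_and_load(
    checkpoint_path: Union[str, Path],
    out_file: Union[str, Path],
    device: str = "cpu",
):
    """Export a trained EnCodec checkpoint, then load the exported file.

    Hypothetical helper: the name and signature are illustrative only.
    """
    # Imported lazily so the sketch can be defined without Audiocraft installed.
    from audiocraft.models import CompressionModel
    from audiocraft.utils import export

    # Step 2: write the slim, inference-ready package to out_file.
    export.export_encodec(checkpoint_path, out_file)

    # Step 3: load the exported package through the pretrained model API.
    # get_pretrained() is given the path of the exported file as a string.
    return CompressionModel.get_pretrained(str(out_file), device=device)
```

The returned model can then be used for inference or referenced as the tokenizer in a downstream MusicGen/AudioGen training configuration.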
Theoretical Basis
Checkpoint Slimming
Training checkpoints contain extensive state required for resuming training (optimizer momentum buffers, learning rate scheduler state, RNG seeds, etc.) that is unnecessary and wasteful for inference. The export principle follows the common practice of checkpoint slimming: extracting only the subset of state needed for forward-pass inference.
Training Checkpoint (full):
    best_state:
        model: {...}       <-- model weights
    optimizer: {...}       <-- removed during export
    ema_state: {...}       <-- removed during export
    scheduler: {...}       <-- removed during export
    xp.cfg: DictConfig     <-- serialized to YAML string
    epoch: int             <-- removed during export
    ...

Exported Checkpoint (slim):
    best_state: {...}      <-- just the model weights (flattened)
    xp.cfg: str            <-- YAML string of configuration
    version: str           <-- library version
    exported: True         <-- export flag
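The transformation between the two layouts can be illustrated with plain Python dictionaries. This is a sketch under stated assumptions rather than the library's implementation: slim_checkpoint is a hypothetical name, the checkpoint is a stand-in dict with toy values, and json.dumps stands in for the YAML serializer (a real export would presumably load and save with torch and serialize the config with OmegaConf).

```python
import json
from typing import Any, Callable, Dict


def slim_checkpoint(
    pkg: Dict[str, Any],
    version: str,
    serialize_cfg: Callable[[Any], str] = lambda cfg: json.dumps(cfg, sort_keys=True),
) -> Dict[str, Any]:
    """Keep only what inference needs from a training checkpoint dict.

    Hypothetical helper; serialize_cfg is a stand-in for a YAML serializer.
    """
    return {
        # Flatten: take the weights out from under the "model" key.
        "best_state": pkg["best_state"]["model"],
        # Store the configuration as a string so the file is self-contained.
        "xp.cfg": serialize_cfg(pkg["xp.cfg"]),
        "version": version,
        "exported": True,
    }


# Toy training checkpoint mirroring the layout sketched above.
training_pkg = {
    "best_state": {"model": {"encoder.weight": [0.1, 0.2]}},
    "optimizer": {"momentum": [0.0, 0.0]},
    "ema_state": {},
    "scheduler": {"last_epoch": 10},
    "xp.cfg": {"sample_rate": 32000},
    "epoch": 10,
}

# "0.0.0" is a placeholder version string, not a real release number.
exported_pkg = slim_checkpoint(training_pkg, version="0.0.0")
print(sorted(exported_pkg))
```

Everything not named explicitly in the returned dict (optimizer, EMA, scheduler, epoch counters) is dropped simply by not being copied.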
Key design decisions:
- Configuration preservation -- the full Hydra config is serialized alongside the weights so that the model architecture can be reconstructed without any external config files. This makes the exported checkpoint fully self-contained.
- Version tracking -- including the library version enables compatibility checking and debugging when loading models across different Audiocraft releases.
- Export flag -- the exported: True flag allows the loading code to distinguish between training checkpoints and exported checkpoints, as they have different internal structures (in particular, the model weights sit under the model key of best_state in training checkpoints, whereas best_state holds them directly in exported ones).
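How a loader might use the export flag to handle both layouts can be sketched as follows. The helper name extract_model_state is hypothetical and the dicts are toy stand-ins; this illustrates the structural difference described above and is not Audiocraft's actual loading code.

```python
from typing import Any, Dict


def extract_model_state(pkg: Dict[str, Any]) -> Dict[str, Any]:
    """Return the model weights from either checkpoint layout.

    Hypothetical helper for illustration only.
    """
    if pkg.get("exported", False):
        # Exported checkpoint: best_state already holds the weights.
        return pkg["best_state"]
    # Training checkpoint: the weights sit one level deeper, under "model".
    return pkg["best_state"]["model"]


weights = {"encoder.weight": [0.1, 0.2]}
training_pkg = {"best_state": {"model": weights}, "epoch": 10}
exported_pkg = {"best_state": weights, "exported": True}

# Both layouts resolve to the same weights.
assert extract_model_state(training_pkg) == extract_model_state(exported_pkg)
```

Treating a missing flag as a training checkpoint is a design choice of this sketch; a stricter loader could instead reject files that lack the flag.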